On 8/20/25 8:06 AM, Fabio M. De Francesco wrote:
> Add documentation on how to resolve conflicts between CXL Fixed Memory
> Windows, Platform Low Memory Holes, intermediate Switch and Endpoint
> Decoders.
>
> Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@xxxxxxxxxxxxxxx>
> ---
>
> v3 -> v4: Show and explain how CFMWS, Root Decoders, Intermediate
>           Switch and Endpoint Decoders match and attach Regions in
>           x86 platforms with Low Memory Holes (Dave, Gregory, Ira)
>           Remove a wrong argument about large interleaves (Jonathan)
>
> v2 -> v3: Rework a few phrases for better clarity.
>           Fix grammar and syntactic errors (Randy, Alok).
>           Fix semantic errors ("size does not comply", Alok).
>           Fix technical errors ("decoder's total memory?", Alok).
>
> v1 -> v2: Rewrite "Summary of the Change" section, 3r paragraph.
>
> Documentation/driver-api/cxl/conventions.rst | 111 +++++++++++++++++++
> 1 file changed, 111 insertions(+)
>
> diff --git a/Documentation/driver-api/cxl/conventions.rst b/Documentation/driver-api/cxl/conventions.rst
> index da347a81a237..714240ed2e04 100644
> --- a/Documentation/driver-api/cxl/conventions.rst
> +++ b/Documentation/driver-api/cxl/conventions.rst
> @@ -45,3 +45,114 @@ Detailed Description of the Change
> ----------------------------------
>
> <Propose spec language that corrects the conflict.>
> +
> +
> +Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
> +============================================================================
> +
> +Document
> +--------
> +
> +CXL Revision 3.2, Version 1.0
> +
> +License
> +-------
> +
> +SPDX-License Identifier: CC-BY-4.0
> +
> +Creator/Contributors
> +--------------------
> +
> +Fabio M. De Francesco, Intel
> +Dan J. Williams, Intel
> +Mahesh Natu, Intel
> +
> +Summary of the Change
> +---------------------
> +
> +According to the current CXL Specifications (Revision 3.2, Version 1.0),

spell out CXL on first use

> +the CXL Fixed Memory Window Structure (CFMWS) describes zero or more Host
> +Physical Address (HPA) windows associated with each CXL Host Bridge. Each
> +window represents a contiguous HPA range that may be interleaved across
> +one or more targets, including CXL Host Bridges. Each window has a set of
> +restrictions that govern its usage. It is the OSPM’s responsibility to

spell out OSPM on first use.

> +utilize each window for the specified use.
> +
> +Table 9-22 states the Window Size field contains the total number of
> +consecutive bytes of HPA this window represents. This value must be a
> +multiple of the Number of Interleave Ways * 256 MB.
> +
> +Platform Firmware (BIOS) might reserve physical addresses below 4 GB,
> +such as the Low Memory Hole for PCIe MMIO. In such cases, the CFMWS Range

Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
memory gap such as the Low Memory Hole for PCIe MMIO may exist.

> +Size may not adhere to the NIW * 256 MB rule.
> +
> +On these systems, BIOS publishes CFMWS to communicate the active System
> +Physical Address (SPA) ranges that map to a subset of the Host Physical
> +Address (HPA) ranges. The SPA range trims out the hole, resulting in lost

So in the first paragraph, HPA is said to be described by CFMWS. But here a
brand new term, SPA, is introduced. I think you may need a paragraph above
this to talk about SPA vs HPA and SPA's relationship to CFMWS. Otherwise,
unless the reader is already knowledgeable about all this, it is very
confusing.

> +capacity in the endpoint with no SPA to map to the CXL HPA range that
> +exceeds the matching CFMWS range.
> +
> +E.g, a real x86 platform with two CFMWS, 384 GB total memory, and LMH
> +starting at 2 GB:
> +
> +Window | CFMWS Base | CFMWS Size | HDM Decoder Base | HDM Decoder Size | Ways | Granularity
> +  0    | 0 GB       | 2 GB       | 0 GB             | 3 GB             | 12   | 256
> +  1    | 4 GB       | 380 GB     | 0 GB             | 380 GB           | 12   | 256
> +
> +HDM decoder base and HDM decoder size represent all the 12 Endpoint
> +Decoders of a 12 way region and all the intermediate Switch Decoders.
> +They are configured by the BIOS according to the NIW * 256MB rule,
> +resulting in a HPA range size of 3GB.
> +
> +The CFMWS Base and CFMWS Size are used to configure the Root Decoder HPA
> +range base and size. CFMWS cannot intersect Memory Holes, then the CFMWS[0]
> +size is smaller (2GB) than that of the Switch and Endpoint Decoders that
> +make the hierarchy (3GB).
> +
> +On that platform, only the first 2GB will be potentially usable but,
> +because of the current specs, Linux fails to make them available to the
> +users. The driver expects that Root Decoder HPA size, which is equal to
> +the CFMWS from which it is configured, to be greater or equal to the
> +matching Switch and Endpoint HDM Decoders.
> +
> +The CXL driver fails to construct Regions and to attach Endpoint and
> +intermediate Switch Decoders to those Regions after their construction.
> +
> +In order to succeed with Region construction and Decoders attachment,
> +Linux must construct Regions with Root Decoders size, and then attach to

'a Region' and 'Root Decoder'?

> +them all the intermediate Switch and Endpoint Decoders that are part of the
> +hierarchy, even though the Decoders HPA range sizes may be larger than
> +those Regions whose sizes are trimmed by Low Memory Holes.
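
It might help readers to see the Window 0 numbers from the table worked through. The following is only a rough sketch of that arithmetic, not driver code: the helper names (window_granule, is_aligned, trimmed_region_size) are made up for illustration, and taking the smaller of the two sizes is my reading of "construct Regions with Root Decoders size".

```python
MB = 1 << 20
GB = 1 << 30


def window_granule(niw: int) -> int:
    """Size multiple required by the Window Size rule: NIW * 256 MB."""
    return niw * 256 * MB


def is_aligned(size: int, niw: int) -> bool:
    """True if size is a multiple of NIW * 256 MB."""
    return size % window_granule(niw) == 0


def trimmed_region_size(root_size: int, decoder_size: int) -> int:
    """Hypothetical helper: the Region cannot exceed the Root Decoder
    (CFMWS) size, so a larger Switch/Endpoint Decoder range is trimmed."""
    return min(root_size, decoder_size)


# Window 0 values taken from the table in the patch.
niw = 12
cfmws0_size = 2 * GB   # trimmed by the Low Memory Hole
hdm0_size = 3 * GB     # programmed by BIOS per the NIW * 256 MB rule

region_size = trimmed_region_size(cfmws0_size, hdm0_size)
lost = hdm0_size - region_size

print(window_granule(niw) // GB)     # granule in GB for 12 ways
print(is_aligned(hdm0_size, niw))    # decoder size follows the rule
print(is_aligned(cfmws0_size, niw))  # CFMWS[0] size does not
print(lost // GB)                    # capacity with no SPA to map to
```

With 12 ways the granule is 3 GB, so the 3 GB decoder range is aligned, the 2 GB CFMWS[0] is not, and 1 GB of endpoint capacity ends up unaddressable.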
> +
> +Benefits of the Change
> +----------------------
> +
> +Without this change, the OSPM wouldn't match Intermediate and Endpoint

s/Without this change,/Without the change/

> +Decoders with Root Decoders configured with CFMWS HPA sizes that don't
> +align with the NIW * 256MB constraint, leading to lost memdev capacity.

s/, leading/and leads to/

DJ

> +This change allows the OSPM to construct Regions and attach Intermediate
> +Switch and Endpoint Decoders to them, so that the addressable part of the
> +memory devices total capacity is not lost.
> +
> +References
> +----------
> +
> +Compute Express Link Specification Revision 3.2, Version 1.0
> +<https://www.computeexpresslink.org/>
> +
> +Detailed Description of the Change
> +----------------------------------
> +
> +The description of the Window Size field in table 9-22 needs to account
> +for platforms with Low Memory Holes, where SPA ranges might be subsets of
> +the endpoints' HPA. Therefore, it has to be changed to the following:
> +
> +"The total number of consecutive bytes of HPA this window represents.
> +This value shall be a multiple of NIW * 256 MB. On platforms that reserve
> +physical addresses below 4 GB, such as the Low Memory Hole for PCIe MMIO
> +on x86 or a requirement for greater than 8-way interleave CXL Regions
> +starting at address 0, an instance of CFMWS whose Base HPA is 0 might have
> +a window size that doesn't align with the NIW * 256 MB constraint. Note
> +that the matching intermediate Switch and Endpoint Decoders' HPA range
> +sizes must still align to the above-mentioned rule, but the memory capacity
> +that exceeds the CFMWS window size will not be accessible."
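
One way to read the proposed Window Size wording is as a relaxed validity check. The sketch below is only my interpretation: window_size_ok is a hypothetical name, and it simplifies the text by treating "Base HPA is 0" as the sole condition for the exception, without modelling whether the platform actually reserves addresses below 4 GB.

```python
MB = 1 << 20
GB = 1 << 30


def window_size_ok(base_hpa: int, size: int, niw: int) -> bool:
    """Hypothetical check modelled on the proposed Table 9-22 wording."""
    granule = niw * 256 * MB
    if size % granule == 0:
        # The general rule still holds: multiple of NIW * 256 MB.
        return True
    # Assumed exception: only a window whose Base HPA is 0 may be
    # unaligned (simplification, platform conditions are not checked).
    return base_hpa == 0


print(window_size_ok(0, 2 * GB, 12))       # CFMWS[0] from the example
print(window_size_ok(4 * GB, 2 * GB, 12))  # unaligned window not at base 0
print(window_size_ok(4 * GB, 6 * GB, 12))  # aligned window at any base
```

Under this reading the 2 GB window at base 0 is accepted, while an unaligned window at any other base is still rejected.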