From: Shiju Jose <shiju.jose@xxxxxxxxxx>

Update Documentation/edac/scrub.rst to include use cases and policies for
CXL memory device-based and CXL region-based patrol scrub control, and for
CXL Error Check Scrub (ECS).

Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
---
 Documentation/edac/scrub.rst | 75 ++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
index daab929cdba1..6132853a02fe 100644
--- a/Documentation/edac/scrub.rst
+++ b/Documentation/edac/scrub.rst
@@ -264,3 +264,78 @@ Sysfs files are documented in
 `Documentation/ABI/testing/sysfs-edac-scrub`
 
 `Documentation/ABI/testing/sysfs-edac-ecs`
+
+Examples
+--------
+
+The following examples describe the intended use cases and policies.
+
+1. CXL memory Patrol Scrub
+
+The following use cases have been identified for increasing the scrub rate.
+
+- A device is showing unexpectedly high errors, so the scrub control needs
+  to be at device granularity.
+
+- Scrubbing may need to apply to memory that is not online yet. This is most
+  likely used to set system-wide defaults at boot.
+
+- Software has decided that particular data needs more reliability, and hence
+  a higher scrub rate; this is called Differentiated Reliability. That data
+  sits in a region which may cover parts of multiple devices. The region
+  interfaces exist to support this use case.
+
+1.1. Device-based scrubbing
+
+CXL memory is exposed to the memory management subsystem, and ultimately to
+userspace, via CXL devices.
+
+When combining control via the device interfaces and the region interfaces,
+see 1.2. Region-based scrubbing.
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
+
+1.2. Region-based scrubbing
+
+CXL memory is exposed to the memory management subsystem, and ultimately to
+userspace, via CXL regions. CXL regions represent mapped memory capacity in
+the system physical address space.
+A region can incorporate parts of one or more CXL memory devices, with
+traffic interleaved across them. The user may want to control the scrub rate
+via this more abstract region instead of having to identify the constituent
+devices and program each of them separately. The scrub rate of a device
+always covers the whole device. Thus, if multiple regions use parts of the
+same device, requests to scrub other regions may result in a higher scrub
+rate for that device than was requested for this specific region.
+
+Userspace must follow the rules below when setting the scrub rates for any
+mixture of requirements.
+
+1. Take each region in turn, from the lowest desired scrub rate to the
+   highest, and set its scrub rate. Later regions may override the scrub
+   rate on individual devices (and hence potentially on whole regions).
+
+2. Take each device for which enhanced scrubbing (a higher rate) is
+   required and set its scrub rate. This overrides the scrub rates of those
+   individual devices, leaving any device that is not specifically set
+   scrubbing at the maximum rate required by any of the regions it backs.
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
+
+2. CXL memory Error Check Scrub (ECS)
+
+The Error Check Scrub (ECS) feature enables a memory device to perform error
+checking and correction (ECC) and to count single-bit errors. The associated
+memory controller starts ECS mode by sending a trigger to the memory device.
+However, CXL ECS control only allows the user to change the error count mode
+and the error reporting threshold, and to reset the ECS counter. Starting
+Error Check Scrub on a memory device is therefore left to the memory
+controller or platform when it detects an unexpectedly high error rate.
+Userspace can control the error count mode and the error threshold used for
+the segment count (the number of segments having at least the threshold
+number of errors), and can reset the ECS counter.
+
+Sysfs files for ECS are documented in
+`Documentation/ABI/testing/sysfs-edac-ecs`
-- 
2.43.0
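The two ordering rules for region-based scrubbing in 1.2 can be modelled in a few lines of userspace code. The Python sketch below is a hypothetical illustration, not part of the EDAC or CXL ABI: the region and device names (region0, mem0, ...) and the functions plan_scrub_rates() and effective_region_rate() are made up for the example, and a rate is an abstract number where a larger value means more frequent scrubbing, not the value of any particular sysfs attribute. It shows that programming regions from the lowest to the highest rate leaves every shared device at the highest rate any of its regions requested, and that per-device overrides must be applied last to take precedence.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

# Hypothetical model: (region name, desired rate, devices backing the region).
# A rate is an abstract number; larger means more frequent scrubbing.
Region = Tuple[str, int, Sequence[str]]


def plan_scrub_rates(
    regions: List[Region], device_overrides: Mapping[str, int]
) -> Dict[str, int]:
    """Return the final per-device rate after applying rules 1 and 2."""
    rates: Dict[str, int] = {}

    # Rule 1: program regions from the lowest desired rate to the highest.
    # Setting a region's rate writes it to every backing device, so a later,
    # higher-rate region overwrites the value on any device it shares.
    for _name, rate, devices in sorted(regions, key=lambda region: region[1]):
        for dev in devices:
            rates[dev] = rate

    # Rule 2: apply the enhanced per-device rates last so that they win.
    # Devices without an override keep the highest rate of any region they
    # back. Overrides are expected to be higher than that rate.
    for dev, rate in device_overrides.items():
        rates[dev] = rate

    return rates


def effective_region_rate(devices: Sequence[str], rates: Mapping[str, int]) -> int:
    """Every part of a region is scrubbed at least at the lowest rate among
    the devices backing it."""
    return min(rates[dev] for dev in devices)
```

For example, with region0 (rate 10) on mem0 and mem1, region1 (rate 40) on mem1 and mem2, and an override of 100 on mem0, the plan is mem0=100, mem1=40, mem2=40, so region0 ends up scrubbed at no less than 40 even though it requested 10. Sorting the regions first means the result does not depend on the order in which they are listed.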