On Tue, Sep 9, 2025 at 1:35 AM Hanjun Guo <guohanjun@xxxxxxxxxx> wrote: > > On 2025/8/30 11:02, Jiaqi Yan wrote: > > EINJ driver today only allows injection request to go through for two > > kinds of IORESOURCE_MEM: IORES_DESC_PERSISTENT_MEMORY and > > IORES_DESC_SOFT_RESERVED. This check prevents user of EINJ to test > > memory corrupted in many interesting areas: > > > > - Legacy persistent memory > > - Memory claimed to be used by ACPI tables or NV storage > > - Kernel crash memory and others > > > > There is need to test how kernel behaves when something consumes memory > > errors in these memory regions. For example, if certain ACPI table is > > corrupted, does kernel crash gracefully to prevent "silent data > > corruption". For another example, legacy persistent memory, when managed > > by Device DAX, does support recovering from Machine Check Exception > > raised by memory failure, hence worth to be tested. > > > > However, attempt to inject memory error via EINJ to legacy persistent > > memory or ACPI owned memory fails with -EINVAL. > > > > Allow EINJ to inject at address except it is MMIO. Leave it to the BIOS > > or firmware to decide what is a legitimate injection target. > > > > In addition to the test done in [1], on a machine having the following > > iomem resources: > > > > ... > > 01000000-08ffffff : Crash kernel > > 768f0098-768f00a7 : APEI EINJ > > ... > > 768f4000-77323fff : ACPI Non-volatile Storage > > 77324000-777fefff : ACPI Tables > > 777ff000-777fffff : System RAM > > 77800000-7fffffff : Reserved > > 80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff] > > 90040000-957fffff : PCI Bus 0000:00 > > ... > > 300000000-3ffffffff : Persistent Memory (legacy) > > ... > > > > I commented __einj_error_inject during the test and just tested when > > injecting a memory error at each start address shown above: > > - 0x80000000 and 0x90040000 both failed with EINVAL > > - request passed through for all other addresses > > > > ... > > > Changelog > > > > v1 [1] -> v2: > > - In addition to allow IORES_DESC_PERSISTENT_MEMORY_LEGACY, open the > > door wider and only exclude MMIO per suggestion from Tony [2] > > - Rebased to commit 11e7861d680c ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm") > > > > [1] https://lore.kernel.org/linux-acpi/20250825223348.3780279-1-jiaqiyan@xxxxxxxxxx > > [2] https://lore.kernel.org/linux-acpi/SJ1PR11MB60835824926BEE57F094DE6FFC39A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx > > For the Changelog, it's better to move it to below... > > > > > Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx> > > --- > > ... here. Noted, thanks! > > > drivers/acpi/apei/einj-core.c | 50 +++++++++++++++++++++++++++++------ > > 1 file changed, 42 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c > > index 2561b045acc7b..904930409fdb2 100644 > > --- a/drivers/acpi/apei/einj-core.c > > +++ b/drivers/acpi/apei/einj-core.c > > @@ -656,6 +656,44 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, > > return rc; > > } > > > > +/* Allow almost all types of address except MMIO. */ > > +static bool is_allowed_range(u64 base_addr, u64 size) > > +{ > > + int i; > > + /* > > + * MMIO region is usually claimed with IORESOURCE_MEM + IORES_DESC_NONE. > > + * However, IORES_DESC_NONE is treated like a wildcard when we check if > > + * region intersects with known resource. So do an allow list check for > > + * IORES_DESCs that definitely or most likely not MMIO. > > + */ > > + int non_mmio_desc[] = { > > + IORES_DESC_CRASH_KERNEL, > > + IORES_DESC_ACPI_TABLES, > > + IORES_DESC_ACPI_NV_STORAGE, > > + IORES_DESC_PERSISTENT_MEMORY, > > + IORES_DESC_PERSISTENT_MEMORY_LEGACY, > > + /* Treat IORES_DESC_DEVICE_PRIVATE_MEMORY as MMIO. */ > > + IORES_DESC_RESERVED, > > + IORES_DESC_SOFT_RESERVED, > > + IORES_DESC_CXL, > > Sorry, I'm not familiar with CXL, but I see the code in einj_error_inject(): > > /* > * Injections targeting a CXL 1.0/1.1 port have to be injected > * via the einj_cxl_rch_error_inject() path as that does the proper > * validation of the given RCRB base (MMIO) address. > */ > if (einj_is_cxl_error_type(type) && (flags & SETWA_FLAGS_MEM)) > return -EINVAL; > > So eject an error for CXL memory, there is a new interface which > means it's not handled here, do we need to remove IORES_DESC_CXL? Thanks for catching this. I agree we should remove IORES_DESC_CXL. It is unnecessary to check if base_addr intersects IORES_DESC_CXL. My previous (and totally wrong) thought was, if the error type happens to be vendor-defined CXL memory, that is, for some reason a vendor implements EINJ for CXL but not using type bit[12:17], einj_is_cxl_error_type will be false and the checks for base_addr will happen, and I wanted the check pass through. Now I think this weird case should be impossible given no vendor ever tries to add IORES_DESC_CXL to the previous check. Will fix both places in V3. > > Thanks > Hanjun