RE: [PATCH v3] ACPI: APEI: GHES: Don't offline huge pages just because BIOS asked

"Luck, Tony" <tony.luck@xxxxxxxxx> · Fri, 5 Sep 2025 19:58:19 +0000

> So the issue is the result of an inaccurate MCA record about the per-rank
> CE threshold being crossed. If the OS offlines the indicted page, it might
> be signaled to offline another 4K page in the same rank upon access.

It appears that the BIOS that resulted in this report sensibly treats crossing
the rank error threshold as needing a one-time report via GHES.

> Both MCA and offline-op are performance hits, and as argued by this
> patch, offline doesn't help except losing an already corrected page.
>
> Here we choose to bypass hugetlb page simply because it's huge.  Is it
> possible to argue that because the page is huge, it's less likely to get
> another MCA on another page from the same rank?

If there really is a problem with the rank, it likely affects most pages (or
at least most pages on the same NUMA node) because memory access
is (usually) interleaved between channels, and accesses within a 4K page
may hash to different ranks within a channel.
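
A toy model of that interleaving, as a rough sketch only. The channel
count, rank count, 64-byte granularity and the XOR rank hash below are all
made-up values for illustration and do not describe any real memory
controller's address decode:

```python
CACHE_LINE = 64      # bytes per interleave unit (assumed)
PAGE_SIZE = 4096     # 4K page
NUM_CHANNELS = 4     # hypothetical channel count
NUM_RANKS = 2        # hypothetical ranks per channel


def decode(addr):
    """Map a physical address to a (channel, rank) pair in the toy model."""
    line = addr // CACHE_LINE
    # Consecutive cache lines rotate across channels.
    channel = line % NUM_CHANNELS
    # Toy rank hash: XOR two bits of the remaining line index.
    upper = line // NUM_CHANNELS
    rank = (upper ^ (upper >> 3)) % NUM_RANKS
    return channel, rank


def targets_in_page(page_base):
    """Return the set of (channel, rank) pairs touched by one 4K page."""
    return {decode(page_base + off) for off in range(0, PAGE_SIZE, CACHE_LINE)}


if __name__ == "__main__":
    for pfn in (0, 1, 12345):
        pairs = sorted(targets_in_page(pfn * PAGE_SIZE))
        print(f"pfn {pfn}: {len(pairs)} (channel, rank) pairs: {pairs}")
```

In this model every 4K page touches every (channel, rank) pair, so a bad
rank is not confined to any one page, and offlining a single page does not
move accesses away from that rank.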

> A while back this patch
> 56374430c5dfc mm/memory-failure: userspace controls soft-offlining pages
> has provided userspace control over whether to soft offline, could it be
> a more preferable option?

Thanks for pointing that one out. I'll feed that back to the original reporter
and see if it is an acceptable solution.
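
For anyone wanting to check the current setting, a minimal sketch follows.
It assumes the knob added by that commit is the vm.enable_soft_offline
sysctl (0 = disabled, non-zero = enabled); verify the name against
Documentation/admin-guide/sysctl/vm.rst for your kernel. The helper names
are hypothetical:

```python
from pathlib import Path

# Assumed sysctl path; check your kernel's vm.rst before relying on it.
SOFT_OFFLINE_SYSCTL = "/proc/sys/vm/enable_soft_offline"


def parse_enable_soft_offline(text):
    """Return False if the sysctl value is 0, True for any other integer."""
    return int(text.strip()) != 0


def read_soft_offline_setting(path=SOFT_OFFLINE_SYSCTL):
    """Return True/False for the current setting, or None if the file is absent."""
    try:
        return parse_enable_soft_offline(Path(path).read_text())
    except FileNotFoundError:
        # Older kernels without the sysctl land here.
        return None


if __name__ == "__main__":
    state = read_soft_offline_setting()
    if state is None:
        print("enable_soft_offline sysctl not present on this kernel")
    else:
        print("soft offline is", "enabled" if state else "disabled")
```

Writing 0 to the same file as root should turn soft offline off
system-wide, which is a broader switch than skipping only hugetlb pages.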

> I don't know, the patch itself is fine, it's the issue that it has
> exposed that is more concerning.

Agreed. The root problem is the BIOS using this threshold reporting
mechanism, without there being a way for the OS to determine the
scope of memory affected by the threshold.

When this was originally implemented, the expectation was that the
scope would be a 4K page.

-Tony