Re: [PATCH v3] ACPI: APEI: GHES: Don't offline huge pages just because BIOS asked

On 9/5/2025 12:58 PM, Luck, Tony wrote:
So the issue is the result of an inaccurate MCA record about a per-rank CE
threshold being crossed. If the OS offlines the indicted page, it might be
signaled to offline another 4K page in the same rank upon access.

It appears that the BIOS that generated this report sensibly treats crossing
the rank error threshold as needing a one-time report via GHES.

Both the MCA and the offline operation hurt performance, and, as this
patch argues, offlining doesn't help beyond losing an already-corrected page.

Here we choose to bypass hugetlb pages simply because they're huge. Is it
possible to argue that, because the page is huge, it's less likely to get
another MCA on another page from the same rank?

If there really is a problem with the rank, it likely affects most pages (or
at least most pages on the same NUMA node) because memory access
is (usually) interleaved between channels, and accesses within a 4K page
may hash to different ranks within a channel.

A while back, commit
56374430c5dfc ("mm/memory-failure: userspace controls soft-offlining pages")
provided userspace control over whether to soft-offline pages. Could that
be a preferable option?

Thanks for pointing that one out. I'll feed that back to the original reporter
and see if it is an acceptable solution.

I don't know. The patch itself is fine; it's the issue it has
exposed that is more concerning.

Agreed. The root problem is the BIOS using this threshold reporting
mechanism, without there being a way for the OS to determine the
scope of memory affected by the threshold.

When this was originally implemented, the expectation was that the
scope would be a 4K page.

Thanks!

BTW, forgot to ask another question.
	ghes_do_proc
	  bool sync = is_hest_sync_notify(ghes);
	  [..]
		queued = ghes_handle_memory_failure(gdata, sev, sync);
	  [..]
	  if (sync && !queued) {
	      force_sig(SIGBUS);
The question is, in the CE MCE case, 'sync' is never 'true' by design, correct?

thanks,
-jane

-Tony