On 10.09.25 18:15, Kyle Meyer wrote:
> Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> Since HugeTLB pages are preallocated, reducing the available HugeTLB
> page pool can cause allocation failures.
>
> /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> disable/enable soft offline:
>
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled.
>
> The current sysctl interface does not distinguish between HugeTLB pages
> and other page types.
>
> Disable soft offline for HugeTLB pages by default (1) and extend the
> sysctl interface to preserve existing behavior (2):
>
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled (excluding HugeTLB pages).
> 2 - Soft offline is enabled (including HugeTLB pages).
>
> Update documentation for the sysctl interface, reference the sysctl
> interface in the sysfs ABI documentation, and update HugeTLB soft
> offline selftests.

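
For reference, the three proposed sysctl values boil down to a small
predicate. The following is only a rough sketch of that semantics, not
code from the patch: the enum names and soft_offline_allowed() are
hypothetical, and treating out-of-range values as "disabled" is an
assumption.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the three values in the quoted text. */
enum soft_offline_mode {
	SOFT_OFFLINE_DISABLED = 0,		/* 0 - disabled */
	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB = 1,	/* 1 - enabled, excluding HugeTLB */
	SOFT_OFFLINE_ENABLED_ALL = 2,		/* 2 - enabled, including HugeTLB */
};

/* Return true if a page of the given kind may be soft offlined. */
static bool soft_offline_allowed(int mode, bool is_hugetlb)
{
	switch (mode) {
	case SOFT_OFFLINE_DISABLED:
		return false;
	case SOFT_OFFLINE_ENABLED_SKIP_HUGETLB:
		return !is_hugetlb;
	case SOFT_OFFLINE_ENABLED_ALL:
		return true;
	default:
		/* Assumption: unknown values behave like "disabled". */
		return false;
	}
}
```

With this reading, only value 2 reproduces what value 1 does today for
HugeTLB pages.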
I'm sure you spotted that the documentation for
"/sys/devices/system/memory/soft_offline_pag" resides under "testing".
If you read about MADV_SOFT_OFFLINE in the man page it clearly says:
"This feature is intended for testing of memory error-handling code; it
is available only if the kernel was configured with CONFIG_MEMORY_FAILURE."
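
To illustrate how that testing interface is driven from user space, here
is a minimal sketch of a wrapper around madvise(). The helper name
soft_offline_range() is hypothetical, and the fallback value 101 for
MADV_SOFT_OFFLINE is taken from the generic Linux UAPI header and may
differ on some architectures.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* expose madvise() in glibc headers */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* assumption: generic UAPI value */
#endif

/*
 * Ask the kernel to soft offline the pages backing [addr, addr + len).
 * Returns 0 on success or a negative errno value on failure.
 */
static int soft_offline_range(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_SOFT_OFFLINE) == 0)
		return 0;
	return -errno;
}
```

The caller needs CAP_SYS_ADMIN and a page-aligned address; on a kernel
built without CONFIG_MEMORY_FAILURE the call is expected to fail.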
So I'm sorry to say: I don't see why we should add all this complexity to
make a feature used for testing soft-offlining work differently for
hugetlb folios -- with a testing interface.
--
Cheers
David / dhildenb