On Wed, Sep 10, 2025 at 09:44:24AM -0700, Jiaqi Yan wrote:
> On Wed, Sep 10, 2025 at 9:16 AM Kyle Meyer <kyle.meyer@xxxxxxx> wrote:
> >
> > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > page pool can cause allocation failures.
> >
> > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > disable/enable soft offline:
> >
> >     0 - Soft offline is disabled.
> >     1 - Soft offline is enabled.
> >
> > The current sysctl interface does not distinguish between HugeTLB pages
> > and other page types.
> >
> > Disable soft offline for HugeTLB pages by default (1) and extend the
> > sysctl interface to preserve existing behavior (2):
> >
> >     0 - Soft offline is disabled.
> >     1 - Soft offline is enabled (excluding HugeTLB pages).
> >     2 - Soft offline is enabled (including HugeTLB pages).
> >
> > Update documentation for the sysctl interface, reference the sysctl
> > interface in the sysfs ABI documentation, and update HugeTLB soft
> > offline selftests.
> >
> > Reported-by: Shawn Fan <shawn.fan@xxxxxxxxx>
> > Suggested-by: Tony Luck <tony.luck@xxxxxxxxx>
> > Signed-off-by: Kyle Meyer <kyle.meyer@xxxxxxx>
> > ---
> >
> > Tony's original patch disabled soft offline for HugeTLB pages when
> > a correctable memory error reported via GHES (with "error threshold
> > exceeded" set) happened to be on a HugeTLB page:
> >
> > https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@xxxxxxxxx
> >
> > This patch disables soft offline for HugeTLB pages by default
> > (not just from GHES).
> >
> > ---
> >  .../ABI/testing/sysfs-memory-page-offline     |  6 ++++
> >  Documentation/admin-guide/sysctl/vm.rst       | 18 ++++++++---
> >  mm/memory-failure.c                           | 21 ++++++++++--
> >  .../selftests/mm/hugetlb-soft-offline.c       | 32 +++++++++++++------
> >  4 files changed, 60 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
> > index 00f4e35f916f..befb89ae39ec 100644
> > --- a/Documentation/ABI/testing/sysfs-memory-page-offline
> > +++ b/Documentation/ABI/testing/sysfs-memory-page-offline
> > @@ -20,6 +20,12 @@ Description:
> >  		number, or a error when the offlining failed. Reading
> >  		the file is not allowed.
> >
> > +		Soft-offline can be disabled/enabled via sysctl:
> > +		/proc/sys/vm/enable_soft_offline
> > +
> > +		For details, see:
> > +		Documentation/admin-guide/sysctl/vm.rst
> > +
> >  What:		/sys/devices/system/memory/hard_offline_page
> >  Date:		Sep 2009
> >  KernelVersion:	2.6.33
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index 4d71211fdad8..ae56372bd604 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
> >  HugeTLB cases.
> >
> >  For all architectures, enable_soft_offline controls whether to soft offline
> > -memory pages. When set to 1, kernel attempts to soft offline the pages
> > -whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
> > -the request to soft offline the pages. Its default value is 1.
> > +memory pages:
> > +
> > +- 0: Soft offline is disabled.
> > +- 1: Soft offline is enabled (excluding HugeTLB pages).
> > +- 2: Soft offline is enabled (including HugeTLB pages).
>
> Would it be better to keep/inherit the previous documented behavior "1
> - Soft offline is enabled (no matter what type of the page is)"?
> Thus it will have no impact to users that are very nervous about corrected
> memory errors and willing to lose hugetlb page. Something like:
>
>   enum soft_offline {
>       SOFT_OFFLINE_DISABLED = 0,
>       SOFT_OFFLINE_ENABLED,
>       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
>       // SOFT_OFFLINE_ENABLED_SKIP_XXX...
>   };

I don't have a strong opinion on the default because there's a sysctl
interface, but that seems reasonable. I'll wait for more feedback before
putting together a v2.

> > +
> > +The default is 1.
> > +
> > +If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
> >
> >  It is worth mentioning that after setting enable_soft_offline to 0, the
> >  following requests to soft offline pages will not be performed:
> >
> > +- Request to soft offline from sysfs (soft_offline_page).
> > +
> >  - Request to soft offline pages from RAS Correctable Errors Collector.
> >
> > -- On ARM, the request to soft offline pages from GHES driver.
> > +- On ARM and X86, the request to soft offline pages from GHES driver.
> >
> >  - On PARISC, the request to soft offline pages from Page Deallocation Table.
> >
> > +Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> > +
> >  extfrag_threshold
> >  =================
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index fc30ca4804bf..cb59a99b48c5 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -64,11 +64,18 @@
> >  #include "internal.h"
> >  #include "ras/ras_event.h"
> >
> > +enum soft_offline {
> > +	SOFT_OFFLINE_DISABLED = 0,
> > +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > +	SOFT_OFFLINE_ENABLED
> > +};
> > +
> >  static int sysctl_memory_failure_early_kill __read_mostly;
> >
> >  static int sysctl_memory_failure_recovery __read_mostly = 1;
> >
> > -static int sysctl_enable_soft_offline __read_mostly = 1;
> > +static int sysctl_enable_soft_offline __read_mostly =
> > +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
> >
> >  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> >
> > @@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
> >  		.mode		= 0644,
> >  		.proc_handler	= proc_dointvec_minmax,
> >  		.extra1		= SYSCTL_ZERO,
> > -		.extra2		= SYSCTL_ONE,
> > +		.extra2		= SYSCTL_TWO,
> >  	}
> >  };
> >
> > @@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
> >  		return -EIO;
> >  	}
> >
> > -	if (!sysctl_enable_soft_offline) {
> > +	if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
> >  		pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
> >  		put_ref_page(pfn, flags);
> >  		return -EOPNOTSUPP;
> >  	}
> >
> > +	if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
> > +		if (folio_test_hugetlb(pfn_folio(pfn))) {
> > +			pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
> > +			put_ref_page(pfn, flags);
> > +			return -EOPNOTSUPP;
> > +		}
> > +	}
> > +
> >  	mutex_lock(&mf_mutex);
> >
> >  	if (PageHWPoison(page)) {
> > diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > index f086f0e04756..7e2873cd0a6d 100644
> > --- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > +++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > @@ -1,10 +1,15 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  /*
> >   * Test soft offline behavior for HugeTLB pages:
> > - * - if enable_soft_offline = 0, hugepages should stay intact and soft
> > - *   offlining failed with EOPNOTSUPP.
> > - * - if enable_soft_offline = 1, a hugepage should be dissolved and
> > - *   nr_hugepages/free_hugepages should be reduced by 1.
> > + *
> > + * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
> > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > + *
> > + * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
> > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > + *
> > + * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
> > + *   dissolved and nr_hugepages/free_hugepages should be reduced by 1.
> >   *
> >   * Before running, make sure more than 2 hugepages of default_hugepagesz
> >   * are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
> > @@ -32,6 +37,12 @@
> >
> >  #define EPREFIX " !!! "
> >
> > +enum soft_offline {
> > +	SOFT_OFFLINE_DISABLED = 0,
> > +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > +	SOFT_OFFLINE_ENABLED
> > +};
> > +
> >  static int do_soft_offline(int fd, size_t len, int expect_errno)
> >  {
> >  	char *filemap = NULL;
> > @@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
> >  	char cmd[256] = {0};
> >  	FILE *cmdfile = NULL;
> >
> > -	if (value != 0 && value != 1)
> > +	if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
> >  		return -EINVAL;
> >
> >  	sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
> > @@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
> >  static void test_soft_offline_common(int enable_soft_offline)
> >  {
> >  	int fd;
> > -	int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
> > +	int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
> >  	struct statfs file_stat;
> >  	unsigned long hugepagesize_kb = 0;
> >  	unsigned long nr_hugepages_before = 0;
> > @@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
> >  	// No need for the hugetlbfs file from now on.
> >  	close(fd);
> >
> > -	if (enable_soft_offline) {
> > +	if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
> >  		if (nr_hugepages_before != nr_hugepages_after + 1) {
> >  			ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
> >  			return;
> > @@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
> >  int main(int argc, char **argv)
> >  {
> >  	ksft_print_header();
> > -	ksft_set_plan(2);
> > +	ksft_set_plan(3);
> >
> > -	test_soft_offline_common(1);
> > -	test_soft_offline_common(0);
> > +	test_soft_offline_common(SOFT_OFFLINE_ENABLED);
> > +	test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
> > +	test_soft_offline_common(SOFT_OFFLINE_DISABLED);
>
> Thanks for updating the test code! Looks good to me.
>
> >
> >  	ksft_finished();
> >  }
> > --
> > 2.51.0
> >
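
For reference while the default is still under discussion, here is a minimal
userspace sketch of the three-mode check as this version of the patch orders
it (0 = disabled, 1 = enabled but skipping HugeTLB, 2 = enabled for all
pages). The helper names soft_offline_policy_check() and
soft_offline_policy_selfcheck() are hypothetical and exist only for
illustration. The is_hugetlb flag stands in for the kernel's
folio_test_hugetlb(pfn_folio(pfn)) test, which is not available in userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Values mirror the enum in this version of the patch. */
enum soft_offline {
	SOFT_OFFLINE_DISABLED = 0,
	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
	SOFT_OFFLINE_ENABLED
};

/*
 * Hypothetical helper, not part of the patch: models only the sysctl
 * gate at the top of soft_offline_page(). Returns 0 if the request
 * would pass the gate, -EOPNOTSUPP if it would be rejected.
 */
static int soft_offline_policy_check(int mode, bool is_hugetlb)
{
	if (mode == SOFT_OFFLINE_DISABLED)
		return -EOPNOTSUPP;

	if (mode == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB && is_hugetlb)
		return -EOPNOTSUPP;

	return 0;
}

/* Hypothetical self-check covering every mode/page-type combination. */
static void soft_offline_policy_selfcheck(void)
{
	assert(soft_offline_policy_check(SOFT_OFFLINE_DISABLED, false) == -EOPNOTSUPP);
	assert(soft_offline_policy_check(SOFT_OFFLINE_DISABLED, true) == -EOPNOTSUPP);
	assert(soft_offline_policy_check(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB, false) == 0);
	assert(soft_offline_policy_check(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB, true) == -EOPNOTSUPP);
	assert(soft_offline_policy_check(SOFT_OFFLINE_ENABLED, false) == 0);
	assert(soft_offline_policy_check(SOFT_OFFLINE_ENABLED, true) == 0);
}
```

If the enum is reordered as suggested above (1 = enabled for all pages,
2 = skip HugeTLB), only the enumerator order changes; the comparisons by name
stay the same.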