On Wed, Sep 10, 2025 at 10:50 AM Kyle Meyer <kyle.meyer@xxxxxxx> wrote:
>
> On Wed, Sep 10, 2025 at 09:44:24AM -0700, Jiaqi Yan wrote:
> > On Wed, Sep 10, 2025 at 9:16 AM Kyle Meyer <kyle.meyer@xxxxxxx> wrote:
> > >
> > > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > > page pool can cause allocation failures.
> > >
> > > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > > disable/enable soft offline:
> > >
> > > 0 - Soft offline is disabled.
> > > 1 - Soft offline is enabled.
> > >
> > > The current sysctl interface does not distinguish between HugeTLB pages
> > > and other page types.
> > >
> > > Disable soft offline for HugeTLB pages by default (1) and extend the
> > > sysctl interface to preserve existing behavior (2):
> > >
> > > 0 - Soft offline is disabled.
> > > 1 - Soft offline is enabled (excluding HugeTLB pages).
> > > 2 - Soft offline is enabled (including HugeTLB pages).
> > >
> > > Update documentation for the sysctl interface, reference the sysctl
> > > interface in the sysfs ABI documentation, and update HugeTLB soft
> > > offline selftests.
> > >
> > > Reported-by: Shawn Fan <shawn.fan@xxxxxxxxx>
> > > Suggested-by: Tony Luck <tony.luck@xxxxxxxxx>
> > > Signed-off-by: Kyle Meyer <kyle.meyer@xxxxxxx>
> > > ---
> > >
> > > Tony's original patch disabled soft offline for HugeTLB pages when
> > > a correctable memory error reported via GHES (with "error threshold
> > > exceeded" set) happened to be on a HugeTLB page:
> > >
> > > https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@xxxxxxxxx
> > >
> > > This patch disables soft offline for HugeTLB pages by default
> > > (not just from GHES).
> > >
> > > ---
> > >  .../ABI/testing/sysfs-memory-page-offline | 6 ++++
> > >  Documentation/admin-guide/sysctl/vm.rst | 18 ++++++++---
> > >  mm/memory-failure.c | 21 ++++++++++--
> > >  .../selftests/mm/hugetlb-soft-offline.c | 32 +++++++++++++------
> > >  4 files changed, 60 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
> > > index 00f4e35f916f..befb89ae39ec 100644
> > > --- a/Documentation/ABI/testing/sysfs-memory-page-offline
> > > +++ b/Documentation/ABI/testing/sysfs-memory-page-offline
> > > @@ -20,6 +20,12 @@ Description:
> > >  		number, or a error when the offlining failed. Reading
> > >  		the file is not allowed.
> > >
> > > +		Soft-offline can be disabled/enabled via sysctl:
> > > +		/proc/sys/vm/enable_soft_offline
> > > +
> > > +		For details, see:
> > > +		Documentation/admin-guide/sysctl/vm.rst
> > > +
> > >  What: /sys/devices/system/memory/hard_offline_page
> > >  Date: Sep 2009
> > >  KernelVersion: 2.6.33
> > > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > > index 4d71211fdad8..ae56372bd604 100644
> > > --- a/Documentation/admin-guide/sysctl/vm.rst
> > > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > > @@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
> > >  HugeTLB cases.
> > >
> > >  For all architectures, enable_soft_offline controls whether to soft offline
> > > -memory pages. When set to 1, kernel attempts to soft offline the pages
> > > -whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
> > > -the request to soft offline the pages. Its default value is 1.
> > > +memory pages:
> > > +
> > > +- 0: Soft offline is disabled.
> > > +- 1: Soft offline is enabled (excluding HugeTLB pages).
> > > +- 2: Soft offline is enabled (including HugeTLB pages).
> >
> > Would it be better to keep/inherit the previous documented behavior "1
> > - Soft offline is enabled (no matter what type of the page is)"? Thus
> > it will have no impact to users that are very nervous about corrected
> > memory errors and willing to lose hugetlb page. Something like:
> >
> > enum soft_offline {
> > 	SOFT_OFFLINE_DISABLED = 0,
> > 	SOFT_OFFLINE_ENABLED,
> > 	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > 	// SOFT_OFFLINE_ENABLED_SKIP_XXX...
> > };
>
> I don't have a strong opinion on the default because there's a sysctl
> interface, but that seems reasonable. I'll wait for more feedback before
> putting together a v2.

Yeah, no strong opinion from me either, as long as SOFT_OFFLINE_DISABLED
is still 0 (used by our fleet).

In case you don't need to send out v2:

Reviewed-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx>

>
> > > +
> > > +The default is 1.
> > > +
> > > +If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
> > >
> > >  It is worth mentioning that after setting enable_soft_offline to 0, the
> > >  following requests to soft offline pages will not be performed:
> > >
> > > +- Request to soft offline from sysfs (soft_offline_page).
> > > +
> > >  - Request to soft offline pages from RAS Correctable Errors Collector.
> > >
> > > -- On ARM, the request to soft offline pages from GHES driver.
> > > +- On ARM and X86, the request to soft offline pages from GHES driver.
> > >
> > >  - On PARISC, the request to soft offline pages from Page Deallocation Table.
> > >
> > > +Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> > > +
> > >  extfrag_threshold
> > >  =================
> > >
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index fc30ca4804bf..cb59a99b48c5 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -64,11 +64,18 @@
> > >  #include "internal.h"
> > >  #include "ras/ras_event.h"
> > >
> > > +enum soft_offline {
> > > +	SOFT_OFFLINE_DISABLED = 0,
> > > +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > > +	SOFT_OFFLINE_ENABLED
> > > +};
> > > +
> > >  static int sysctl_memory_failure_early_kill __read_mostly;
> > >
> > >  static int sysctl_memory_failure_recovery __read_mostly = 1;
> > >
> > > -static int sysctl_enable_soft_offline __read_mostly = 1;
> > > +static int sysctl_enable_soft_offline __read_mostly =
> > > +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
> > >
> > >  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> > >
> > > @@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
> > >  		.mode = 0644,
> > >  		.proc_handler = proc_dointvec_minmax,
> > >  		.extra1 = SYSCTL_ZERO,
> > > -		.extra2 = SYSCTL_ONE,
> > > +		.extra2 = SYSCTL_TWO,
> > >  	}
> > >  };
> > >
> > > @@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
> > >  		return -EIO;
> > >  	}
> > >
> > > -	if (!sysctl_enable_soft_offline) {
> > > +	if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
> > >  		pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
> > >  		put_ref_page(pfn, flags);
> > >  		return -EOPNOTSUPP;
> > >  	}
> > >
> > > +	if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
> > > +		if (folio_test_hugetlb(pfn_folio(pfn))) {
> > > +			pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
> > > +			put_ref_page(pfn, flags);
> > > +			return -EOPNOTSUPP;
> > > +		}
> > > +	}
> > > +
> > >  	mutex_lock(&mf_mutex);
> > >
> > >  	if (PageHWPoison(page)) {
> > > diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > > index f086f0e04756..7e2873cd0a6d 100644
> > > --- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > > +++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > > @@ -1,10 +1,15 @@
> > >  // SPDX-License-Identifier: GPL-2.0
> > >  /*
> > >   * Test soft offline behavior for HugeTLB pages:
> > > - * - if enable_soft_offline = 0, hugepages should stay intact and soft
> > > - *   offlining failed with EOPNOTSUPP.
> > > - * - if enable_soft_offline = 1, a hugepage should be dissolved and
> > > - *   nr_hugepages/free_hugepages should be reduced by 1.
> > > + *
> > > + * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
> > > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > > + *
> > > + * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
> > > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > > + *
> > > + * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
> > > + *   dissolved and nr_hugepages/free_hugepages should be reduced by 1.
> > >   *
> > >   * Before running, make sure more than 2 hugepages of default_hugepagesz
> > >   * are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
> > > @@ -32,6 +37,12 @@
> > >
> > >  #define EPREFIX " !!! "
> > >
> > > +enum soft_offline {
> > > +	SOFT_OFFLINE_DISABLED = 0,
> > > +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > > +	SOFT_OFFLINE_ENABLED
> > > +};
> > > +
> > >  static int do_soft_offline(int fd, size_t len, int expect_errno)
> > >  {
> > >  	char *filemap = NULL;
> > > @@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
> > >  	char cmd[256] = {0};
> > >  	FILE *cmdfile = NULL;
> > >
> > > -	if (value != 0 && value != 1)
> > > +	if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
> > >  		return -EINVAL;
> > >
> > >  	sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
> > > @@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
> > >  static void test_soft_offline_common(int enable_soft_offline)
> > >  {
> > >  	int fd;
> > > -	int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
> > > +	int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
> > >  	struct statfs file_stat;
> > >  	unsigned long hugepagesize_kb = 0;
> > >  	unsigned long nr_hugepages_before = 0;
> > > @@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
> > >  	// No need for the hugetlbfs file from now on.
> > >  	close(fd);
> > >
> > > -	if (enable_soft_offline) {
> > > +	if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
> > >  		if (nr_hugepages_before != nr_hugepages_after + 1) {
> > >  			ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
> > >  			return;
> > > @@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
> > >  int main(int argc, char **argv)
> > >  {
> > >  	ksft_print_header();
> > > -	ksft_set_plan(2);
> > > +	ksft_set_plan(3);
> > >
> > > -	test_soft_offline_common(1);
> > > -	test_soft_offline_common(0);
> > > +	test_soft_offline_common(SOFT_OFFLINE_ENABLED);
> > > +	test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
> > > +	test_soft_offline_common(SOFT_OFFLINE_DISABLED);
> >
> > Thanks for updating the test code! Looks good to me.
> >
> > >
> > >  	ksft_finished();
> > >  }
> > > --
> > > 2.51.0
> > >