Hi Tony, On Mon, Sep 8, 2025 at 10:01 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote: > > On Mon, Sep 08, 2025 at 02:14:42PM -0500, Kyle Meyer wrote: > > Optionally, a 3rd setting could be added to /proc/sys/vm/enable_soft_offline: > > > > 0: Soft offline is disabled. > > 1: Soft offline is enabled for normal pages (skip hugepages). > > 2: Soft offline is enabled for normal pages and hugepages. > > Seems like a solid plan. > > Needs an update to Documentation/admin-guide/sysctl/vm.rst > to match the new functionality and describe the reason that > users might want to avoid taking huge pages offline. > > Also this line in existing text: > > - On ARM, the request to soft offline pages from GHES driver. > > should read: > > - On ARM and X86, the request to soft offline pages from GHES driver. > > > > > Maybe something like... > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > > index fc30ca4804bf..efa535d405a8 100644 > > --- a/mm/memory-failure.c > > +++ b/mm/memory-failure.c > > @@ -64,11 +64,17 @@ > > #include "internal.h" > > #include "ras/ras_event.h" > > > > +enum soft_offline { > > + SOFT_OFFLINE_DISABLED = 0, > > + SOFT_OFFLINE_ENABLED_SKIP_HUGEPAGES, > > + SOFT_OFFLINE_ENABLED > > +}; > > + > > static int sysctl_memory_failure_early_kill __read_mostly; > > > > static int sysctl_memory_failure_recovery __read_mostly = 1; > > -static int sysctl_enable_soft_offline __read_mostly = 1; > > +static int sysctl_enable_soft_offline __read_mostly = SOFT_OFFLINE_SKIP_HUGEPAGES; > > This is changing the default behavior (which allowed offline > of huge pages). I'm in favor of the change. But you should call it > out explicitly in the commit message when you write up a patch. > > > > > atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); > > > > @@ -150,7 +156,7 @@ static const struct ctl_table memory_failure_table[] = { > > .mode = 0644, > > .proc_handler = proc_dointvec_minmax, > > .extra1 = SYSCTL_ZERO, > > - .extra2 = SYSCTL_ONE, > > + .extra2 = SYSCTL_TWO, > > } > > }; > > > > @@ -2799,12 +2805,20 @@ int soft_offline_page(unsigned long pfn, int flags) > > return -EIO; > > } > > > > - if (!sysctl_enable_soft_offline) { > > - pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n"); > > + if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) { > > + pr_info("disabled by /proc/sys/vm/enable_soft_offline\n"); > > put_ref_page(pfn, flags); > > return -EOPNOTSUPP; > > } > > > > + if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGEPAGES) { > > + if (folio_test_hugetlb(pfn_folio(pfn))) { > > + pr_info("disabled by /proc/sys/vm/enable_soft_offline\n"); > > + put_ref_page(pfn, flags); > > + return -EOPNOTSUPP; > > + } > > + } > > + > > mutex_lock(&mf_mutex); > > > > if (PageHWPoison(page)) { > > > > > > I don't know, the patch itself is fine, it's the issue that it has > > > > exposed that is more concerning. Is the $subject patch still relevant? I gather from the conversation following it that it isn't.