On Mon, Aug 04, 2025 at 07:07:06PM +0200, David Hildenbrand wrote:
> > Yeah I really don't like this. This seems overly complicated and too
> > fiddly. Also if I want a static PMD, do I want to wait a minute for next
> > attempt?
> >
> > Also doing things this way we might end up:
> >
> > 0. Enabling CONFIG_STATIC_HUGE_ZERO_FOLIO
> > 1. Not doing anything that needs a static PMD for a while + get fragmentation.
> > 2. Do something that needs it - oops can't get order-9 page, and waiting 60
> >    seconds between attempts
> > 3. This is silent so you think you have it switched on but are actually getting
> >    bad performance.
> >
> > I appreciate wanting to reuse this code, but we need to find a way to do this
> > really really early, and get rid of this arbitrary time out. It's very arbitrary
> > and we have no easy way of tracing how this might behave under workload.
> >
> > Also we end up pinning an order-9 page either way, so no harm in getting it
> > first thing?
>
> What we could do, to avoid messing with memblock and two ways of initializing
> a huge zero folio early, is to just disable the shrinker.

Nice, I like this approach!

>
> Downside is that the page is really static (not just when actually used at
> least once). I like it:

Well I'm not sure this is a downside :P

User is explicitly enabling an option that says 'I'm cool to lose an order-9 page
for this'.

>
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0ce86e14ab5e1..8e2aa18873098 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -153,6 +153,7 @@ config X86
>  	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
>  	select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
>  	select ARCH_WANTS_THP_SWAP if X86_64
> +	select ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO if X86_64
>  	select ARCH_HAS_PARANOID_L1D_FLUSH
>  	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>  	select BUILDTIME_TABLE_SORT
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7748489fde1b7..ccfa5c95f14b1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -495,6 +495,17 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
>  struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
>  void mm_put_huge_zero_folio(struct mm_struct *mm);
>
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> +	if (!IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO))
> +		return NULL;
> +
> +	if (unlikely(!huge_zero_folio))
> +		return NULL;
> +
> +	return huge_zero_folio;
> +}
> +
>  static inline bool thp_migration_supported(void)
>  {
>  	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
> @@ -685,6 +696,11 @@ static inline int change_huge_pud(struct mmu_gather *tlb,
>  {
>  	return 0;
>  }
> +
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> +	return NULL;
> +}
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
>  static inline int split_folio_to_list_to_order(struct folio *folio,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e443fe8cd6cf2..366a6d2d771e3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -823,6 +823,27 @@ config ARCH_WANT_GENERAL_HUGETLB
>  config ARCH_WANTS_THP_SWAP
>  	def_bool n
>
> +config ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO
> +	def_bool n
> +
> +config STATIC_HUGE_ZERO_FOLIO
> +	bool "Allocate a PMD sized folio for zeroing"
> +	depends on ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO && TRANSPARENT_HUGEPAGE
> +	help
> +	  Without this config enabled, the huge zero folio is allocated on
> +	  demand and freed under memory pressure once no longer in use.
> +	  To detect remaining users reliably, references to the huge zero folio
> +	  must be tracked precisely, so it is commonly only available for mapping
> +	  it into user page tables.
> +
> +	  With this config enabled, the huge zero folio can also be used
> +	  for other purposes that do not implement precise reference counting:
> +	  it is allocated statically and never freed, allowing for more
> +	  wide-spread use, for example, when performing I/O similar to the
> +	  traditional shared zeropage.
> +
> +	  Not suitable for memory constrained systems.
> +
>  config MM_ID
>  	def_bool n
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ff06dee213eb2..f65ba3e6f0824 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -866,9 +866,14 @@ static int __init thp_shrinker_init(void)
>  	huge_zero_folio_shrinker->scan_objects = shrink_huge_zero_folio_scan;
>  	shrinker_register(huge_zero_folio_shrinker);
>
> -	deferred_split_shrinker->count_objects = deferred_split_count;
> -	deferred_split_shrinker->scan_objects = deferred_split_scan;
> -	shrinker_register(deferred_split_shrinker);
> +	if (IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO)) {
> +		if (!get_huge_zero_folio())
> +			pr_warn("Allocating static huge zero folio failed\n");
> +	} else {
> +		deferred_split_shrinker->count_objects = deferred_split_count;
> +		deferred_split_shrinker->scan_objects = deferred_split_scan;
> +		shrinker_register(deferred_split_shrinker);
> +	}
>
>  	return 0;
>  }
> --
> 2.50.1
>
>
> Now, one thing I do not like is that we have "ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO" but
> then have a user-selectable option.
>
> Should we just get rid of ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO?

Yeah, though I guess we probably need to make it need CONFIG_MMU if so? Probably
don't want to provide it if it might somehow break things?
I guess we could keep it as long as CONFIG_STATIC_HUGE_ZERO_FOLIO depends on
something sensible like CONFIG_MMU, maybe 64-bit too?

Anyway this approach looks generally good!

>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo
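
As a rough, untested sketch of the dependency idea above, the option might look
something like this if ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO were dropped. The MMU
and 64BIT dependencies are assumptions standing in for "something sensible";
they are not an agreed design and may turn out to be redundant or insufficient:

```
config STATIC_HUGE_ZERO_FOLIO
	bool "Allocate a PMD sized folio for zeroing"
	# Assumption: MMU and 64BIT replace ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO.
	depends on TRANSPARENT_HUGEPAGE && MMU && 64BIT
	help
	  Help text unchanged from the quoted patch.
```

With this shape, the arch/x86/Kconfig hunk selecting
ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO would no longer be needed.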