Re: [PATCH 3/5] mm: add static huge zero folio

On Mon, Aug 04, 2025 at 07:07:06PM +0200, David Hildenbrand wrote:
> > Yeah I really don't like this. This seems overly complicated and too
> > fiddly. Also if I want a static PMD, do I want to wait a minute for the
> > next attempt?
> >
> > Also doing things this way we might end up:
> >
> > 0. Enabling CONFIG_STATIC_HUGE_ZERO_FOLIO
> > 1. Not doing anything that needs a static PMD for a while + get fragmentation.
> > 2. Do something that needs it - oops can't get order-9 page, and waiting 60
> >     seconds between attempts
> > 3. This is silent so you think you have it switched on but are actually getting
> >     bad performance.
> >
> > I appreciate wanting to reuse this code, but we need to find a way to do this
> > really really early, and get rid of this arbitrary timeout. It's very arbitrary
> > and we have no easy way of tracing how this might behave under workload.
> >
> > Also we end up pinning an order-9 page either way, so no harm in getting it
> > first thing?
>
> What we could do, to avoid messing with memblock and having two ways of initializing a huge zero folio early, is just disable the shrinker.

Nice, I like this approach!

>
> The downside is that the page is really static (allocated unconditionally, not only once it has actually been used). I like it:

Well I'm not sure this is a downside :P

The user is explicitly enabling an option that says 'I'm cool to lose an order-9
page for this'.
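To illustrate why taking one reference at init is enough to make the folio static, here is a rough userspace model. It is not kernel code: all model_* names are hypothetical, it is single-threaded, and it assumes the current scheme as I understand it, where the first get sets the refcount to 2 (caller plus shrinker) and the shrinker only frees the folio once the count is back at 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PMD_SIZE (2u << 20)	/* order-9 with 4 KiB pages = 2 MiB */

/*
 * Hypothetical single-threaded model of the huge zero folio refcount:
 *   0  -> nothing allocated
 *   1  -> only the shrinker's reference is left, so it may be freed
 *   >1 -> somebody is still using it
 */
static int model_refcount;
static void *model_folio;

static bool model_get(void)
{
	if (model_refcount > 0) {
		model_refcount++;
		return true;
	}
	model_folio = calloc(1, MODEL_PMD_SIZE);	/* zero-filled stand-in */
	if (!model_folio)
		return false;
	model_refcount = 2;	/* one for the caller, one held for the shrinker */
	return true;
}

static void model_put(void)
{
	/* A user put must never drop the shrinker's reference. */
	assert(model_refcount > 1);
	model_refcount--;
}

/* Shrinker scan: free only if the shrinker's own reference is the last. */
static bool model_shrink(void)
{
	if (model_refcount != 1)
		return false;
	free(model_folio);
	model_folio = NULL;
	model_refcount = 0;
	return true;
}

/* Boot-time init: with the static option, take a reference and never put it. */
static bool model_init(bool static_option)
{
	if (!static_option)
		return true;	/* dynamic mode: allocate lazily on first use */
	if (!model_get()) {
		fprintf(stderr, "Allocating static huge zero folio failed\n");
		return false;
	}
	return true;
}
```

With static_option set, the reference taken in model_init() is never put, so model_shrink() can never observe a count of 1 and the 2 MiB stays allocated for the lifetime of the system, which is the trade-off the user opted into.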

>
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0ce86e14ab5e1..8e2aa18873098 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -153,6 +153,7 @@ config X86
>  	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP	if X86_64
>  	select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
>  	select ARCH_WANTS_THP_SWAP		if X86_64
> +	select ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO if X86_64
>  	select ARCH_HAS_PARANOID_L1D_FLUSH
>  	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>  	select BUILDTIME_TABLE_SORT
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7748489fde1b7..ccfa5c95f14b1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -495,6 +495,17 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
>  struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
>  void mm_put_huge_zero_folio(struct mm_struct *mm);
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> +	if (!IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO))
> +		return NULL;
> +
> +	if (unlikely(!huge_zero_folio))
> +		return NULL;
> +
> +	return huge_zero_folio;
> +}
> +
>  static inline bool thp_migration_supported(void)
>  {
>  	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
> @@ -685,6 +696,11 @@ static inline int change_huge_pud(struct mmu_gather *tlb,
>  {
>  	return 0;
>  }
> +
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> +	return NULL;
> +}
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  static inline int split_folio_to_list_to_order(struct folio *folio,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e443fe8cd6cf2..366a6d2d771e3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -823,6 +823,27 @@ config ARCH_WANT_GENERAL_HUGETLB
>  config ARCH_WANTS_THP_SWAP
>  	def_bool n
> +config ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO
> +	def_bool n
> +
> +config STATIC_HUGE_ZERO_FOLIO
> +	bool "Allocate a PMD sized folio for zeroing"
> +	depends on ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO && TRANSPARENT_HUGEPAGE
> +	help
> +	  Without this config enabled, the huge zero folio is allocated on
> +	  demand and freed under memory pressure once no longer in use.
> +	  To detect remaining users reliably, references to the huge zero folio
> +	  must be tracked precisely, so it is commonly only available for mapping
> +	  it into user page tables.
> +
> +	  With this config enabled, the huge zero folio can also be used
> +	  for other purposes that do not implement precise reference counting:
> +	  it is allocated statically and never freed, allowing for more
> +	  widespread use, for example, when performing I/O similar to the
> +	  traditional shared zeropage.
> +
> +	  Not suitable for memory constrained systems.
> +
>  config MM_ID
>  	def_bool n
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ff06dee213eb2..f65ba3e6f0824 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -866,9 +866,14 @@ static int __init thp_shrinker_init(void)
>  	huge_zero_folio_shrinker->scan_objects = shrink_huge_zero_folio_scan;
>  	shrinker_register(huge_zero_folio_shrinker);
> -	deferred_split_shrinker->count_objects = deferred_split_count;
> -	deferred_split_shrinker->scan_objects = deferred_split_scan;
> -	shrinker_register(deferred_split_shrinker);
> +	if (IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO)) {
> +		if (!get_huge_zero_folio())
> +			pr_warn("Allocating static huge zero folio failed\n");
> +	} else {
> +		deferred_split_shrinker->count_objects = deferred_split_count;
> +		deferred_split_shrinker->scan_objects = deferred_split_scan;
> +		shrinker_register(deferred_split_shrinker);
> +	}
>  	return 0;
>  }
> --
> 2.50.1
>
>
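On the get_static_huge_zero_folio() side, a caller would presumably treat NULL (option off, or the boot-time allocation failed) as "fall back to the regular shared zero page". A rough userspace sketch of that pattern, with hypothetical model_* names and plain static buffers standing in for the zero page and the huge zero folio:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_PMD_SIZE (512u * MODEL_PAGE_SIZE)	/* order-9 with 4 KiB pages */

/* Stand-ins (hypothetical) for the shared zero page and the huge zero folio. */
static const unsigned char model_zero_page[MODEL_PAGE_SIZE];
static const unsigned char model_huge_zero_storage[MODEL_PMD_SIZE];

/* NULL until the boot-time allocation "succeeds", like huge_zero_folio. */
static const unsigned char *model_huge_zero;

static void model_static_init(void)
{
	model_huge_zero = model_huge_zero_storage;
}

/* Same shape as get_static_huge_zero_folio(): NULL means "not available". */
static const unsigned char *model_get_static_huge_zero(bool config_enabled)
{
	if (!config_enabled)
		return NULL;
	return model_huge_zero;	/* still NULL if init never ran or failed */
}

/*
 * Hypothetical zero-fill helper: choose a zero buffer for up to len bytes
 * and return how many bytes one segment can cover with it.
 */
static size_t model_pick_zero_buf(bool config_enabled, size_t len,
				  const unsigned char **buf)
{
	const unsigned char *huge = model_get_static_huge_zero(config_enabled);

	assert(buf != NULL);
	if (huge) {
		*buf = huge;
		return len < MODEL_PMD_SIZE ? len : MODEL_PMD_SIZE;
	}
	/* Fallback: the traditional shared zero page, one page at a time. */
	*buf = model_zero_page;
	return len < MODEL_PAGE_SIZE ? len : MODEL_PAGE_SIZE;
}
```

Callers never take or drop a reference here, which is the point of the option: the folio is never freed, so no precise refcounting is needed.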
> Now, one thing I do not like is that we have "ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO" but
> then have a user-selectable option.
>
> Should we just get rid of ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO?

Yeah, though I guess we'd probably need to make it depend on CONFIG_MMU if so?
We probably don't want to provide it if it might somehow break things?

I guess we could keep it as long as CONFIG_STATIC_HUGE_ZERO_FOLIO depends on
something sensible like CONFIG_MMU, and maybe 64-bit too?

Anyway this approach looks generally good!

>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo



