On Wed, Sep 10, 2025 at 10:44:41AM +0800, Yafang Shao wrote:
> Currently, THP allocation cannot be restricted to khugepaged alone while
> being disabled in the page fault path. This limitation exists because
> disabling THP allocation during page faults also prevents the execution of
> khugepaged_enter_vma() in that path.

This is quite confusing. I see what you mean - you want to be able to
disable page fault THP but not khugepaged THP _at the point of possibly
faulting in a THP-aligned VMA_.

It seems this patch makes khugepaged_enter_vma() unconditional for an
anonymous VMA, rather than depending on the return value of
thp_vma_allowable_order().

So I think a clearer explanation is:

	khugepaged_enter_vma() ultimately invokes any attached BPF function
	with the TVA_KHUGEPAGED flag set when determining whether or not to
	enable khugepaged THP for a freshly faulted-in VMA.

	Currently, on fault, we invoke this in do_huge_pmd_anonymous_page(),
	as invoked by create_huge_pmd(), and only when we have already
	checked whether an allowable TVA_PAGEFAULT order is specified.

	Since we might want to disallow THP on fault-in but allow it via
	khugepaged, we move things around so we always attempt to enter
	khugepaged upon fault.

Having said all this, I'm very confused. Why are we doing this? We only
enable khugepaged _early_ when we know we're faulting in a huge PMD here.

I guess we do this because, if we are allowed to do the page fault, maybe
something has changed that previously disallowed khugepaged from running
for the mm.

But now we're just checking unconditionally for... no reason? If BPF
disables page fault THP but not khugepaged THP, then surely the mm would
already be under khugepaged if it could be?

It's sort of immaterial if we get a pmd_none() that is not faultable for
whatever reason but that BPF might say is khugepaged-able, because it'd
already have been set.

This is because, if we just map a new VMA, we already let khugepaged have
it via khugepaged_enter_vma() in __mmap_new_vma() and in the merge paths.

I mean, maybe I'm missing something here :)

>
> With the introduction of BPF, we can now implement THP policies based on
> different TVA types. This patch adjusts the logic to support this new
> capability.
>
> While we could also extend prtcl() to utilize this new policy, such a

Typo: prtcl -> prctl

> change would require a uAPI modification.

Hm, in what respect? PR_SET_THP_DISABLE?
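Going back to the ordering point above, to make it concrete - the
pre-patch flow is roughly this (a simplified sketch of the call chain,
not the literal kernel code):

	/* __handle_mm_fault(), before this patch: */
	if (pmd_none(*vmf.pmd) &&
	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
		/*
		 * create_huge_pmd() -> do_huge_pmd_anonymous_page() is the
		 * only place on this path that calls khugepaged_enter_vma(),
		 * which in turn consults any attached BPF policy with
		 * TVA_KHUGEPAGED. So a policy denying TVA_PAGEFAULT means we
		 * never get here, and the TVA_KHUGEPAGED check never runs on
		 * fault.
		 */
		ret = create_huge_pmd(&vmf);
		if (!(ret & VM_FAULT_FALLBACK))
			return ret;
	}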
>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> ---
>  mm/huge_memory.c |  1 -
>  mm/memory.c      | 13 ++++++++-----
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 523153d21a41..1e9e7b32e2cf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1346,7 +1346,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	ret = vmf_anon_prepare(vmf);
>  	if (ret)
>  		return ret;
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
>  	    !mm_forbids_zeropage(vma->vm_mm) &&
> diff --git a/mm/memory.c b/mm/memory.c
> index d8819cac7930..d0609dc1e371 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6289,11 +6289,14 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  	if (pud_trans_unstable(vmf.pud))
>  		goto retry_pud;
>  
> -	if (pmd_none(*vmf.pmd) &&
> -	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
> -		ret = create_huge_pmd(&vmf);
> -		if (!(ret & VM_FAULT_FALLBACK))
> -			return ret;
> +	if (pmd_none(*vmf.pmd)) {
> +		if (vma_is_anonymous(vma))
> +			khugepaged_enter_vma(vma, vm_flags);
> +		if (thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
> +			ret = create_huge_pmd(&vmf);
> +			if (!(ret & VM_FAULT_FALLBACK))
> +				return ret;
> +		}
>  	} else {
>  		vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
>  
> --
> 2.47.3
>
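And for reference, the mmap path I'm referring to is roughly this
(simplified sketch from memory, not the exact code):

	/* mm/vma.c - a freshly mapped VMA is already handed to khugepaged: */
	static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
	{
		...
		khugepaged_enter_vma(vma, map->flags);
		...
	}

	/* mm/khugepaged.c - which is a no-op if the mm is already registered: */
	void khugepaged_enter_vma(struct vm_area_struct *vma, unsigned long vm_flags)
	{
		if (test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
			return;
		/* ... TVA_KHUGEPAGED policy check, then __khugepaged_enter(mm) ... */
	}

i.e. by the time we fault, the mm should already be under khugepaged if
the policy allowed it at mmap time.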