On Wed, Sep 10, 2025 at 10:44:41AM +0800, Yafang Shao wrote:
> Currently, THP allocation cannot be restricted to khugepaged alone while
> being disabled in the page fault path. This limitation exists because
> disabling THP allocation during page faults also prevents the execution of
> khugepaged_enter_vma() in that path.

This is quite confusing. I see what you mean - you want to be able to
disable page fault THP but not khugepaged THP _at the point of possibly
faulting in a THP-aligned VMA_.

It seems this patch makes khugepaged_enter_vma() unconditional for an
anonymous VMA, rather than depending on the return value of
thp_vma_allowable_order().

So I think a clearer explanation is:

	khugepaged_enter_vma() ultimately invokes any attached BPF function
	with the TVA_KHUGEPAGED flag set when determining whether or not to
	enable khugepaged THP for a freshly faulted-in VMA.

	Currently, on fault, we invoke this in do_huge_pmd_anonymous_page(),
	as invoked by create_huge_pmd(), and only when we have already
	checked whether an allowable TVA_PAGEFAULT order is specified.

	Since we might want to disallow THP on fault-in but allow it via
	khugepaged, we move things around so we always attempt to enter
	khugepaged upon fault.

Having said all this, I'm very confused. Why are we doing this? We only
enable khugepaged _early_ when we know we're faulting in a huge PMD here.

I guess we do this because, if we are allowed to do the page fault, maybe
something has changed that previously disallowed khugepaged from running
for the mm.

But now we're just checking unconditionally for... no reason? If BPF
disables page fault THP but not khugepaged THP, then surely the mm would
already be under khugepaged if it could be?

It's sort of immaterial if we get a pmd_none() that is not faultable for
whatever reason but that BPF might say is khugepaged-able, because it'd
already have been set.

This is because, if we just map a new VMA, we already let khugepaged have
it via khugepaged_enter_vma() in __mmap_new_vma() and in the merge paths.

I mean, maybe I'm missing something here :)

>
> With the introduction of BPF, we can now implement THP policies based on
> different TVA types. This patch adjusts the logic to support this new
> capability.
>
> While we could also extend prtcl() to utilize this new policy, such a

Typo: prtcl -> prctl

> change would require a uAPI modification.

Hm, in what respect? PR_SET_THP_DISABLE?
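Going back to the ordering point above, to make it concrete - the
pre-patch flow is roughly this (a simplified sketch of the call chain,
not the literal kernel code):

	/* __handle_mm_fault(), before this patch: */
	if (pmd_none(*vmf.pmd) &&
	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
		/*
		 * create_huge_pmd() -> do_huge_pmd_anonymous_page() is the
		 * only place on this path that calls khugepaged_enter_vma(),
		 * which in turn consults any attached BPF policy with
		 * TVA_KHUGEPAGED. So a policy denying TVA_PAGEFAULT means we
		 * never get here, and the TVA_KHUGEPAGED check never runs on
		 * fault.
		 */
		ret = create_huge_pmd(&vmf);
		if (!(ret & VM_FAULT_FALLBACK))
			return ret;
	}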
>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> ---
>  mm/huge_memory.c |  1 -
>  mm/memory.c      | 13 ++++++++-----
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 523153d21a41..1e9e7b32e2cf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1346,7 +1346,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	ret = vmf_anon_prepare(vmf);
>  	if (ret)
>  		return ret;
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
>  	    !mm_forbids_zeropage(vma->vm_mm) &&
> diff --git a/mm/memory.c b/mm/memory.c
> index d8819cac7930..d0609dc1e371 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6289,11 +6289,14 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  	if (pud_trans_unstable(vmf.pud))
>  		goto retry_pud;
>  
> -	if (pmd_none(*vmf.pmd) &&
> -	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
> -		ret = create_huge_pmd(&vmf);
> -		if (!(ret & VM_FAULT_FALLBACK))
> -			return ret;
> +	if (pmd_none(*vmf.pmd)) {
> +		if (vma_is_anonymous(vma))
> +			khugepaged_enter_vma(vma, vm_flags);
> +		if (thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
> +			ret = create_huge_pmd(&vmf);
> +			if (!(ret & VM_FAULT_FALLBACK))
> +				return ret;
> +		}
>  	} else {
>  		vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
>  
> --
> 2.47.3
>
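And for reference, the mmap path I'm referring to is roughly this
(simplified sketch from memory, not the exact code):

	/* mm/vma.c - a freshly mapped VMA is already handed to khugepaged: */
	static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
	{
		...
		khugepaged_enter_vma(vma, map->flags);
		...
	}

	/* mm/khugepaged.c - which is a no-op if the mm is already registered: */
	void khugepaged_enter_vma(struct vm_area_struct *vma, unsigned long vm_flags)
	{
		if (test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
			return;
		/* ... TVA_KHUGEPAGED policy check, then __khugepaged_enter(mm) ... */
	}

i.e. by the time we fault, the mm should already be under khugepaged if
the policy allowed it at mmap time.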