Re: [PATCH v7 mm-new 04/10] mm: thp: enable THP allocation exclusively through khugepaged

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 12, 2025 at 9:48 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Fri, Sep 12, 2025 at 02:17:01PM +0800, Yafang Shao wrote:
> > On Thu, Sep 11, 2025 at 11:58 PM Lorenzo Stoakes
> > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Sep 10, 2025 at 10:44:41AM +0800, Yafang Shao wrote:
> > > > Currently, THP allocation cannot be restricted to khugepaged alone while
> > > > being disabled in the page fault path. This limitation exists because
> > > > disabling THP allocation during page faults also prevents the execution of
> > > > khugepaged_enter_vma() in that path.
> > >
> > > This is quite confusing, I see what you mean - you want to be able to disable
> > > page fault THP but not khugepaged THP _at the point of possibly faulting in a
> > > THP aligned VMA_.
> > >
> > > It seems this patch makes khugepaged_enter_vma() unconditional for an anonymous
> > > VMA, rather than depending on the return value specified by
> > > thp_vma_allowable_order().
> >
> > The functions thp_vma_allowable_order(TVA_PAGEFAULT) and
> > thp_vma_allowable_order(TVA_KHUGEPAGED) are functionally equivalent
> > within the page fault handler; they always yield the same result.
> > Consequently, their execution order is irrelevant.
>
> It seems hard to definitely demonstrate that by checking !in_pf vs not in this
> situation :) but it seems broadly true afaict.
>
> So they differ only in that one starts khugepaged, the other tries to
> establish a THP on fault via create_huge_pmd().

right

>
> >
> > The change reorders these two calls and, in doing so, also moves the
> > call to vmf_anon_prepare(vmf). This alters the control flow:
> > - before this change:  The logic checked the return value of
> > vmf_anon_prepare() between the two thp_vma_allowable_order() calls.
> >
> >     thp_vma_allowable_order(TVA_PAGEFAULT);
> >     ret = vmf_anon_prepare(vmf);
> >     if (ret)
> >         return ret;
> >     thp_vma_allowable_order(TVA_KHUGEPAGED);
>
> I mean it's also _only if_ the TVA_PAGEFAULT invocation succeeds that the
> TVA_KHUGEPAGED one happens.
>
> >
> >  - after this change: The logic now executes both
> > thp_vma_allowable_order() calls first and does not check the return
> > value of vmf_anon_prepare().
> >
> >     thp_vma_allowable_order(TVA_KHUGEPAGED);
> >     thp_vma_allowable_order(TVA_PAGEFAULT);
> >     ret = vmf_anon_prepare(vmf); // Return value 'ret' is ignored.
>
> Hm this is confusing, your code does:
>
> +       if (pmd_none(*vmf.pmd)) {
> +               if (vma_is_anonymous(vma))
> +                       khugepaged_enter_vma(vma, vm_flags);
> +               if (thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
> +                       ret = create_huge_pmd(&vmf);
> +                       if (!(ret & VM_FAULT_FALLBACK))
> +                               return ret;
> +               }
>
> So the ret is absolutely not ignored, but whether it succeeds or not, we still
> invoke khugepaged_enter_vma().
>
> Previously we would not have one this had vmf_anon_prepare() failed in
> do_huge_pmd_anonymous_page().
>
> Which I guess is what you mean?
>
> >
> > This change is safe because the return value of vmf_anon_prepare() can
> > be safely ignored. This function checks for transient system-level
> > conditions (e.g., memory pressure, THP availability) that might
> > prevent an immediate THP allocation. It does not guarantee that a
> > subsequent allocation will succeed.
> >
> > This behavior is consistent with the policy in hugepage_madvise(),
> > where a VMA is queued for khugepaged before a definitive allocation
> > check. If the system is under pressure, khugepaged will simply retry
> > the allocation at a more opportune time.
>
> OK. I do note though that the khugepaged being kicked off is at mm_struct level.

The unit of operation for khugepaged is the mm_struct itself. It
processes the entire mm even when only a single VMA within it is a
candidate for a THP.

>
> So us trying to invoke khugepaged on the mm again is about.. something having
> changed that would previously have prevented us but now doesn't?
>
> That is, a product of thp_vma_allowable_order() right?
>
> So probably a sysfs change or similar?
>
> But I guess it makes sense to hook in BPF whenever this is the case because this
> _could_ be the point at which khugepaged enters the mm, and we want to select
> the allowable order at this time.
>
> So on basis of the two checks being effectively equivalent (on assumption this
> is always the case) then the change is fairly reasonable.

Yes, that is exactly what I mean.

>
> Though I would put this information, that the checks are equivalent, in the
> commit message so it's really clear.

will add it.

-- 
Regards
Yafang





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux