On Mon, Apr 28, 2025 at 8:12 PM Nico Pache <npache@xxxxxxxxxx> wrote:
> Introduce the ability for khugepaged to collapse to different mTHP sizes.
> While scanning PMD ranges for potential collapse candidates, keep track
> of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
> represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If
> mTHPs are enabled we remove the restriction of max_ptes_none during the
> scan phase so we dont bailout early and miss potential mTHP candidates.
>
> After the scan is complete we will perform binary recursion on the
> bitmap to determine which mTHP size would be most efficient to collapse
> to. max_ptes_none will be scaled by the attempted collapse order to
> determine how full a THP must be to be eligible.
>
> If a mTHP collapse is attempted, but contains swapped out, or shared
> pages, we dont perform the collapse.
[...]
> @@ -1208,11 +1211,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>         vma_start_write(vma);
>         anon_vma_lock_write(vma->anon_vma);
>
> -       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
> -                               address + HPAGE_PMD_SIZE);
> +       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address,
> +                               _address + (PAGE_SIZE << order));
>         mmu_notifier_invalidate_range_start(&range);
>
>         pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
> +
>         /*
>          * This removes any huge TLB entry from the CPU so we won't allow
>          * huge and small TLB entries for the same virtual address to

It's not visible in this diff, but we're about to do a
pmdp_collapse_flush() here. pmdp_collapse_flush() tears down the
entire page table, meaning it tears down 2MiB of address space; and it
assumes that the entire page table exclusively corresponds to the
current VMA.

I think you'll need to ensure that the pmdp_collapse_flush() only
happens for full-size THP, and that mTHP only tears down individual
PTEs in the relevant range.
(That code might get a bit messy, since the existing THP code tears down PTEs in a detached page table, while mTHP would have to do it in a still-attached page table.)
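
To make the split I'm suggesting concrete, here is a rough userspace toy
model of it. This is not kernel code and not a proposed patch: every
name in it (model_pmd, model_detach_table(), model_clear_pte_range(),
model_collapse_teardown()) is made up for illustration, and the
constants assume x86-64 with 4KiB pages, where one page table holds 512
PTEs. The only point is the branch on the collapse order: the
whole-table teardown is reserved for the PMD-order case, and smaller
orders only touch the PTEs in their own range while the table stays
attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#define MODEL_PTRS_PER_PTE 512u /* assumes x86-64 with 4KiB pages */
#define MODEL_PMD_ORDER    9u   /* 2^9 PTEs cover one PMD-sized THP */

struct model_pmd {
	bool table_attached;              /* false once the table is detached */
	uint64_t pte[MODEL_PTRS_PER_PTE]; /* 0 stands for a "none" PTE */
};

/*
 * Stand-in for the effect of pmdp_collapse_flush(): the whole page table
 * is unhooked at once, so all 512 PTEs stop being reachable together.
 */
static void model_detach_table(struct model_pmd *pmd)
{
	pmd->table_attached = false;
}

/*
 * Stand-in for an mTHP teardown: clear only the 2^order PTEs of the
 * range being collapsed, in a page table that stays attached.
 */
static void model_clear_pte_range(struct model_pmd *pmd, unsigned int first,
				  unsigned int order)
{
	unsigned int nr = 1u << order;

	/* The range must be naturally aligned and fit inside the table. */
	assert(first % nr == 0 && first + nr <= MODEL_PTRS_PER_PTE);
	for (unsigned int i = first; i < first + nr; i++)
		pmd->pte[i] = 0;
}

/* Pick the teardown strategy from the collapse order. */
static void model_collapse_teardown(struct model_pmd *pmd, unsigned int first,
				    unsigned int order)
{
	assert(order <= MODEL_PMD_ORDER);
	if (order == MODEL_PMD_ORDER) {
		assert(first == 0);
		model_detach_table(pmd);
	} else {
		model_clear_pte_range(pmd, first, order);
	}
}
```

In this model, an order-4 collapse starting at PTE index 16 clears
entries 16 through 31 and leaves every other entry, and the table
itself, in place. The real code would additionally need the per-PTE TLB
flushing and locking that the detached-table path gets to skip, which
the model does not attempt to show.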