On 15/05/25 8:52 am, Nico Pache wrote:
Introduce the ability for khugepaged to collapse to different mTHP sizes. While scanning PMD ranges for potential collapse candidates, keep track of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If mTHPs are enabled we remove the restriction of max_ptes_none during the scan phase so we dont bailout early and miss potential mTHP candidates. After the scan is complete we will perform binary recursion on the bitmap to determine which mTHP size would be most efficient to collapse to. max_ptes_none will be scaled by the attempted collapse order to determine how full a THP must be to be eligible. If a mTHP collapse is attempted, but contains swapped out, or shared pages, we dont perform the collapse. For non PMD collapse we much leave the anon VMA write locked until after we collapse the mTHP
Why? I know that Hugh pointed out locking errors; I am yet to catch up on that thread, but you need to explain in the description why you do what you do. [--snip---]
- - spin_lock(pmd_ptl); - BUG_ON(!pmd_none(*pmd)); - folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE); - folio_add_lru_vma(folio, vma); - pgtable_trans_huge_deposit(mm, pmd, pgtable); - set_pmd_at(mm, address, pmd, _pmd); - update_mmu_cache_pmd(vma, address, pmd); - deferred_split_folio(folio, false); - spin_unlock(pmd_ptl); + if (order == HPAGE_PMD_ORDER) { + pgtable = pmd_pgtable(_pmd); + _pmd = folio_mk_pmd(folio, vma->vm_page_prot); + _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); + + spin_lock(pmd_ptl); + BUG_ON(!pmd_none(*pmd)); + folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE); + folio_add_lru_vma(folio, vma); + pgtable_trans_huge_deposit(mm, pmd, pgtable); + set_pmd_at(mm, address, pmd, _pmd); + update_mmu_cache_pmd(vma, address, pmd); + deferred_split_folio(folio, false); + spin_unlock(pmd_ptl); + } else { /* mTHP collapse */ + mthp_pte = mk_pte(&folio->page, vma->vm_page_prot); + mthp_pte = maybe_mkwrite(pte_mkdirty(mthp_pte), vma); + + spin_lock(pmd_ptl);
Nico, I've noticed a few occasions where my review comments have not been acknowledged - for example, [1]. It makes it difficult to follow up and contributes to some frustration on my end. I'd appreciate if you could make sure to respond to feedback, even if you are disagreeing with my comments. Thanks! [1] https://lore.kernel.org/all/08d13445-5ed1-42ea-8aee-c1dbde24407e@xxxxxxx/ [---snip---]