Re: [PATCH v10 05/13] khugepaged: generalize __collapse_huge_page_* for mTHP support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20.08.25 16:22, Lorenzo Stoakes wrote:
On Tue, Aug 19, 2025 at 07:41:57AM -0600, Nico Pache wrote:
generalize the order of the __collapse_huge_page_* functions
to support future mTHP collapse.

mTHP collapse can suffer from incosistant behavior, and memory waste
"creep". disable swapin and shared support for mTHP collapse.

I'd just note that mTHP collapse will initially not honor these two parameters (spell them out), failing if anything is swapped out or shared.


No functional changes in this patch.

Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Acked-by: David Hildenbrand <david@xxxxxxxxxx>
Co-developed-by: Dev Jain <dev.jain@xxxxxxx>
Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
---
  mm/khugepaged.c | 62 ++++++++++++++++++++++++++++++++++---------------
  1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 77e0d8ee59a0..074101d03c9d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -551,15 +551,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
  					unsigned long address,
  					pte_t *pte,
  					struct collapse_control *cc,
-					struct list_head *compound_pagelist)
+					struct list_head *compound_pagelist,
+					unsigned int order)

I think it's better if we keep the output var as the last in the order. It's a
bit weird to have the order specified here.

Also, while at it, just double-tab indent.


  {
  	struct page *page = NULL;
  	struct folio *folio = NULL;
  	pte_t *_pte;
  	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
  	bool writable = false;
+	int scaled_max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);

This is a weird formulation, I guess we have to go with it to keep things
consistent-ish, but it's like we have a value for this that is reliant on the
order always being PMD and then sort of awkwardly adjusting for MTHP.

I guess we're stuck with it though since we have:

/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

I guess a more sane version of this would be a ratio or something...

Yeah, ratios would have made much more sense ... but people didn't plan for having something that is not a PMD size.


Anyway probably out of scope here.


-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + (1 << order);

Hmm is this correct? I think shifting an int is probably a bad idea even if we
can get away with it for even PUD order atm (though... 64KB ARM hm), wouldn't
_BITUL(order) be better?

Just for completeness: we discussed that recently in other context. It would better be BIT() instead of _BITUL().

But I am not a fan of using BIT() when not working with bitmaps etc.

"1ul << order" etc is used throughout the code base.

What makes this easier to read is:

	const unsigned long nr_pages = 1ul << order;

Maybe in the future we want an ORDER_PAGES(), maybe not. Have not made up my mind yet :)

--
Cheers

David / dhildenb





[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux