Re: [PATCH v2 3/5] mm/huge_memory: treat MADV_COLLAPSE as an advise with PR_THP_DISABLE_EXCEPT_ADVISED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 31.07.25 16:38, Lorenzo Stoakes wrote:
Nits on subject:

- It's >75 chars

No big deal. If we cna come up with something shorter, good.

- advise is the verb, advice is the noun.

Yeah.


On Thu, Jul 31, 2025 at 01:27:20PM +0100, Usama Arif wrote:
From: David Hildenbrand <david@xxxxxxxxxx>

Let's allow for making MADV_COLLAPSE succeed on areas that neither have
VM_HUGEPAGE nor VM_NOHUGEPAGE when we have THP disabled
unless explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED).

Hmm, I'm not sure about this.

So far this prctl() has been the only way to override MADV_COLLAPSE
behaviour, but now we're allowing for this one case to not.

This is not an override really. prctl() disallowed MADV_COLLAPSE, but in the new mode we don't want that anymore.

> > I suppose the precedent is that MADV_COLLAPSE overrides 'madvise' sysfs
behaviour.
> > I suppose what saves us here is 'advised' can be read to mean either
MADV_HUGEPAGE or MADV_COLLAPSE.
> > And yes, MADV_COLLAPSE is clearly the user requesting this behaviour.

Exactly.


I think the vagueness here is one that already existed, because one could
perfectly one have expected MADV_COLLAPSE to obey sysfs and require
MADV_HUGEPAGE to have been applied, but of course this is not the case.

Yes.


OK so fine.

BUT.

I think the MADV_COLLAPSE man page will need to be updated to mention this.


Yes.

And I REALLY think we should update the THP doc too to mention all these
prctl() modes.

I'm not sure we cover that right now _at all_ and obviously we should
describe the new flags.

Usama - can you add a patch to this series to do that?

Good point, let's document the interaction with prctl().



MADV_COLLAPSE is a clear advise that we want to collapse.

advise -> advice.


Note that we still respect the VM_NOHUGEPAGE flag, just like
MADV_COLLAPSE always does. So consequently, MADV_COLLAPSE is now only
refused on VM_NOHUGEPAGE with PR_THP_DISABLE_EXCEPT_ADVISED.

You also need to mention the shmem change you've made I think.

Yes.

> >>
Co-developed-by: Usama Arif <usamaarif642@xxxxxxxxx>
Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
  include/linux/huge_mm.h    | 8 +++++++-
  include/uapi/linux/prctl.h | 2 +-
  mm/huge_memory.c           | 5 +++--
  mm/memory.c                | 6 ++++--
  mm/shmem.c                 | 2 +-
  5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b0ff54eee81c..aeaf93f8ac2e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -329,7 +329,7 @@ struct thpsize {
   * through madvise or prctl.
   */
  static inline bool vma_thp_disabled(struct vm_area_struct *vma,
-		vm_flags_t vm_flags)
+		vm_flags_t vm_flags, bool forced_collapse)
  {
  	/* Are THPs disabled for this VMA? */
  	if (vm_flags & VM_NOHUGEPAGE)
@@ -343,6 +343,12 @@ static inline bool vma_thp_disabled(struct vm_area_struct *vma,
  	 */
  	if (vm_flags & VM_HUGEPAGE)
  		return false;
+	/*
+	 * Forcing a collapse (e.g., madv_collapse), is a clear advise to

advise -> advice.

+	 * use THPs.
+	 */
+	if (forced_collapse)
+		return false;
  	return test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, &vma->vm_mm->flags);
  }

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 9c1d6e49b8a9..ee4165738779 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -185,7 +185,7 @@ struct prctl_mm_map {
  #define PR_SET_THP_DISABLE	41
  /*
   * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
- * VM_HUGEPAGE).
+ * VM_HUGEPAGE / MADV_COLLAPSE).

This is confusing you're mixing VMA flags with MADV ones... maybe just
stick to madvise ones, or add extra context around VM_HUGEPAGE bit?

I don't see anything confusing here, really.

But if it helps you, we can do
	(e.g., MADV_HUGEPAGE / VM_HUGEPAGE, MADV_COLLAPSE).

(reason VM_HUGEPAGE is spelled out is that there might be code where we set VM_HUGEPAGE implicitly in the kernel)


Would need to be fixed up in a prior commit obviously.

   */
  # define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
  #define PR_GET_THP_DISABLE	42
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 85252b468f80..ef5ccb0ec5d5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -104,7 +104,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
  {
  	const bool smaps = type == TVA_SMAPS;
  	const bool in_pf = type == TVA_PAGEFAULT;
-	const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
+	const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
+	const bool enforce_sysfs = !forced_collapse;

Can we just get rid of this enforce_sysfs altogether in patch 2/5 and use
forced_collapse?

Let's do that as a separate cleanup on top. I want least churn in that patch.

(had the same idea while writing that patch, but I have other things to focus on than cleaning up all this mess)

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux