On 31.07.25 16:38, Lorenzo Stoakes wrote:
Nits on subject:
- It's >75 chars
No big deal. If we cna come up with something shorter, good.
- advise is the verb, advice is the noun.
Yeah.
On Thu, Jul 31, 2025 at 01:27:20PM +0100, Usama Arif wrote:
From: David Hildenbrand <david@xxxxxxxxxx>
Let's allow for making MADV_COLLAPSE succeed on areas that neither have
VM_HUGEPAGE nor VM_NOHUGEPAGE when we have THP disabled
unless explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED).
Hmm, I'm not sure about this.
So far this prctl() has been the only way to override MADV_COLLAPSE
behaviour, but now we're allowing for this one case to not.
This is not an override really. prctl() disallowed MADV_COLLAPSE, but in
the new mode we don't want that anymore.
> > I suppose the precedent is that MADV_COLLAPSE overrides 'madvise' sysfs
behaviour.
> > I suppose what saves us here is 'advised' can be read to mean either
MADV_HUGEPAGE or MADV_COLLAPSE.
> > And yes, MADV_COLLAPSE is clearly the user requesting this behaviour.
Exactly.
I think the vagueness here is one that already existed, because one could
perfectly one have expected MADV_COLLAPSE to obey sysfs and require
MADV_HUGEPAGE to have been applied, but of course this is not the case.
Yes.
OK so fine.
BUT.
I think the MADV_COLLAPSE man page will need to be updated to mention this.
Yes.
And I REALLY think we should update the THP doc too to mention all these
prctl() modes.
I'm not sure we cover that right now _at all_ and obviously we should
describe the new flags.
Usama - can you add a patch to this series to do that?
Good point, let's document the interaction with prctl().
MADV_COLLAPSE is a clear advise that we want to collapse.
advise -> advice.
Note that we still respect the VM_NOHUGEPAGE flag, just like
MADV_COLLAPSE always does. So consequently, MADV_COLLAPSE is now only
refused on VM_NOHUGEPAGE with PR_THP_DISABLE_EXCEPT_ADVISED.
You also need to mention the shmem change you've made I think.
Yes.
> >>
Co-developed-by: Usama Arif <usamaarif642@xxxxxxxxx>
Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
include/linux/huge_mm.h | 8 +++++++-
include/uapi/linux/prctl.h | 2 +-
mm/huge_memory.c | 5 +++--
mm/memory.c | 6 ++++--
mm/shmem.c | 2 +-
5 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b0ff54eee81c..aeaf93f8ac2e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -329,7 +329,7 @@ struct thpsize {
* through madvise or prctl.
*/
static inline bool vma_thp_disabled(struct vm_area_struct *vma,
- vm_flags_t vm_flags)
+ vm_flags_t vm_flags, bool forced_collapse)
{
/* Are THPs disabled for this VMA? */
if (vm_flags & VM_NOHUGEPAGE)
@@ -343,6 +343,12 @@ static inline bool vma_thp_disabled(struct vm_area_struct *vma,
*/
if (vm_flags & VM_HUGEPAGE)
return false;
+ /*
+ * Forcing a collapse (e.g., madv_collapse), is a clear advise to
advise -> advice.
+ * use THPs.
+ */
+ if (forced_collapse)
+ return false;
return test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, &vma->vm_mm->flags);
}
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 9c1d6e49b8a9..ee4165738779 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -185,7 +185,7 @@ struct prctl_mm_map {
#define PR_SET_THP_DISABLE 41
/*
* Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
- * VM_HUGEPAGE).
+ * VM_HUGEPAGE / MADV_COLLAPSE).
This is confusing you're mixing VMA flags with MADV ones... maybe just
stick to madvise ones, or add extra context around VM_HUGEPAGE bit?
I don't see anything confusing here, really.
But if it helps you, we can do
(e.g., MADV_HUGEPAGE / VM_HUGEPAGE, MADV_COLLAPSE).
(reason VM_HUGEPAGE is spelled out is that there might be code where we
set VM_HUGEPAGE implicitly in the kernel)
Would need to be fixed up in a prior commit obviously.
*/
# define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)
#define PR_GET_THP_DISABLE 42
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 85252b468f80..ef5ccb0ec5d5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -104,7 +104,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
{
const bool smaps = type == TVA_SMAPS;
const bool in_pf = type == TVA_PAGEFAULT;
- const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
+ const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
+ const bool enforce_sysfs = !forced_collapse;
Can we just get rid of this enforce_sysfs altogether in patch 2/5 and use
forced_collapse?
Let's do that as a separate cleanup on top. I want least churn in that
patch.
(had the same idea while writing that patch, but I have other things to
focus on than cleaning up all this mess)
--
Cheers,
David / dhildenb