On Thu, Jul 31, 2025 at 04:54:38PM +0200, David Hildenbrand wrote: > On 31.07.25 16:38, Lorenzo Stoakes wrote: > > Nits on subject: > > > > - It's >75 chars > > No big deal. If we cna come up with something shorter, good. > > > - advise is the verb, advice is the noun. > > Yeah. > > > > > On Thu, Jul 31, 2025 at 01:27:20PM +0100, Usama Arif wrote: > > > From: David Hildenbrand <david@xxxxxxxxxx> > > > > > > Let's allow for making MADV_COLLAPSE succeed on areas that neither have > > > VM_HUGEPAGE nor VM_NOHUGEPAGE when we have THP disabled > > > unless explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED). > > > > Hmm, I'm not sure about this. > > > > So far this prctl() has been the only way to override MADV_COLLAPSE > > behaviour, but now we're allowing for this one case to not. > > This is not an override really. prctl() disallowed MADV_COLLAPSE, but in the > new mode we don't want that anymore. Yeah see below, I sort of convinced myself this is OK. > > > > I suppose the precedent is that MADV_COLLAPSE overrides 'madvise' sysfs > > behaviour. > > > I suppose what saves us here is 'advised' can be read to mean either > > MADV_HUGEPAGE or MADV_COLLAPSE. > > > And yes, MADV_COLLAPSE is clearly the user requesting this behaviour. > > Exactly. And really I guess it's logical right? If you MADV_COLLAPSE you are saying 'I want THPs', if you MADV_HUGEPAGE you are saying 'I want THPs', the difference being on timing. > > > > > I think the vagueness here is one that already existed, because one could > > perfectly one have expected MADV_COLLAPSE to obey sysfs and require > > MADV_HUGEPAGE to have been applied, but of course this is not the case. > > Yes. > > > > > OK so fine. > > > > BUT. > > > > I think the MADV_COLLAPSE man page will need to be updated to mention this. > > > > Yes. > > > And I REALLY think we should update the THP doc too to mention all these > > prctl() modes. > > > > I'm not sure we cover that right now _at all_ and obviously we should > > describe the new flags. > > > > Usama - can you add a patch to this series to do that? > > Good point, let's document the interaction with prctl(). Thanks, I think we haven't even spoken that much about MADV_COLLAPSE in the docs, somebody had a patch but then it fizzled out. But that's a separate task, will add to my TODOs in case nobody else picks up. > > > > > > > > > MADV_COLLAPSE is a clear advise that we want to collapse. > > > > advise -> advice. > > > > > > > > Note that we still respect the VM_NOHUGEPAGE flag, just like > > > MADV_COLLAPSE always does. So consequently, MADV_COLLAPSE is now only > > > refused on VM_NOHUGEPAGE with PR_THP_DISABLE_EXCEPT_ADVISED. > > > > You also need to mention the shmem change you've made I think. > > Yes. > > > >> > > > Co-developed-by: Usama Arif <usamaarif642@xxxxxxxxx> > > > Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx> > > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> > > > --- > > > include/linux/huge_mm.h | 8 +++++++- > > > include/uapi/linux/prctl.h | 2 +- > > > mm/huge_memory.c | 5 +++-- > > > mm/memory.c | 6 ++++-- > > > mm/shmem.c | 2 +- > > > 5 files changed, 16 insertions(+), 7 deletions(-) > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > index b0ff54eee81c..aeaf93f8ac2e 100644 > > > --- a/include/linux/huge_mm.h > > > +++ b/include/linux/huge_mm.h > > > @@ -329,7 +329,7 @@ struct thpsize { > > > * through madvise or prctl. > > > */ > > > static inline bool vma_thp_disabled(struct vm_area_struct *vma, > > > - vm_flags_t vm_flags) > > > + vm_flags_t vm_flags, bool forced_collapse) > > > { > > > /* Are THPs disabled for this VMA? */ > > > if (vm_flags & VM_NOHUGEPAGE) > > > @@ -343,6 +343,12 @@ static inline bool vma_thp_disabled(struct vm_area_struct *vma, > > > */ > > > if (vm_flags & VM_HUGEPAGE) > > > return false; > > > + /* > > > + * Forcing a collapse (e.g., madv_collapse), is a clear advise to > > > > advise -> advice. > > > > > + * use THPs. > > > + */ > > > + if (forced_collapse) > > > + return false; > > > return test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, &vma->vm_mm->flags); > > > } > > > > > > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h > > > index 9c1d6e49b8a9..ee4165738779 100644 > > > --- a/include/uapi/linux/prctl.h > > > +++ b/include/uapi/linux/prctl.h > > > @@ -185,7 +185,7 @@ struct prctl_mm_map { > > > #define PR_SET_THP_DISABLE 41 > > > /* > > > * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE / > > > - * VM_HUGEPAGE). > > > + * VM_HUGEPAGE / MADV_COLLAPSE). > > > > This is confusing you're mixing VMA flags with MADV ones... maybe just > > stick to madvise ones, or add extra context around VM_HUGEPAGE bit? > > I don't see anything confusing here, really. > > But if it helps you, we can do > (e.g., MADV_HUGEPAGE / VM_HUGEPAGE, MADV_COLLAPSE). > > (reason VM_HUGEPAGE is spelled out is that there might be code where we set > VM_HUGEPAGE implicitly in the kernel) Yeah to be clear this is a pretty minor point, and I see why we'd want to mention VM_HUGEPAGE explicitly, as it is in fact this that overrides THP being disabled for the process. > > > > > Would need to be fixed up in a prior commit obviously. > > > > > */ > > > # define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1) > > > #define PR_GET_THP_DISABLE 42 > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > > index 85252b468f80..ef5ccb0ec5d5 100644 > > > --- a/mm/huge_memory.c > > > +++ b/mm/huge_memory.c > > > @@ -104,7 +104,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, > > > { > > > const bool smaps = type == TVA_SMAPS; > > > const bool in_pf = type == TVA_PAGEFAULT; > > > - const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE; > > > + const bool forced_collapse = type == TVA_FORCED_COLLAPSE; > > > + const bool enforce_sysfs = !forced_collapse; > > > > Can we just get rid of this enforce_sysfs altogether in patch 2/5 and use > > forced_collapse? > > Let's do that as a separate cleanup on top. I want least churn in that > patch. > > (had the same idea while writing that patch, but I have other things to > focus on than cleaning up all this mess) Ack, I'm fine with that. > > -- > Cheers, > > David / dhildenb > Cheers, Lorenzo