One thing I wanted to emphasise - shouldn't we invoke khugepaged_enter_vma() for VMAs we set VM_HUGEPAGE for? I note in __mmap_new_vma() we do so for non-anon VMAs _prior_ to invoking vma_set_thp_policy(). I _really_ hate putting in conditional logic like this for specific cases. Feels very hacky. On Thu, May 15, 2025 at 05:47:34PM +0100, Usama Arif wrote: > > > On 15/05/2025 17:06, Lorenzo Stoakes wrote: > > >> Its doing the same as KSM as suggested by David. Does KSM break these tests? > >> Is there some specific test you can point to that I can run that is breaking > >> with this patch and not without it? > > > > They don't build, at all. I explain how you can attempt a build below. > > > > Ah yes, I initially thought by break you meant they were failing. I saw that, > will fix it. > > > And no, KSM doesn't break the tests, because steps were taken to make them > > not break the tests :) I mean it's really easy - it's just adding some > > trivial stubs. > > > > Yes, will do Thanks! Thanks. > > > If you need help with it just ping me in whatever way helps and I can help! > > > > It's understandable as it's not necessarily clear this is a thing (maybe we > > need self tests to build it, but that might break CI setups so unclear). > > > > The merging is much more important! To re-emphasise ^ the merging is key - I will unfortunately have to NACK any series that breaks it. So this is an absolute requirement here. > > > >> > >> > >>> I really feel this series needs to be an RFC until we can get some > >>> consensus on how to approach this. > >> > >> There was consensus in https://lore.kernel.org/all/97702ff0-fc50-4779-bfa8-83dc42352db1@xxxxxxxxxx/ > > > > I disagree with this asssessment, that doesn't look like consensus at all, > > I think at least this is a very contentious or at least _complicated_ topic > > that we need to really dig into. > > > > So in my view - it's this kind of situation that warrants an RFC until > > there's some stabilisation and agreement on a way forward. > > > > Sure will change next revision to RFC, unless hopefully maybe we can > get a consensus in this revision :) Thanks! > > >> > >>> > >>> On Thu, May 15, 2025 at 02:33:30PM +0100, Usama Arif wrote: > >>>> This is set via the new PR_SET_THP_POLICY prctl. > >>> > >>> What is? > >>> > >>> You're making very major changes here, including adding a new flag to > >>> mm_struct (!!) and the explanation/justification for this is missing. > >>> > >> > >> I have added the justification in your reply to the coverletter. > > > > As stated there, you've not explained why alternatives are unworkable, I > > think we need this! > > > > Sort of: > > > > 1. Why not cgroups? blah blah blah > > 2. Why not process_madvise()? blah blah blah > > 3. Why not bpf? blah blah blah > > 4. Why not <something I've not thought of>? blah blah blah > > > > I will add this in the next cover letter. Thanks > > > >> > >>>> This will set the MMF2_THP_VMA_DEFAULT_HUGE process flag > >>>> which changes the default of new VMAs to be VM_HUGEPAGE. The > >>>> call also modifies all existing VMAs that are not VM_NOHUGEPAGE > >>>> to be VM_HUGEPAGE. The policy is inherited during fork+exec. > >>> > >>> So you can only set this flag? > >>> > >> > >> ?? > > > > This patch is only allowing the setting of this flag. I am asking 'so you > > can only set this flag?' > > > > To which it appears the answer is, yes I think :) > > > > An improved cover letter could say something like: > > > > " > > Here we implement the first flag intended to allow the _overriding_ of huge > > page policy to ensure that, when > > /sys/kernel/mm/transparent_hugepage/enabled is set to madvise, we are able > > to maintain fine-grained control of individual processes, including any > > fork/exec'd, by setting this flag. > > > > In subsequent commits, we intend to permit further such control. > > " > > > >> > >>>> > >>>> This allows systems where the global policy is set to "madvise" > >>>> to effectively have THPs always for the process. In an environment > >>>> where different types of workloads are stacked on the same machine, > >>>> this will allow workloads that benefit from always having hugepages > >>>> to do so, without regressing those that don't. > >>> > >>> Again, this explanation really makes no sense at all to me, I don't really > >>> know what you mean, you're not going into what you're doing in this change, > >>> this is just a very unclear commit message. > >>> > >> > >> I hope this is answered in my reply to your coverletter. > > > > You still need to improve the cover letter here I think, see above for a > > suggestion! > > > > Sure, will do in the next revision, Thanks! Thanks > >> > >>>> > >>>> Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx> > >>>> --- > >>>> include/linux/huge_mm.h | 3 ++ > >>>> include/linux/mm_types.h | 11 +++++++ > >>>> include/uapi/linux/prctl.h | 4 +++ > >>>> kernel/fork.c | 1 + > >>>> kernel/sys.c | 21 ++++++++++++ > >>>> mm/huge_memory.c | 32 +++++++++++++++++++ > >>>> mm/vma.c | 2 ++ > >>>> tools/include/uapi/linux/prctl.h | 4 +++ > >>>> .../trace/beauty/include/uapi/linux/prctl.h | 4 +++ > >>>> 9 files changed, 82 insertions(+) > >>>> > >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > >>>> index 2f190c90192d..e652ad9ddbbd 100644 > >>>> --- a/include/linux/huge_mm.h > >>>> +++ b/include/linux/huge_mm.h > >>>> @@ -260,6 +260,9 @@ static inline unsigned long thp_vma_suitable_orders(struct vm_area_struct *vma, > >>>> return orders; > >>>> } > >>>> > >>>> +void vma_set_thp_policy(struct vm_area_struct *vma); > >>> > >>> This is a VMA-specific function but you're putting it in huge_mm.h? Why > >>> can't > >> this be in vma.h or vma.c? > >>> > >> > >> Sure can move it there. > >> > >>>> +void process_vmas_thp_default_huge(struct mm_struct *mm); > >>> > >>> 'vmas' is redundant here. > >>> > >> > >> Sure. > >>>> + > >>>> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, > >>>> unsigned long vm_flags, > >>>> unsigned long tva_flags, > >>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > >>>> index e76bade9ebb1..2fe93965e761 100644 > >>>> --- a/include/linux/mm_types.h > >>>> +++ b/include/linux/mm_types.h > >>>> @@ -1066,6 +1066,7 @@ struct mm_struct { > >>>> mm_context_t context; > >>>> > >>>> unsigned long flags; /* Must use atomic bitops to access */ > >>>> + unsigned long flags2; > >>> > >>> > >>> Ugh, god really?? > >>> > >>> I really am not a fan of adding flags2 just to add a prctl() feature like > >>> this. This is crazy. > >>> > >>> Also this is a TERRIBLE name. I mean, no please PLEASE no. > >>> > >>> Do we really have absolutely no choice but to add a new flags field here? > >>> > >>> It again doesn't help that you don't mention nor even try to justify this > >>> in the commit message or cover letter. > >>> > >> > >> And again, I hope my reply to your email has given you the justification. > > > > No :) I understood why you did this though of course. > > > >> > >>> If this is a 32-bit kernel vs. 64-bit kernel thing so we 'ran out of bits', > >>> let's just go make this flags field 64-bit on 32-bit kernels. > >>> > >>> I mean - I'm kind of insisting we do that to be honest. Because I really > >>> don't like this. > >> > >> > >> If the maintainers want this, I will make it a 64 bit only feature. We > >> are only using it for 64 bit servers. But it will probably mean ifdef > >> config 64 bit in a lot of places. > > > > I'm going to presume you are including me in this category rather than > > implying that you are deferring only to others :) > > > > Yes ofcourse! I mean all maintainers :) > > And hopefully everyone else as well :) > > > So, there's another option: > > > > Have a prerequisite series that makes mm_struct->flags 64-bit on 32-bit > > kernels, which solves this problem everywhere and avoids us wasting a bunch > > of memory for a very specific usecase, splitting flag state across 2 fields > > (which are no longer atomic as a whole of course), adding confusion, > > possibly subtly breaking anywhere that assumes mm->flags completely > > describes mm-granularity flag state etc. > > > > This is probably a very basic question, but by make mm_struct->flags 64-bit on 32-bit > do you mean convert flags to unsigned long long when !CONIFG_64BIT? Yes. This would be a worthwhile project in its own right. I will be changing vma->vm_flags in the same way soon. > > > The RoI here is not looking good, otherwise. > > > >> > >>> > >>> Also if we _HAVE_ to have this, shouldn't we duplicate that comment about > >>> atomic bitops?... > >>> > >> > >> Sure > >> > >>>> > >>>> #ifdef CONFIG_AIO > >>>> spinlock_t ioctx_lock; > >>>> @@ -1744,6 +1745,11 @@ enum { > >>>> MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\ > >>>> MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK) > >>>> > >>>> +#define MMF2_THP_VMA_DEFAULT_HUGE 0 > >>> > >>> I thought the whole idea was to move away from explicitly refrencing 'THP' > >>> in a future where large folios are implicit and now we're saying 'THP'. > >>> > >>> Anyway the 'VMA' is totally redundant here. > >>> > >> > >> Sure, I can remove VMA. > >> I see THP everywhere in the kernel code. > >> Its mentioned 108 times in transhuge.rst alone :) > >> If you have any suggestion to rename this flag, happy to take it :) > > > > Yeah I mean it's a mess man, and it's not your fault... Again naming is > > hard, I put a suggestion in reply to cover letter anyway... > > > >> > >>>> +#define MMF2_THP_VMA_DEFAULT_HUGE_MASK (1 << MMF2_THP_VMA_DEFAULT_HUGE) > >>> > >>> Do we really need explicit trivial mask declarations like this? > >>> > >> > >> I have followed the convention that has existed in this file, please see below > >> links :) > >> https://elixir.bootlin.com/linux/v6.14.6/source/include/linux/mm_types.h#L1645 > >> https://elixir.bootlin.com/linux/v6.14.6/source/include/linux/mm_types.h#L1623 > >> https://elixir.bootlin.com/linux/v6.14.6/source/include/linux/mm_types.h#L1603 > >> https://elixir.bootlin.com/linux/v6.14.6/source/include/linux/mm_types.h#L1582 > > > > Ack, yuck but ack. > > > >> > >> > >>>> + > >>>> +#define MMF2_INIT_MASK (MMF2_THP_VMA_DEFAULT_HUGE_MASK) > >>> > >>>> + > >>>> static inline unsigned long mmf_init_flags(unsigned long flags) > >>>> { > >>>> if (flags & (1UL << MMF_HAS_MDWE_NO_INHERIT)) > >>>> @@ -1752,4 +1758,9 @@ static inline unsigned long mmf_init_flags(unsigned long flags) > >>>> return flags & MMF_INIT_MASK; > >>>> } > >>>> > >>>> +static inline unsigned long mmf2_init_flags(unsigned long flags) > >>>> +{ > >>>> + return flags & MMF2_INIT_MASK; > >>>> +} > >>>> + > >>>> #endif /* _LINUX_MM_TYPES_H */ > >>>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h > >>>> index 15c18ef4eb11..325c72f40a93 100644 > >>>> --- a/include/uapi/linux/prctl.h > >>>> +++ b/include/uapi/linux/prctl.h > >>>> @@ -364,4 +364,8 @@ struct prctl_mm_map { > >>>> # define PR_TIMER_CREATE_RESTORE_IDS_ON 1 > >>>> # define PR_TIMER_CREATE_RESTORE_IDS_GET 2 > >>>> > >>>> +#define PR_SET_THP_POLICY 78 > >>>> +#define PR_GET_THP_POLICY 79 > >>>> +#define PR_THP_POLICY_DEFAULT_HUGE 0 > >>>> + > >>>> #endif /* _LINUX_PRCTL_H */ > >>>> diff --git a/kernel/fork.c b/kernel/fork.c > >>>> index 9e4616dacd82..6e5f4a8869dc 100644 > >>>> --- a/kernel/fork.c > >>>> +++ b/kernel/fork.c > >>>> @@ -1054,6 +1054,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, > >>>> > >>>> if (current->mm) { > >>>> mm->flags = mmf_init_flags(current->mm->flags); > >>>> + mm->flags2 = mmf2_init_flags(current->mm->flags2); > >>>> mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK; > >>>> } else { > >>>> mm->flags = default_dump_filter; > >>>> diff --git a/kernel/sys.c b/kernel/sys.c > >>>> index c434968e9f5d..1115f258f253 100644 > >>>> --- a/kernel/sys.c > >>>> +++ b/kernel/sys.c > >>>> @@ -2658,6 +2658,27 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > >>>> clear_bit(MMF_DISABLE_THP, &me->mm->flags); > >>>> mmap_write_unlock(me->mm); > >>>> break; > >>>> + case PR_GET_THP_POLICY: > >>>> + if (arg2 || arg3 || arg4 || arg5) > >>>> + return -EINVAL; > >>>> + if (!!test_bit(MMF2_THP_VMA_DEFAULT_HUGE, &me->mm->flags2)) > >>> > >>> I really don't think we need the !!? Do we? > >> > >> I have followed the convention that has existed in this file already, > >> please see: > >> https://elixir.bootlin.com/linux/v6.14.6/source/kernel/sys.c#L2644 > > > > OK, but please don't, I don't see why this is necessary. if (truthy) is > > fine. > > > > Unless somebody has a really good reason why this is necessary, it's just > > ugly ceremony. > > > > Agreed :) Thanks > > >> > >>> > >>> Shouldn't we lock the mm when we do this no? Can't somebody change this? > >>> > >> > >> It wasn't locked in PR_GET_THP_DISABLE > >> https://elixir.bootlin.com/linux/v6.14.6/source/kernel/sys.c#L2644 > >> > >> I can acquire do mmap_write_lock_killable the same as PR_SET_THP_POLICY > >> in the next series. > >> > >> I can also add the lock in PR_GET_THP_DISABLE. > > > > Well, the issue I guess is... if the flags field is atomic, and we know > > over this call maybe we can rely on mm sticking around, then we probalby > > don't need an mmap lock actually. > > > >> > >>>> + error = PR_THP_POLICY_DEFAULT_HUGE; > > > > Wait, error = PR_THP_POLICY_DEFAULT_HUGE? Is this the convention for > > returning here? :) > > I see a few of the PR_GET_.. setting the return value. I hope I didnt > misinterpret that. Yeah I thought it might be the case. I reemphasise my dislike of prctl(). > > > > >>>> + break; > >>>> + case PR_SET_THP_POLICY: > >>>> + if (arg3 || arg4 || arg5) > >>>> + return -EINVAL; > >>>> + if (mmap_write_lock_killable(me->mm)) > >>>> + return -EINTR; > >>>> + switch (arg2) { > >>>> + case PR_THP_POLICY_DEFAULT_HUGE: > >>>> + set_bit(MMF2_THP_VMA_DEFAULT_HUGE, &me->mm->flags2); > >>>> + process_vmas_thp_default_huge(me->mm); > >>>> + break; > >>>> + default: > >>>> + return -EINVAL; > > > > Oh I just noticed - this is really broken - you're not unlocking the mmap() > > here on error... :) you definitely need to fix this. > > > > Ah yes, will do Thanks! Thanks > > >>>> + } > >>>> + mmap_write_unlock(me->mm); > >>>> + break; > >>>> case PR_MPX_ENABLE_MANAGEMENT: > >>>> case PR_MPX_DISABLE_MANAGEMENT: > >>>> /* No longer implemented: */ > >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c > >>>> index 2780a12b25f0..64f66d5295e8 100644 > >>>> --- a/mm/huge_memory.c > >>>> +++ b/mm/huge_memory.c > >>>> @@ -98,6 +98,38 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) > >>>> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); > >>>> } > >>>> > >>>> +void vma_set_thp_policy(struct vm_area_struct *vma) > >>>> +{ > >>>> + struct mm_struct *mm = vma->vm_mm; > >>>> + > >>>> + if (test_bit(MMF2_THP_VMA_DEFAULT_HUGE, &mm->flags2)) > >>>> + vm_flags_set(vma, VM_HUGEPAGE); > >>>> +} > >>>> + > >>>> +static void vmas_thp_default_huge(struct mm_struct *mm) > >>>> +{ > >>>> + struct vm_area_struct *vma; > >>>> + unsigned long vm_flags; > >>>> + > >>>> + VMA_ITERATOR(vmi, mm, 0); > >>> > >>> This is a declaration, it should be grouped with declarations... > >>> > >> > >> Sure, will make the change in next version. > >> > >> Unfortunately checkpatch didn't complain. > > > > Checkpatch actually complains the other way :P it doesn't understand > > macros. > > > > So you'll start getting a warning here, which you can ignore. It sucks, but > > there we go. Making checkpatch.pl understand that would be a pain, probs. > > > >> > >>>> + for_each_vma(vmi, vma) { > >>>> + vm_flags = vma->vm_flags; > >>>> + if (vm_flags & VM_NOHUGEPAGE) > >>>> + continue; > >>> > >>> Literally no point in you putting vm_flags as a separate variable here. > >>> > >> > >> Sure, will make the change in next version. > > > > Thanks! > > > >> > >>> So if you're not overriding VM_NOHUGEPAGE, the whole point of this exercise > >>> is to override global 'never'? > >>> > >> > >> Again, I am not overriding never. > >> > >> hugepage_global_always and hugepage_global_enabled will evaluate to false > >> and you will not get a hugepage. > > > > Yeah, again ack, but I kind of hate that we set VM_HUGEPAGE everywhere even > > if the policy is never. > > > > And we now get into realms of: > > > > 'Hey I set prctl() to make everything huge pages, and PR_GET_THP_POLICY > > says I've set that, but nothing is huge? BUG???' > > > > Of course then you get into - if somebody sets it to never, do we go around > > and remove VM_HUGEPAGE and this MMF_ flag? > > > >> > >> > >>> I'm really concerned about this. > >>> > >>>> + vm_flags_set(vma, VM_HUGEPAGE); > >>>> + } > >>>> +} > >>> > >>> Do we have an mmap write lock established here? Can you confirm that? Also > >>> you should add an assert for that here. > >>> > >> > >> Yes I do, its only called in PR_SET_THP_POLICY where mmap_write lock was taken. > >> I can add an assert if it helps. > > > > It not only helps, it's utterly critical :) > > > > 'It's only called in xxx()' is famous last words for a programmer, because > > later somebody (maybe even your good self) calls it from somewhere else > > and... we've all been there... > > > > Thanks! Will do. Thanks. > >> > >>>> + > >>>> +void process_vmas_thp_default_huge(struct mm_struct *mm) > >>>> +{ > >>>> + if (test_bit(MMF2_THP_VMA_DEFAULT_HUGE, &mm->flags2)) > >>>> + return; > >>>> + > >>>> + set_bit(MMF2_THP_VMA_DEFAULT_HUGE, &mm->flags2); > >>>> + vmas_thp_default_huge(mm); > >>>> +} > >>>> + > >>>> + > >>>> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, > >>>> unsigned long vm_flags, > >>>> unsigned long tva_flags, > >>>> diff --git a/mm/vma.c b/mm/vma.c > >>>> index 1f2634b29568..101b19c96803 100644 > >>>> --- a/mm/vma.c > >>>> +++ b/mm/vma.c > >>>> @@ -2476,6 +2476,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap) > >>>> if (!vma_is_anonymous(vma)) > >>>> khugepaged_enter_vma(vma, map->flags); > >>>> ksm_add_vma(vma); > >>>> + vma_set_thp_policy(vma); > >>> > >>> You're breaking VMA merging completely by doing this here... > >>> > >>> Now I can map one VMA with this policy set, then map another immediately > >>> next to it and - oops - no merge, ever, because the VM_HUGEPAGE flag is not > >>> set in the new VMA on merge attempt. > >>> > >>> I realise KSM is just as broken (grr) but this doesn't justify us > >>> completely breaking VMA merging here. > >> > >> I think this answers it. Its doing the same as KSM. > > > > Yes, but as I said there, it's not acceptable, at all. > > > > You're making it so litearlly VMA merging _does not happen at all_. That's > > unacceptable and might even break some workloads. > > > > You'll certainly cause very big kernel metadata usage. > > > > Consider: > > > > |-----------------------------|..................| > > | some VMA flags, VM_HUGEPAGE | proposed new VMA | > > |-----------------------------|..................| > > > > Now, because you set VM_HUGEPAGE _after any merge is attempted_, this will > > _always_ be fragmented, forever. > > > > So if __mmap_new_vma and do_brk_flags are called after merge attempt, > is it possible to vma_set_thp_policy (or do something similar) before > the merge attempt? > > Actually I just read your reply to the next block, so I think its ok? > Added more to the next block. > > I dont have any preference on where its put, so happy with putting this > earlier. Yeah, you can just do it earlier. But you maybe should just set the flag in the appropriate field rather than using the set flags helper. > > > > That's just not... acceptable. > > > > The fact KSM is broken this way doesn't make that OK. > > > > Especially on brk(), which now will _always_ allocate new VMAs for every > > brk() expansion which doesn't seem very efficient. > > > > It may also majorly degrade performance. > > > > That makes me think we need some perf testing for this ideally... > > > >> > >>> > >>> You need to set earlier than this. Then of course a driver might decide to > >>> override this, so maybe then we need to override that. > >>> > >>> But then we're getting into realms of changing fundamental VMA code _just > >>> for this feature_. > >>> > >>> Again I'm iffy about this. Very. > >>> > >>> Also you've broken the VMA userland tests here: > >>> > >>> $ cd tools/testing/vma > >>> $ make > >>> ... > >>> In file included from vma.c:33: > >>> ../../../mm/vma.c: In function ‘__mmap_new_vma’: > >>> ../../../mm/vma.c:2486:9: error: implicit declaration of function ‘vma_set_thp_policy’; did you mean ‘vma_dup_policy’? [-Wimplicit-function-declaration] > >>> 2486 | vma_set_thp_policy(vma); > >>> | ^~~~~~~~~~~~~~~~~~ > >>> | vma_dup_policy > >>> make: *** [<builtin>: vma.o] Error 1 > >>> > >>> You need to create stubs accordingly. > >>> > >> > >> Thanks will do. > > > > Thanks! > > > >> > >>>> *vmap = vma; > >>>> return 0; > >>>> > >>>> @@ -2705,6 +2706,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, > >>>> mm->map_count++; > >>>> validate_mm(mm); > >>>> ksm_add_vma(vma); > >>>> + vma_set_thp_policy(vma); > >>> > >>> You're breaking merging again... This is quite a bad case too as now you'll > >>> have totally fragmented brk VMAs no? > >>> > >> > >> Again doing it the same as KSM. > > > > That doesn't make it ok. Just because KSM is broken doesn't make this ok. I > > mean grr at KSM :) I'm going to look into that and see about > > investigating/fixing that behaviour. > > > > obviously I can't accept anything that will fundamentally break VMA > > merging. > > > > Ofcourse! > > > The answer really is to do this earlier, but you risk a driver overriding > > it, but that's OK I think (I don't even think any in-tree ones do actually > > _anywhere_ - and yes I was literally reading through _every single_ .mmap() > > callback lately because I am quite obviously insane ;) > > > > Again I can help with this. > > > > Appreaciate it! > > I am actually not familiar with the merge code. I will try and have a look, > but if you could give a pointer to the file:line after which its not acceptable > to have and I can move vma_set_thp_policy to before it or try and do something > similar to that. Ack. I wrote the latest merge and mmap() code so am well placed on this :>) But I don't think we should use vma_set_thp_policy() in these places, we should just set the flag, to avoid trying to do a write lock etc. etc., plus we want to set the flag in a place that's not a VMA yet in both cases. So we'd need something like in do_mmap(): + vm_flags |= mm_implied_vma_flags(mm); addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); Where mm_implied_vma_flags() reads the MMF flags and sees if any imply VMA flags. But we have something for that already don't we? mm->def_flags. Can't we use that actually? That should work for mmap too? > > >> > >>> We can't have it implemented this way. > >>> > >>>> out: > >>>> perf_event_mmap(vma); > >>>> mm->total_vm += len >> PAGE_SHIFT; > >>>> diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h > >>>> index 35791791a879..f5945ebfe3f2 100644 > >>>> --- a/tools/include/uapi/linux/prctl.h > >>>> +++ b/tools/include/uapi/linux/prctl.h > >>>> @@ -328,4 +328,8 @@ struct prctl_mm_map { > >>>> # define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC 0x10 /* Clear the aspect on exec */ > >>>> # define PR_PPC_DEXCR_CTRL_MASK 0x1f > >>>> > >>>> +#define PR_SET_THP_POLICY 78 > >>>> +#define PR_GET_THP_POLICY 79 > >>>> +#define PR_THP_POLICY_DEFAULT_HUGE 0 > >>>> + > >>>> #endif /* _LINUX_PRCTL_H */ > >>>> diff --git a/tools/perf/trace/beauty/include/uapi/linux/prctl.h b/tools/perf/trace/beauty/include/uapi/linux/prctl.h > >>>> index 15c18ef4eb11..325c72f40a93 100644 > >>>> --- a/tools/perf/trace/beauty/include/uapi/linux/prctl.h > >>>> +++ b/tools/perf/trace/beauty/include/uapi/linux/prctl.h > >>>> @@ -364,4 +364,8 @@ struct prctl_mm_map { > >>>> # define PR_TIMER_CREATE_RESTORE_IDS_ON 1 > >>>> # define PR_TIMER_CREATE_RESTORE_IDS_GET 2 > >>>> > >>>> +#define PR_SET_THP_POLICY 78 > >>>> +#define PR_GET_THP_POLICY 79 > >>>> +#define PR_THP_POLICY_DEFAULT_HUGE 0 > >>>> + > >>>> #endif /* _LINUX_PRCTL_H */ > >>>> -- > >>>> 2.47.1 > >>>> > >> >