Re: [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment

* Yafang Shao <laoar.shao@xxxxxxxxx> [250430 11:37]:
> On Wed, Apr 30, 2025 at 11:21 PM Liam R. Howlett
> <Liam.Howlett@xxxxxxxxxx> wrote:
> >
> > * Zi Yan <ziy@xxxxxxxxxx> [250430 11:01]:
> >
> > ...
> >
> > > >>>>> Since multiple services run on a single host in a containerized environment,
> > > >>>>> enabling THP globally is not ideal. Previously, we set THP to madvise,
> > > >>>>> allowing selected services to opt in via MADV_HUGEPAGE. However, this
> > > >>>>> approach had limitations:
> > > >>>>>
> > > >>>>> - Some services inadvertently used madvise(MADV_HUGEPAGE) through
> > > >>>>>   third-party libraries, bypassing our restrictions.
> > > >>>>
> > > >>>> Basically, you want more precise control of THP enablement and the
> > > >>>> ability of overriding madvise() from userspace.
> > > >>>>
> > > >>>> In terms of overriding madvise(), do you have any concrete example of
> > > >>>> these third-party libraries? madvise() users are supposed to know what
> > > >>>> they are doing, so I wonder why they are causing trouble in your
> > > >>>> environment.
> > > >>>
> > > >>> To my knowledge, jemalloc [0] supports THP.
> > > >>> Applications using jemalloc typically rely on its default
> > > >>> configurations rather than explicitly enabling or disabling THP. If
> > > >>> the system is configured with THP=madvise, these applications may
> > > >>> automatically leverage THP where appropriate.
> > > >>>
> > > >>> [0]. https://github.com/jemalloc/jemalloc
> > > >>
> > > >> It sounds like a userspace issue. For jemalloc, if applications require
> > > >> it, can't you replace jemalloc with one compiled with --disable-thp
> > > >> to work around the issue?
> > > >
> > > > That’s not the issue this patchset is trying to address or work
> > > > around. I believe we should focus on the actual problem it's meant to
> > > > solve.
> > > >
> > > > By the way, you might not raise this question if you were managing a
> > > > large fleet of servers. We're a platform provider, but we don’t
> > > > maintain all the packages ourselves. Users make their own choices
> > > > based on their specific requirements. It's not a feasible solution for
> > > > us to develop and maintain every package.
> > >
> > > Basically, the user wants to use THP, but as the service provider you
> > > think differently, so you want to override the userspace choice.  Am I
> > > getting it right?
> >
> > Who is the platform provider in question?  It makes me uneasy to have
> > such claims from an @gmail account with current world events..
> 
> It’s a small company based in China, called PDD—if that information is helpful.

Thanks.

> 
> >
> > ...
> >
> > > >>>
> > > >>> I chose not to include this in the self-tests to avoid the complexity
> > > >>> of setting up cgroups for testing purposes. However, in patch #4 of
> > > >>> this series, I've included a simpler example demonstrating task-level
> > > >>> control.
> > > >>
> > > >> For task-level control, why not using prctl(PR_SET_THP_DISABLE)?
> > > >
> > > > You’ll need to modify the user-space code—and again, this likely
> > > > wouldn’t be a concern if you were managing a large fleet of servers.
> > > >
> > > >>
> > > >>> For service-level control, we could potentially utilize BPF task local
> > > >>> storage as an alternative approach.
> > > >>
> > > >> +cgroup people
> > > >>
> > > >> For service-level control, there was a proposal of adding cgroup based
> > > >> THP control[1]. You might need a strong use case to convince people.
> > > >>
> > > >> [1] https://lore.kernel.org/linux-mm/20241030083311.965933-1-gutierrez.asier@xxxxxxxxxxxxxxxxxxx/
> > > >
> > > > Thanks for the reference. I've reviewed the related discussion, and if
> > > > I understand correctly, the proposal was rejected by the maintainers.
> >
> > More to the point is why it was rejected.  Why is your motivation different?
> >
> > >
> > > I wonder why your approach is better than the cgroup based THP control proposal.
> >
> > I think Matthew's response in that thread is pretty clear and still
> > relevant.
> 
> Are you referring to
> https://lore.kernel.org/linux-mm/ZyT7QebITxOKNi_c@xxxxxxxxxxxxxxxxxxxx/
>  or https://lore.kernel.org/linux-mm/ZyIxRExcJvKKv4JW@xxxxxxxxxxxxxxxxxxxx/
> ?
> 
> If it’s the latter, then this patchset aims to make sysadmins' lives easier.

Both, really.  Your patch gives the sysadmin another knob to turn, and
another thing to know when to turn.  Matthew is suggesting the kernel
should know when to do the right thing and avoid a knob in the first
place.

> 
> > If it isn't, can you state why?
> >
> > The main difference is that you are saying it's in a container that you
> > don't control.  Your plan is to violate the control the internal
> > applications have over THP because you know better.  I'm not sure how
> > people might feel about you messing with workloads,
> 
> It’s not a mess. They have the option to deploy their services on
> dedicated servers, but they would need to pay more for that choice.
> This is a two-way decision.

This implies you want a container-level way of controlling the setting
and not a service-level one?  I guess I find the wording of the problem
statement unclear.

> 
> > but beyond that, you
> > are fundamentally fixing things at a sysadmin level because programmers
> > have made errors.
> 
> No, they’re not making mistakes—they simply focus on the
> implementation details of their own services and don’t find it
> worthwhile to dive into kernel internals. Their services run perfectly
> well with or without THP.
> 
> > You state as much in the cover letter, yes?
> 
> I’ll try to explain it in more detail in the next version if that
> would be helpful.

Yes, I think so.

Thanks,
Liam
