On Thu, May 01, 2025 at 12:06:31AM +0800, Yafang Shao wrote:
> > > > If it isn't, can you state why?
> > > >
> > > > The main difference is that you are saying it's in a container that you
> > > > don't control. Your plan is to violate the control the internal
> > > > applications have over THP because you know better. I'm not sure how
> > > > people might feel about you messing with workloads,
> > >
> > > It’s not a mess. They have the option to deploy their services on
> > > dedicated servers, but they would need to pay more for that choice.
> > > This is a two-way decision.
> >
> > This implies you want a container-level way of controlling the setting
> > and not a system service-level?
>
> Right. We want to control the THP per container.

This does strike me as a reasonable usecase.

I think there is consensus that in the long-term we want this stuff to
just work and truly be transparent to userspace.

In the short-to-medium term, however, there are still quite a few
caveats. thp=always can significantly increase the memory footprint of
sparse virtual regions. Huge allocations are not as cheap and reliable
as we would like them to be, which for real production systems means
having to make workload-specific choices and tradeoffs.

There is ongoing work in these areas, but we do have a bit of a
chicken-and-egg problem: on the one hand, huge page adoption is slow
due to limitations in how they can be deployed. For example, we can't
do thp=always on a DC node that runs arbitrary combinations of jobs
from a wide array of services. Some might benefit, some might hurt.

Yet, it's much easier to improve the kernel based on exactly such
production experience and data from real-world usecases. We can't
improve the THP shrinker if we can't run THP.

So I don't see it as overriding whoever wrote the software running
inside the container. They don't know, and they shouldn't have to care
about page sizes.
It's about letting admins and kernel teams get started on using and experimenting with this stuff, given the very real constraints right now, so we can get the feedback necessary to improve the situation.