On Tue, May 20, 2025 at 10:08:03PM +0800, Yafang Shao wrote:
> On Tue, May 20, 2025 at 9:10 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, May 20, 2025 at 03:25:07PM +0800, Yafang Shao wrote:
> > > The challenge we face is that our system administration team doesn't
> > > permit enabling THP globally in production by setting it to "madvise"
> > > or "always". As a result, we can only experiment with your feature on
> > > our test servers at this stage.
> >
> > That's a you problem.
>
> perhaps.
>
> > You need to figure out how to influence your
> > sysadmin team to change their mind; whether it's by talking to their
> > superiors or persuading them directly.
>
> I believe that "practicing" matters more than "talking" or "persuading".
> I’m surprised your suggestion relies on "talking" ;-)
> If I understand correctly, we all agree that "talk is cheap", right?
>
> > It's not a justification for why
> > upstream should take this patch.
>
> I believe Johannes has clearly explained the challenges the community
> is currently facing [0].
>
> [0]. https://lore.kernel.org/linux-mm/20250430174521.GC2020@xxxxxxxxxxx/

(Sorry to interject on your conversation, but :)

I don't think anybody denies we have issues in configuring this stuff
sensibly. A global-only control isn't going to cut it in the real world, it
seems.

To me, as you say yourself, defining the ABI/API here is what really
matters, and we're right now inundated with several series all at once (you
wait for one bus then 3 come at once... :).

So this, I think, should be the question.

I like the idea of just exposing something like madvise(), which is
something we're going to maintain indefinitely.

Though any such exposure would, in my view, need to be opt-in, i.e. have a
list of MADV_... options that are accepted, as we'd need to very cautiously
determine which are safe from this context.

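To make the opt-in idea concrete, here is a rough userspace sketch of an
allowlist check. It is purely illustrative: the names madv_policy_allowlist
and madv_policy_allowed() are hypothetical, and the two entries are
placeholders rather than a claim about which MADV_... values are actually
safe from this context.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical allowlist. Which advice values really belong here is
 * exactly the open question; these two are placeholders for illustration.
 */
static const int madv_policy_allowlist[] = {
	MADV_HUGEPAGE,
	MADV_NOHUGEPAGE,
};

/* Return true only for advice values that were explicitly opted in. */
static bool madv_policy_allowed(int advice)
{
	size_t n = sizeof(madv_policy_allowlist) /
		   sizeof(madv_policy_allowlist[0]);

	for (size_t i = 0; i < n; i++) {
		if (madv_policy_allowlist[i] == advice)
			return true;
	}
	/* Anything not listed is rejected by default. */
	return false;
}
```

The point of the default-deny shape is that a new MADV_... value added later
is not exposed until somebody has looked at it and added it to the list.
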
Of course then this leads to the whole thing (and I really know very little
about BPF internals - obviously happy to understand more) of whether we can
just use the madvise() code directly, or what locking we can do, or how all
that works.

At any rate, a custom thing that is as specific as 'switch mode for mTHP
pages of size X to Y' is just something I'd rather us not tie ourselves to.

>
>
> --
> Regards
>
> Yafang

What do you think re: bpf vs. something like my proposed process_madvise()
extensions or Usama's proposed prctl()? Simpler, but really just using
madvise functionality and having a means of defaulting across fork/exec
(notwithstanding Jann's concerns in this area).