On 20/05/2025 15:35, Lorenzo Stoakes wrote:
> On Tue, May 20, 2025 at 03:32:16PM +0100, Usama Arif wrote:
>>
>>
>> On 20/05/2025 15:22, Lorenzo Stoakes wrote:
>>> On Tue, May 20, 2025 at 10:08:03PM +0800, Yafang Shao wrote:
>>>> On Tue, May 20, 2025 at 9:10 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, May 20, 2025 at 03:25:07PM +0800, Yafang Shao wrote:
>>>>>> The challenge we face is that our system administration team doesn't
>>>>>> permit enabling THP globally in production by setting it to "madvise"
>>>>>> or "always". As a result, we can only experiment with your feature on
>>>>>> our test servers at this stage.
>>>>>
>>>>> That's a you problem.
>>>>
>>>> perhaps.
>>>>
>>>>> You need to figure out how to influence your
>>>>> sysadmin team to change their mind; whether it's by talking to their
>>>>> superiors or persuading them directly.
>>>>
>>>> I believe that "practicing" matters more than "talking" or "persuading".
>>>> I’m surprised your suggestion relies on "talking" ;-)
>>>> If I understand correctly, we all agree that "talk is cheap", right?
>>>>
>>>>> It's not a justification for why
>>>>> upstream should take this patch.
>>>>
>>>> I believe Johannes has clearly explained the challenges the community
>>>> is currently facing [0].
>>>>
>>>> [0]. https://lore.kernel.org/linux-mm/20250430174521.GC2020@xxxxxxxxxxx/
>>>
>>> (Sorry to interject on your conversation, but :)
>>>
>>> I don't think anybody denies we have issues in configuring this stuff
>>> sensibly. A global-only control isn't going to cut it in the real world it
>>> seems.
>>>
>>> To me, as you say yourself, defining the ABI/API here is what really matters,
>>> and we're right now inundated with several series all at once (you wait for one
>>> bus then 3 come at once... :).
>>>
>>> So this, I think, should be the question.
>>>
>>> I like the idea of just exposing something like madvise(), which is something
>>> we're going to maintain indefinitely.
>>>
>>> Though any such exposure would in my view need to be opt-in, i.e. have a
>>> list of MADV_... options that are accepted, as we'd need to very cautiously
>>> determine which are safe from this context.
>>>
>>> Of course then this leads to the whole thing (and I really know very little
>>> about BPF internals - obviously happy to understand more) of whether we can just
>>> use the madvise() code directly or what locking we can do or how all that works.
>>>
>>> At any rate, a custom thing that is as specific as 'switch mode for mTHP pages of
>>> size X to Y' is just something I'd rather us not tie ourselves to.
>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>>
>>>> Yafang
>>>
>>> What do you think re: bpf vs. something like my proposed process_madvise()
>>> extensions or Usama's proposed prctl()?
>>>
>>> Simpler, but really just using madvise functionality and having a means of
>>> defaulting across fork/exec (notwithstanding Jann's concerns in this area).
>>
>> Unfortunately I think the issue is that neither prctl nor process_madvise would work
>> for Yafang's use case? It's use case 3 mentioned in [1], i.e.
>> global system policy=never, process wants "madvise" policy for itself.
>> Will let Yafang confirm.
>>
>> [1] https://lore.kernel.org/all/13b68fa0-8755-43d8-8504-d181c2d46134@xxxxxxxxx/
>>
>
> Yeah I really object to that case. I explicitly said on your series I
> object to it, I believe David did too.

Yes, I am not for it either, which is why my series never tried to do it :)
As I mentioned in my series several times (unfortunately too many to count),
hugepage_global_enabled always evaluated to false when THP is set to "never".

>
> Never should mean never.
>
> It's a NACK if that's what this is about unless I'm missing something here.
>
> I agree global settings are not fine-grained enough, but 'sys admins refuse
> to do X so we want to ignore what they do' is... really not right at all.