On 20.05.25 16:32, Usama Arif wrote:
On 20/05/2025 15:22, Lorenzo Stoakes wrote:
On Tue, May 20, 2025 at 10:08:03PM +0800, Yafang Shao wrote:
On Tue, May 20, 2025 at 9:10 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
On Tue, May 20, 2025 at 03:25:07PM +0800, Yafang Shao wrote:
The challenge we face is that our system administration team doesn't
permit enabling THP globally in production by setting it to "madvise"
or "always". As a result, we can only experiment with your feature on
our test servers at this stage.
That's a you problem.
perhaps.
You need to figure out how to influence your
sysadmin team to change their mind; whether it's by talking to their
superiors or persuading them directly.
I believe that "practicing" matters more than "talking" or "persuading".
I’m surprised your suggestion relies on "talking" ;-)
If I understand correctly, we all agree that "talk is cheap", right?
It's not a justification for why
upstream should take this patch.
I believe Johannes has clearly explained the challenges the community
is currently facing [0].
[0]. https://lore.kernel.org/linux-mm/20250430174521.GC2020@xxxxxxxxxxx/
(Sorry to interject on your conversation, but :)
I don't think anybody denies we have issues in configuring this stuff
sensibly. A global-only control isn't going to cut it in the real world, it
seems.
To me, as you say yourself, defining the ABI/API here is what really matters,
and we're right now inundated with several series all at once (you wait for one
bus then 3 come at once... :).
So this, I think, should be the question.
I like the idea of just exposing something like madvise(), which is something
we're going to maintain indefinitely.
Though any such exposure would, in my view, need to be opt-in, i.e. have a
list of MADV_... options that are accepted, as we'd need to very cautiously
determine which are safe from this context.
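
To make the opt-in idea concrete, here is a rough userspace sketch of an allowlist check. It is not proposed kernel code: the names allowed_advice and advice_is_allowed are hypothetical, and the two entries are placeholders, since which MADV_... values are actually safe from this context is the open question.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical allowlist of madvise() advice values. The entries below are
 * placeholders for illustration only; deciding the real set is the hard part.
 */
static const int allowed_advice[] = {
    MADV_HUGEPAGE,
    MADV_NOHUGEPAGE,
};

/* Return true only if the advice value was explicitly opted in. */
bool advice_is_allowed(int advice)
{
    size_t n = sizeof(allowed_advice) / sizeof(allowed_advice[0]);

    for (size_t i = 0; i < n; i++) {
        if (allowed_advice[i] == advice)
            return true;
    }
    return false;
}
```

Anything not listed is rejected by default, so a new MADV_... value would have to be reviewed and added explicitly rather than becoming reachable automatically.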
Of course then this leads to the whole thing (and I really know very little
about BPF internals - obviously happy to understand more) of whether we can just
use the madvise() code directly or what locking we can do or how all that works.
At any rate, a custom thing that is as specific as 'switch mode for mTHP pages of
size X to Y' is just something I'd rather us not tie ourselves to.
--
Regards
Yafang
What do you think re: bpf vs. something like my proposed process_madvise()
extensions or Usama's proposed prctl()?
Simpler, but really just using madvise functionality and having a means of
defaulting across fork/exec (notwithstanding Jann's concerns in this area).
Unfortunately I think the issue is that neither prctl nor process_madvise would work
for Yafang's use case? It's use case 3 mentioned in [1], i.e.
global system policy=never, process wants "madvise" policy for itself.
If the global system policy were "madvise", you'd need a way to just
disable it for processes where you wouldn't ever want them.
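
For reference, a per-process opt-out along these lines already exists as prctl(PR_SET_THP_DISABLE). A minimal sketch, assuming Linux 3.15 or later; the wrapper names thp_disable_self and thp_is_disabled_self are hypothetical:

```c
#include <sys/prctl.h>

/*
 * Set the "THP disable" flag for the calling process.
 * Returns 0 on success, -1 on error (errno is set by prctl).
 */
int thp_disable_self(void)
{
    return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/*
 * Query the flag for the calling process.
 * Returns 1 if set, 0 if not set, -1 on error.
 */
int thp_is_disabled_self(void)
{
    return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}
```

The flag is inherited across fork() and preserved across execve(), so a small launcher can set it and then exec the real workload. It only turns THP off, though, so it does not help when the global policy is "never" and a process wants to opt in.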
--
Cheers,
David / dhildenb