On Tue, Jul 22, 2025 at 09:28:02AM +0200, David Hildenbrand wrote: > On 22.07.25 04:40, Yafang Shao wrote: > > On Sun, Jul 20, 2025 at 11:56 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > > > > > > > > > > > > We discussed this yesterday at a THP upstream meeting, and what we > > > > > should look into is: > > > > > > > > > > (1) Having a callback like > > > > > > > > > > unsigned int (*get_suggested_order)(.., bool in_pagefault); > > > > > > > > This interface meets our needs precisely, enabling allocation orders > > > > of either 0 or 9 as required by our workloads. That's great to hear, and to be clear my views align with David on this - I feel like having a _carefully chosen_ BPF interface could be valuable here, especially in the short to medium term where it will allow us to more rapidly iterate on an automated [m]THP mechanism. I think one key question here is - do we want to retain a _permanent_ BPF hook here? In any cae, for the first experiments with this we absolutely _must_ be able to express that this is going away, NO, not based on whether it's widely used, it IS going away. > > > > > > > > > > > > > > Where we can provide some information about the fault (vma > > > > > size/flags/anon_name), and whether we are in the page fault (or in > > > > > khugepaged). > > > > > > > > > > Maybe we want a bitmap of orders to try (fallback), not sure yet. > > > > > > > > > > (2) Having some way to tag these callbacks as "this is absolutely > > > > > unstable for now and can be changed as we please.". > > > > > > > > BPF has already helped us complete this, so we don’t need to implement > > > > this restriction. > > > > Note that all BPF kfuncs (including struct_ops) are currently unstable > > > > and may change in the future. > > > > > Alexei, could you confirm this understanding? > > > > > > Every MM person I talked to about this was like "as soon as it's > > > actively used out there (e.g., a distro supports it), there is no way > > > you can easily change these callbacks ever again - it will just silently > > > become stable." > > > > > > That is actually the biggest concern from the MM side: being stuck with > > > an interface that was promised to be "unstable" but suddenly it's > > > not-so-unstable anymore, and we have to support something that is very > > > likely to be changed in the future. > > > > > > Which guarantees do we have in the regard? > > > > > > How can we make it clear to anybody using this specific interface that > > > "if you depend on this being stable, you should learn how to read and > > > you are to blame, not the MM people" ? > > > > As explained in the kernel document [0]: > > > > kfuncs provide a kernel <-> kernel API, and thus are not bound by any > > of the strict stability restrictions associated with kernel <-> user > > UAPIs. This means they can be thought of as similar to > > EXPORT_SYMBOL_GPL, and can therefore be modified or removed by a > > maintainer of the subsystem they’re defined in when it’s deemed > > necessary. I find this documentation super contradictory. I'm sorry but you can't have: "...can therefore be modified or removed by a maintainer of the subsystem they’re defined in when it’s deemed necessary." And: "kfuncs that are widely used or have been in the kernel for a long time will be more difficult to justify being changed or removed by a maintainer." At the same time. Let alone: "A kfunc will never have any hard stability guarantees. BPF APIs cannot and will not ever hard-block a change in the kernel purely for stability reasons" Make your mind up!! I mean the EXPORT_SYMBOL_GPL() example isn't accurate AT ALL - we can _absolutely_ change or remove those _at will_ as we don't care about external modules. Really this seems to be saying, in not so many words, that this is basically a kAPI and you can't change it. So this strictly violates what we need here. > > > > [0] https://docs.kernel.org/bpf/kfuncs.html#bpf-kfunc-lifecycle-expectations > > > > That said, users of BPF kfuncs should treat them as inherently > > unstable and take responsibility for verifying their compatibility > > when switching kernel versions. However, this does not imply that BPF > > kfuncs can be modified arbitrarily. > > > > For widely adopted kfuncs that deliver substantial value, changes > > should be made cautiously—preferably through backward-compatible > > extensions to ensure continued functionality across new kernel > > versions. Removal should only be considered in exceptional cases, such > > as: > > - Severe, unfixable issues within the kernel > > - Maintenance burdens that block new features or critical improvements. > > And that is exactly what we are worried about. > > You don't know beforehand whether something will be "widely adopted". > > Even if there is the "A kfunc will never have any hard stability > guarantees." in there. > > The concerning bit is: > > "kfuncs that are widely used or have been in the kernel for a long time will > be more difficult to justify being changed or removed by a maintainer. " > > Just no. Not going to happen for the kfuncs we know upfront (like here) will > stand in our way in the future at some point and *will* be changed one way > or another. Yes, and the EXPORT*() example is plain wrong in that document. > > > So for these kfuncs I want a clear way of expressing "whatever the kfuncs > doc says, this here is completely unstable even if widely used" I wonder if we can use a CONFIG_xxx and put this behind that, which specifically says 'WE WILL REMOVE THIS' CONFIG_EXPERIMENTAL_DO_NOT_USE_THP_THINGY :P > > -- > Cheers, > > David / dhildenb > Cheers, Lorenzo