On Tue, Jul 22, 2025 at 8:05 PM Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote: > > On Tue, Jul 22, 2025 at 07:56:21PM +0800, Yafang Shao wrote: > > On Tue, Jul 22, 2025 at 6:09 PM Lorenzo Stoakes > > <lorenzo.stoakes@xxxxxxxxxx> wrote: > > > > > > On Tue, Jul 22, 2025 at 09:28:02AM +0200, David Hildenbrand wrote: > > > > On 22.07.25 04:40, Yafang Shao wrote: > > > > > On Sun, Jul 20, 2025 at 11:56 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > > > > > > > We discussed this yesterday at a THP upstream meeting, and what we > > > > > > > > should look into is: > > > > > > > > > > > > > > > > (1) Having a callback like > > > > > > > > > > > > > > > > unsigned int (*get_suggested_order)(.., bool in_pagefault); > > > > > > > > > > > > > > This interface meets our needs precisely, enabling allocation orders > > > > > > > of either 0 or 9 as required by our workloads. > > > > > > That's great to hear, and to be clear my views align with David on this - I > > > feel like having a _carefully chosen_ BPF interface could be valuable here, > > > especially in the short to medium term where it will allow us to more > > > rapidly iterate on an automated [m]THP mechanism. > > > > > > I think one key question here is - do we want to retain a _permanent_ BPF > > > hook here? > > > > > > In any cae, for the first experiments with this we absolutely _must_ be > > > able to express that this is going away, NO, not based on whether it's > > > widely used, it IS going away. > > > > If this BPF kfunc provides clear user value with minimal maintenance > > overhead, what would be the rationale for removing it? That said, if > > you and David both agree it should be deprecated, I won't object - > > though I'd suggest following the standard deprecation process. > > You see herein lies the problem... :) from my point of view we want to have > the ability to choose, fundamentally. > > We may find out the proposed interface is unworkable, or sets assumptions > we don't want to make. > > So I think hiding ehhind a CONFIG_ flag is the best idea here to really > enforce that and make it clear. > > Personally I have a sense that we _will_ introduce something permanent. We > just need to have the 'space' to positively decide to do that once the > experimentation is complete. Thanks for your explanation. > > > > I find this documentation super contradictory. I'm sorry but you can't > > > have: > > > > > > "...can therefore be modified or removed by a maintainer of the subsystem > > > they’re defined in when it’s deemed necessary." > > > > > > And: > > > > > > "kfuncs that are widely used or have been in the kernel for a long time > > > will be more difficult to justify being changed or removed by a > > > maintainer." > > > > > > At the same time. Let alone: > > > > > > "A kfunc will never have any hard stability guarantees. BPF APIs cannot and > > > will not ever hard-block a change in the kernel purely for stability > > > reasons" > > > > > > Make your mind up!! > > > > > > I mean the EXPORT_SYMBOL_GPL() example isn't accurate AT ALL - we can > > > _absolutely_ change or remove those _at will_ as we don't care about > > > external modules. > > > > > > Really this seems to be saying, in not so many words, that this is > > > basically a kAPI and you can't change it. > > > > > > So this strictly violates what we need here. > > > > The maintainers have the authority to make the final determination ;-) > > Well the kernel doesn't entirely work this way... pressure can come which > impacts what others may do. > > If you have people saying 'hey we rely on this and removing it will break > our cloud deployment' and 'hey I checked the docs and it says you guys have > to take this into account', I am not so sure Andrew or Linus will accept > the patch. understood. > > > > I wonder if we can use a CONFIG_xxx and put this behind that, which > > > specifically says 'WE WILL REMOVE THIS' > > > CONFIG_EXPERIMENTAL_DO_NOT_USE_THP_THINGY :P > > > > That's a reasonable suggestion. We could implement this function under > > CONFIG_EXPERIMENTAL to mark it as experimental infrastructure. > > Thanks! Yes, I was looking for this flag :P didn't know if we still had > that or not actually... > > But, yeah, putting it behind that explicitly also makes it very clearly. > > CONFIG_EXPERIMENTAL_BPF_FAULT_ORDER relies on CONFIG_EXPERIMENTAL makes it > you know... pretty clear ;) Agreed. Let's move forward with the CONFIG_EXPERIMENTAL_BPF_FAULT_ORDER option. -- Regards Yafang