On 22.07.25 04:40, Yafang Shao wrote:
On Sun, Jul 20, 2025 at 11:56 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
We discussed this yesterday at a THP upstream meeting, and what we
should look into is:
(1) Having a callback like
unsigned int (*get_suggested_order)(.., bool in_pagefault);
This interface meets our needs precisely, enabling allocation orders
of either 0 or 9 as required by our workloads.
Where we can provide some information about the fault (vma
size/flags/anon_name), and whether we are in the page fault (or in
khugepaged).
Maybe we want a bitmap of orders to try (fallback), not sure yet.
(2) Having some way to tag these callbacks as "this is absolutely
unstable for now and can be changed as we please.".
BPF has already helped us complete this, so we don’t need to implement
this restriction.
Note that all BPF kfuncs (including struct_ops) are currently unstable
and may change in the future.
> > Alexei, could you confirm this understanding?
Every MM person I talked to about this was like "as soon as it's
actively used out there (e.g., a distro supports it), there is no way
you can easily change these callbacks ever again - it will just silently
become stable."
That is actually the biggest concern from the MM side: being stuck with
an interface that was promised to be "unstable" but suddenly it's
not-so-unstable anymore, and we have to support something that is very
likely to be changed in the future.
Which guarantees do we have in the regard?
How can we make it clear to anybody using this specific interface that
"if you depend on this being stable, you should learn how to read and
you are to blame, not the MM people" ?
As explained in the kernel document [0]:
kfuncs provide a kernel <-> kernel API, and thus are not bound by any
of the strict stability restrictions associated with kernel <-> user
UAPIs. This means they can be thought of as similar to
EXPORT_SYMBOL_GPL, and can therefore be modified or removed by a
maintainer of the subsystem they’re defined in when it’s deemed
necessary.
[0] https://docs.kernel.org/bpf/kfuncs.html#bpf-kfunc-lifecycle-expectations
That said, users of BPF kfuncs should treat them as inherently
unstable and take responsibility for verifying their compatibility
when switching kernel versions. However, this does not imply that BPF
kfuncs can be modified arbitrarily.
For widely adopted kfuncs that deliver substantial value, changes
should be made cautiously—preferably through backward-compatible
extensions to ensure continued functionality across new kernel
versions. Removal should only be considered in exceptional cases, such
as:
- Severe, unfixable issues within the kernel
- Maintenance burdens that block new features or critical improvements.
And that is exactly what we are worried about.
You don't know beforehand whether something will be "widely adopted".
Even if there is the "A kfunc will never have any hard stability
guarantees." in there.
The concerning bit is:
"kfuncs that are widely used or have been in the kernel for a long time
will be more difficult to justify being changed or removed by a
maintainer. "
Just no. Not going to happen for the kfuncs we know upfront (like here)
will stand in our way in the future at some point and *will* be changed
one way or another.
So for these kfuncs I want a clear way of expressing "whatever the
kfuncs doc says, this here is completely unstable even if widely used"
--
Cheers,
David / dhildenb