On Thu, 26 Jun 2025, Song Liu wrote:
>
>
> > On Jun 25, 2025, at 6:05 PM, NeilBrown <neil@xxxxxxxxxx> wrote:
>
> [...]
>
> >>
> >> I can't speak for Mickaël, but a callback-based interface is less flexible
> >> (and _maybe_ less performant?). Also, probably we will want to fallback
> >> to a reference-taking walk if the walk fails (rather than, say, retry
> >> infinitely), and this should probably use Song's proposed iterator. I'm
> >> not sure if Song would be keen to rewrite this iterator patch series in
> >> callback style (to be clear, it doesn't necessarily seem like a good idea
> >> to me, and I'm not asking him to), which means that we will end up with
> >> the reference walk API being a "call this function repeatedly", and the
> >> rcu walk API taking a callback. I think it is still workable (after all,
> >> if Landlock wants to reuse the code in the callback it can just call the
> >> callback function itself when doing the reference walk), but it seems a
> >> bit "ugly" to me.
> >
> > call-back can have a performance impact (less opportunity for compiler
> > optimisation and CPU speculation), though less than taking spinlock and
> > references. However Al and Christian have drawn a hard line against
> > making seq numbers visible outside VFS code so I think it is the
> > approach most likely to be accepted.
> >
> > Certainly vfs_walk_ancestors() would fallback to ref-walk if rcu-walk
> > resulted in -ECHILD - just like all other path walking code in namei.c.
> > This would be largely transparent to the caller - the caller would only
> > see that the callback received a NULL path indicating a restart. It
> > wouldn't need to know why.
>
> I guess I misunderstood the proposal of vfs_walk_ancestors()
> initially, so some clarification:
>
> I think vfs_walk_ancestors() is good for the rcu-walk, and some
> rcu-then-ref-walk. However, I don’t think it fits all use cases.
> A reliable step-by-step ref-walk, like this set, works well with
> BPF, and we want to keep it.

The distinction between rcu-walk and ref-walk is an internal
implementation detail. You as a caller shouldn't need to think about
the difference. You just want to walk.

Note that LOOKUP_RCU is documented in namei.h as "semi-internal". The
only uses outside of core-VFS code are in individual filesystems'
d_revalidate handlers - they are checking if they are allowed to sleep
or not. You should never expect to pass LOOKUP_RCU to a VFS API - no
other code does.

It might be reasonable for you as a caller to have some control over
whether the call can sleep or not. LOOKUP_CACHED is a bit like that.
But for dotdot lookup the code will never sleep - so that is not
relevant.

I strongly suggest you stop thinking about rcu-walk vs ref-walk. Think
about the needs of your code. If you need a high-performance API, then
ask for a high-performance API; don't assume what form it will take or
what the internal implementation details will be.

I think you already have a clear answer that a step-by-step API will
not be read-only on the dcache (i.e. it will adjust refcounts) and so
will not be high performance. If you want high performance, you need
to accept a different style of API.

NeilBrown
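
To illustrate the callback style discussed above, here is a minimal
user-space sketch in C. It is not kernel code and does not reflect the
actual or proposed signature of vfs_walk_ancestors(). Everything in it
is an assumption made for illustration: struct node stands in for a
dentry, walk_ancestors() and its fail_fast_at parameter are
hypothetical, and the "fast pass failed" condition is simulated. The
one point it tries to show is that the callback sees only a NULL node
meaning "restart" and never learns which walk mode is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: a name and a parent pointer. */
struct node {
    const char *name;
    const struct node *parent;   /* NULL at the root */
};

/*
 * Hypothetical callback type, called once per visited node.
 * n == NULL means "the walk is restarting, discard accumulated state".
 * Return false to stop the walk early.
 */
typedef bool (*walk_cb)(const struct node *n, void *data);

/*
 * Walk from start up to the root, calling cb for every node.
 * The first attempt models a lockless pass that may be invalidated:
 * fail_fast_at is the number of nodes visited before the simulated
 * failure (negative means the fast pass never fails). On failure the
 * callback is told to reset and the walk is repeated in a mode that
 * cannot fail. The caller is never told which mode produced the result.
 */
void walk_ancestors(const struct node *start, walk_cb cb, void *data,
                    int fail_fast_at)
{
    assert(cb != NULL);

    for (int attempt = 0; attempt < 2; attempt++) {
        bool fast = (attempt == 0);
        bool failed = false;
        int steps = 0;

        for (const struct node *n = start; n; n = n->parent) {
            if (fast && steps == fail_fast_at) {
                failed = true;       /* simulated invalidation */
                break;
            }
            steps++;
            if (!cb(n, data))
                return;              /* callback asked to stop */
        }
        if (!failed)
            return;                  /* walk completed */
        cb(NULL, data);              /* signal restart, then retry */
    }
}

/* Example callback state: depth of the start node, plus restart count. */
struct depth_state {
    int depth;
    int restarts;
};

static bool count_depth(const struct node *n, void *data)
{
    struct depth_state *s = data;

    if (!n) {                        /* restart: drop partial result */
        s->depth = 0;
        s->restarts++;
        return true;
    }
    s->depth++;
    return true;
}

/* Build /usr/lib and count the nodes from "lib" up to the root. */
int demo_depth(int fail_fast_at, int *restarts)
{
    struct node root = { "/", NULL };
    struct node usr = { "usr", &root };
    struct node lib = { "lib", &usr };
    struct depth_state s = { 0, 0 };

    walk_ancestors(&lib, count_depth, &s, fail_fast_at);
    *restarts = s.restarts;
    return s.depth;
}
```

Calling demo_depth(-1, &r) and demo_depth(1, &r) gives the same depth;
only the restart count seen by the callback differs. A step-by-step
iterator would instead hand each node back to the caller between
steps, which is why it has to pin the node (take a reference) for as
long as the caller holds it.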