On Tue, 08 Jul 2025, Song Liu wrote: > Hi Christian, > > Thanks for your comments! > > > On Jul 7, 2025, at 4:17 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > [...] > > >>> 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback. > >> > >> I think that's fine. > > > > Ok, sorry for the delay but there's a lot of different things going on > > right now and this one isn't exactly an easy thing to solve. > > > > I mentioned this before and so did Neil: the lookup implementation > > supports two modes sleeping and non-sleeping. That api is abstracted > > away as heavily as possible by the VFS so that non-core code will not be > > exposed to it other than in exceptional circumstances and doesn't have > > to care about it. > > > > It is a conceptual dead-end to expose these two modes via separate APIs > > and leak this implementation detail into non-core code. It will not > > happen as far as I'm concerned. > > > > I very much understand the urge to get the refcount step-by-step thing > > merged asap. Everyone wants their APIs merged fast. And if it's > > reasonable to move fast we will (see the kernfs xattr thing). > > > > But here are two use-cases that ask for the same thing with different > > constraints that closely mirror our unified approach. Merging one > > quickly just to have something and then later bolting the other one on > > top, augmenting, or replacing, possible having to deprecate the old API > > is just objectively nuts. That's how we end up with a spaghetthi helper > > collection. We want as little helper fragmentation as possible. > > > > We need a unified API that serves both use-cases. I dislike > > callback-based APIs generally but we have precedent in the VFS for this > > for cases where the internal state handling is delicate enough that it > > should not be exposed (see __iterate_supers() which does exactly work > > like Neil suggested down to the flag argument itself I added). > > > > So I'm open to the callback solution. > > > > (Note for really absurd perf requirements you could even make it work > > with static calls I'm pretty sure.) > > I guess we will go with Mickaël’s idea: > > > int vfs_walk_ancestors(struct path *path, > > bool (*walk_cb)(const struct path *ancestor, void *data), > > void *data, int flags) > > > > The walk continue while walk_cb() returns true. walk_cb() can then > > check if @ancestor is equal to a @root, or other properties. The > > walk_cb() return value (if not bool) should not be returned by > > vfs_walk_ancestors() because a walk stop doesn't mean an error. > > If necessary, we hide “root" inside @data. This is good. > > > @path would be updated with latest ancestor path (e.g. @root). > > Update @path to the last ancestor and hold proper references. > I missed this part earlier. With this feature, vfs_walk_ancestors > should work usable with open-codeed bpf path iterator. I don't think path should be updated. That adds complexity which might not be needed. The original (landlock) requirements were only to look at each ancestor, not to take a reference to any of them. If the caller needs a reference to any of the ancestors I think that walk_cb() needs to take that reference and store it in data. Note that attempting to take the reference might fail. See legitimize_path() in fs/namei.c. It isn't yet clear to me what would be a good API for requesting the reference. One option would be for vfs_walk_ancestors() to pass another void* to walk_cb(), and it passed it on to vfs_legitimize_path() which extracts the seq numbers from there. Another might be that the path passed to walk_cb is always nameidata.path, and so when that is passed to vfs_legitimize_path() path it can use container_of() to find the seq numbers. If vfs_legitimize_path() fail, walk_cb() might want to ask for the walk to be restarted. > > I have a question about this behavior with RCU walk. IIUC, RCU > walk does not hold reference to @ancestor when calling walk_cb(). > If walk_cb() returns false, shall vfs_walk_ancestors() then > grab a reference on @ancestor? This feels a bit weird to me. > Maybe “updating @path to the last ancestor” should only apply to > LOOKUP_RCU==false case? > > > @flags could contain LOOKUP_RCU or not, which enables us to have > > walk_cb() not-RCU compatible. > > > > When passing LOOKUP_RCU, if the first call to vfs_walk_ancestors() > > failed with -ECHILD, the caller can restart the walk by calling > > vfs_walk_ancestors() again but without LOOKUP_RCU. > > > Given we want callers to handle -ECHILD and call vfs_walk_ancestors > again without LOOKUP_RCU, I think we should keep @path not changed > With LOOKUP_RCU==true, and only update it to the last ancestor > when LOOKUP_RCU==false. No, we really don't want to pass a LOOKUP_RCU() flag to vfs_walk_ancestors(). vfs_walk_ancestors() might choose to pass that flag to walk_cb(). NeilBrown