Hi Christian, Thanks for your comments! > On Jul 7, 2025, at 4:17 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote: [...] >>> 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback. >> >> I think that's fine. > > Ok, sorry for the delay but there's a lot of different things going on > right now and this one isn't exactly an easy thing to solve. > > I mentioned this before and so did Neil: the lookup implementation > supports two modes sleeping and non-sleeping. That api is abstracted > away as heavily as possible by the VFS so that non-core code will not be > exposed to it other than in exceptional circumstances and doesn't have > to care about it. > > It is a conceptual dead-end to expose these two modes via separate APIs > and leak this implementation detail into non-core code. It will not > happen as far as I'm concerned. > > I very much understand the urge to get the refcount step-by-step thing > merged asap. Everyone wants their APIs merged fast. And if it's > reasonable to move fast we will (see the kernfs xattr thing). > > But here are two use-cases that ask for the same thing with different > constraints that closely mirror our unified approach. Merging one > quickly just to have something and then later bolting the other one on > top, augmenting, or replacing, possible having to deprecate the old API > is just objectively nuts. That's how we end up with a spaghetthi helper > collection. We want as little helper fragmentation as possible. > > We need a unified API that serves both use-cases. I dislike > callback-based APIs generally but we have precedent in the VFS for this > for cases where the internal state handling is delicate enough that it > should not be exposed (see __iterate_supers() which does exactly work > like Neil suggested down to the flag argument itself I added). > > So I'm open to the callback solution. > > (Note for really absurd perf requirements you could even make it work > with static calls I'm pretty sure.) I guess we will go with Mickaël’s idea: > int vfs_walk_ancestors(struct path *path, > bool (*walk_cb)(const struct path *ancestor, void *data), > void *data, int flags) > > The walk continue while walk_cb() returns true. walk_cb() can then > check if @ancestor is equal to a @root, or other properties. The > walk_cb() return value (if not bool) should not be returned by > vfs_walk_ancestors() because a walk stop doesn't mean an error. If necessary, we hide “root" inside @data. This is good. > @path would be updated with latest ancestor path (e.g. @root). Update @path to the last ancestor and hold proper references. I missed this part earlier. With this feature, vfs_walk_ancestors should work usable with open-codeed bpf path iterator. I have a question about this behavior with RCU walk. IIUC, RCU walk does not hold reference to @ancestor when calling walk_cb(). If walk_cb() returns false, shall vfs_walk_ancestors() then grab a reference on @ancestor? This feels a bit weird to me. Maybe “updating @path to the last ancestor” should only apply to LOOKUP_RCU==false case? > @flags could contain LOOKUP_RCU or not, which enables us to have > walk_cb() not-RCU compatible. > > When passing LOOKUP_RCU, if the first call to vfs_walk_ancestors() > failed with -ECHILD, the caller can restart the walk by calling > vfs_walk_ancestors() again but without LOOKUP_RCU. Given we want callers to handle -ECHILD and call vfs_walk_ancestors again without LOOKUP_RCU, I think we should keep @path not changed With LOOKUP_RCU==true, and only update it to the last ancestor when LOOKUP_RCU==false. With this behavior, landlock code will be like: /* Assume we hold reference on “path”. * With LOOKUP_RCU, path will not change, we don’t need * extra reference on “path”. */ err = vfs_walk_ancestors(path, ll_cb, data, LOOKUP_RCU); /* * At this point, whether err is 0 or not, path is not * changed. */ if (err == -ECHILD) { struct path walk_path = *path; /* reset any data changed by the walk */ reset_data(data); /* get a reference on walk_path. */ path_get(&walk_path); err = vfs_walk_ancestors(&walk_path, ll_cb, data, 0); /* Now, walk_path might be updated */ /* Always release reference on walk_path */ path_put(&walk_path); } BPF path iterator sode will look like: static bool bpf_cb(const struct path *ancestor, void *data) { return false; } struct path *bpf_iter_path_next(struct bpf_iter_path *it) { struct bpf_iter_path_kern *kit = (void *)it; if (vfs_walk_ancestors(&kit->path, bpf_cb, NULL)) return NULL; return &kit->path; } Does this sound reasonable to every body? Thanks, Song