On Thu, 10 Jul 2025, Song Liu wrote: > > > > On Jul 9, 2025, at 3:14 PM, NeilBrown <neil@xxxxxxxxxx> wrote: > > > > On Tue, 08 Jul 2025, Song Liu wrote: > >> Hi Christian, > >> > >> Thanks for your comments! > >> > >>> On Jul 7, 2025, at 4:17 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote: > >> > >> [...] > >> > >>>>> 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback. > >>>> > >>>> I think that's fine. > >>> > >>> Ok, sorry for the delay but there's a lot of different things going on > >>> right now and this one isn't exactly an easy thing to solve. > >>> > >>> I mentioned this before and so did Neil: the lookup implementation > >>> supports two modes sleeping and non-sleeping. That api is abstracted > >>> away as heavily as possible by the VFS so that non-core code will not be > >>> exposed to it other than in exceptional circumstances and doesn't have > >>> to care about it. > >>> > >>> It is a conceptual dead-end to expose these two modes via separate APIs > >>> and leak this implementation detail into non-core code. It will not > >>> happen as far as I'm concerned. > >>> > >>> I very much understand the urge to get the refcount step-by-step thing > >>> merged asap. Everyone wants their APIs merged fast. And if it's > >>> reasonable to move fast we will (see the kernfs xattr thing). > >>> > >>> But here are two use-cases that ask for the same thing with different > >>> constraints that closely mirror our unified approach. Merging one > >>> quickly just to have something and then later bolting the other one on > >>> top, augmenting, or replacing, possible having to deprecate the old API > >>> is just objectively nuts. That's how we end up with a spaghetthi helper > >>> collection. We want as little helper fragmentation as possible. > >>> > >>> We need a unified API that serves both use-cases. I dislike > >>> callback-based APIs generally but we have precedent in the VFS for this > >>> for cases where the internal state handling is delicate enough that it > >>> should not be exposed (see __iterate_supers() which does exactly work > >>> like Neil suggested down to the flag argument itself I added). > >>> > >>> So I'm open to the callback solution. > >>> > >>> (Note for really absurd perf requirements you could even make it work > >>> with static calls I'm pretty sure.) > >> > >> I guess we will go with Mickaël’s idea: > >> > >>> int vfs_walk_ancestors(struct path *path, > >>> bool (*walk_cb)(const struct path *ancestor, void *data), > >>> void *data, int flags) > >>> > >>> The walk continue while walk_cb() returns true. walk_cb() can then > >>> check if @ancestor is equal to a @root, or other properties. The > >>> walk_cb() return value (if not bool) should not be returned by > >>> vfs_walk_ancestors() because a walk stop doesn't mean an error. > >> > >> If necessary, we hide “root" inside @data. This is good. > >> > >>> @path would be updated with latest ancestor path (e.g. @root). > >> > >> Update @path to the last ancestor and hold proper references. > >> I missed this part earlier. With this feature, vfs_walk_ancestors > >> should work usable with open-codeed bpf path iterator. > > > > I don't think path should be updated. That adds complexity which might > > not be needed. The original (landlock) requirements were only to look > > at each ancestor, not to take a reference to any of them. > > I think this is the ideal case that landlock wants in the long term. > But we may need to take references when the attempt fails. Also, > current landlock code takes reference at each step. Why may be need to? Yes, current landlock code takes references, but I don't think that is because it needs references, only because the API requires it to take references. > > > If the caller needs a reference to any of the ancestors I think that > > walk_cb() needs to take that reference and store it in data. > > Note that attempting to take the reference might fail. See > > legitimize_path() in fs/namei.c. > > > > It isn't yet clear to me what would be a good API for requesting the > > reference. > > One option would be for vfs_walk_ancestors() to pass another void* to > > walk_cb(), and it passed it on to vfs_legitimize_path() which extracts > > the seq numbers from there. > > Another might be that the path passed to walk_cb is always > > nameidata.path, and so when that is passed to vfs_legitimize_path() path > > it can use container_of() to find the seq numbers. > > Letting walk_cb() call vfs_legitimize_path() seems suboptimal to me. > I think the original goal is to have vfs_walk_ancestors() to: > 1. Try to walk the ancestors without taking any references; > 2. Detect when the not-taking-reference walk is not reliable; > 3. Maybe, retry the walk from beginning, but takes references on > each step. > > With walk_cb() calling vfs_legitimize_path(), we are moving #2 above > to walk_cb(). I think this is not what we want? I think you are looking at this the wrong way around. Focus on the needs for the caller, not on how you think it might be implemented. If the caller needs a reference, there should be a way for it to get a reference. This is quite separate from the choices vfs_walk_ancestors() makes about how it is going to walk the list of dentries. NeilBrown