On Thu, 10 Jul 2025, Song Liu wrote: > > > > On Jul 9, 2025, at 9:06 AM, Mickaël Salaün <mic@xxxxxxxxxxx> wrote:\ > > [...] > > >> If necessary, we hide “root" inside @data. This is good. > >> > >>> @path would be updated with latest ancestor path (e.g. @root). > >> > >> Update @path to the last ancestor and hold proper references. > >> I missed this part earlier. With this feature, vfs_walk_ancestors > >> should work usable with open-codeed bpf path iterator. > >> > >> I have a question about this behavior with RCU walk. IIUC, RCU > >> walk does not hold reference to @ancestor when calling walk_cb(). > > > > I think a reference to the mount should be held, but not necessarily to > > the dentry if we are still in the same mount as the original path. > > If we update @path and do path_put() after the walk, we have to hold > reference to both the mnt and the dentry, no? > > > > >> If walk_cb() returns false, shall vfs_walk_ancestors() then > >> grab a reference on @ancestor? This feels a bit weird to me. > > > > If walk_cb() checks for a root, it will return false when the path will > > match, and the caller would expect to get this root path, right? > > If the user want to walk to the global root, walk_cb() may not > return false at all, IIUC. walk_cb() may also return false on > other conditions. > > > > > In general, it's safer to always have the same behavior when holding or > > releasing a reference. I think the caller should then always call > > path_put() after vfs_walk_ancestors() whatever the return code is. > > > >> Maybe “updating @path to the last ancestor” should only apply to > >> LOOKUP_RCU==false case? > >> > >>> @flags could contain LOOKUP_RCU or not, which enables us to have > >>> walk_cb() not-RCU compatible. > >>> > >>> When passing LOOKUP_RCU, if the first call to vfs_walk_ancestors() > >>> failed with -ECHILD, the caller can restart the walk by calling > >>> vfs_walk_ancestors() again but without LOOKUP_RCU. > >> > >> > >> Given we want callers to handle -ECHILD and call vfs_walk_ancestors > >> again without LOOKUP_RCU, I think we should keep @path not changed > >> With LOOKUP_RCU==true, and only update it to the last ancestor > >> when LOOKUP_RCU==false. > > > > As Neil said, we don't want to explicitly pass LOOKUP_RCU as a public > > flag. Instead, walk_cb() should never sleep (and then potentially be > > called under RCU by the vfs_walk_ancestors() implementation). > > How should the user handle -ECHILD without LOOKUP_RCU flag? Say the > following code in landlocked: > > /* Try RCU walk first */ > err = vfs_walk_ancestors(path, ll_cb, data, LOOKUP_RCU); > > if (err == -ECHILD) { > struct path walk_path = *path; > > /* reset any data changed by the walk */ > reset_data(data); > > /* now do ref-walk */ > err = vfs_walk_ancestors(&walk_path, ll_cb, data, 0); > } > > Or do you mean vfs_walk_ancestors will never return -ECHILD? > Then we need vfs_walk_ancestors to call reset_data logic, right? It isn't clear to me that vfs_walk_ancestors() needs to return anything. All the communication happens through walk_cb() walk_cb() is called with a path, the data, and a "may_sleep" flag. If it needs to sleep but may_sleep is not set, it returns "-ECHILD" which causes the walk to restart and use refcounts. If it wants to stop, it returns 0. If it wants to continue, it returns 1. If it wants a reference to the path then it can use (new) vfs_legitimize_path() which might fail. If it wants a reference to the path and may_sleep is true, it can use path_get() which won't fail. When returning -ECHILD (either because of a need to sleep or because vfs_legitimize_path() fails), walk_cb() would reset_data(). NeilBrown