On Wed, Jun 11, 2025 at 01:36:46PM +0200, Christian Brauner wrote: > On Mon, Jun 09, 2025 at 09:08:34AM +0100, Tingmao Wang wrote: > > On 6/9/25 07:23, Song Liu wrote: > > > On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@xxxxxxxxxx> wrote: > > > [...] > > >> Hi Song, Christian, Al and others, > > >> > > >> Previously I proposed in [1] to add ability to do a reference-less parent > > >> walk for Landlock. However, as Christian pointed out and I do agree in > > >> hindsight, it is not a good idea to do things like this in non-VFS code. > > >> > > >> However, I still think this is valuable to consider given the performance > > >> improvement, and after some discussion with Mickaël, I would like to > > >> propose extending Song's helper to support such usage. While I recognize > > >> that this patch series is already in its v3, and I do not want to delay it > > >> by too much, putting this proposal out now is still better than after this > > >> has merged, so that we may consider signature changes. > > >> > > >> I've created a proof-of-concept and did some brief testing. The > > >> performance improvement attained here is the same as in [1] (with a "git > > >> status" workload, median landlock overhead 35% -> 28%, median time in > > >> landlock decreases by 26.6%). > > >> > > >> If this idea is accepted, I'm happy to work on it further, split out this > > >> patch, update the comments and do more testing etc, potentially in > > >> collaboration with Song. > > >> > > >> An alternative to this is perhaps to add a new helper > > >> path_walk_parent_rcu, also living in namei.c, that will be used directly > > >> by Landlock. I'm happy to do it either way, but with some experimentation > > >> I personally think that the code in this patch is still clean enough, and > > >> can avoid some duplication. > > >> > > >> Patch title: path_walk_parent: support reference-less walk > > >> > > >> A later commit will update the BPF path iterator to use this. > > >> > > >> Signed-off-by: Tingmao Wang <m@xxxxxxxxxx> > > > [...] > > >> > > >> -bool path_walk_parent(struct path *path, const struct path *root); > > >> +struct parent_iterator { > > >> + struct path path; > > >> + struct path root; > > >> + bool rcu; > > >> + /* expected seq of path->dentry */ > > >> + unsigned next_seq; > > >> + unsigned m_seq, r_seq; > > > > > > Most of parent_iterator is not really used by reference walk. > > > So it is probably just separate the two APIs? > > > > I don't mind either way, but I feel like it might be nice to just have one > > style of APIs (i.e. an iterator with start / end / next vs just one > > function), even though this is not totally necessary for the ref-taking > > walk. After all, the BPF use case is iterator-based. This also means > > that the code at the user's side (mostly thinking of Landlock here) is > > slightly simpler. > > > > But I've not experimented with the other way. I'm open to both, and I'm > > happy to send a patch later for a separate API (in that case that would > > not depend on this and I might just start a new series). > > > > Would like to hear what VFS folks thinks of this first tho, and whether > > there's any preference in one or two APIs. > > I really dislike exposing the sequence number for mounts an for > dentries. That's just nonsense and a non-VFS low-level consumer of this > API has zero business caring about any of that. It's easy to > misunderstand, it's easy to abuse so that's not a good way of doing > this. It's the wrong API. > > > > > > > > > Also, is it ok to make m_seq and r_seq available out of fs/? > > No, it's not. > > > > > The struct is not intended to be used directly by code outside. Not sure > > That doesn't mean anything. It's simply the wrong API if it has to spill > so much of its bowels. > > > what is the standard way to do this but we can make it private by e.g. > > putting the seq values in another struct, if needed. Alternatively I > > think we can hide the entire struct behind an opaque pointer by doing the > > allocation ourselves. > > > > > > > >> +}; > > >> + > > >> +#define PATH_WALK_PARENT_UPDATED 0 > > >> +#define PATH_WALK_PARENT_ALREADY_ROOT -1 > > >> +#define PATH_WALK_PARENT_RETRY -2 > > >> + > > >> +void path_walk_parent_start(struct parent_iterator *pit, > > >> + const struct path *path, const struct path *root, > > >> + bool ref_less); > > >> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent); > > >> +int path_walk_parent_end(struct parent_iterator *pit); > > > > > > I think it is better to make this rcu walk a separate set of APIs. > > > IOW, we will have: > > > > > > int path_walk_parent(struct path *path, struct path *root); > > > > > > and > > > > > > void path_walk_parent_rcu_start(struct parent_iterator *pit, > > > const struct path *path, const struct path *root); > > > int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path > > > *next_parent); > > > int path_walk_parent_rcu_end(struct parent_iterator *pit); > > > > (replied above) > > Exposing two sets of different APIs for essentially the same things is > not going to happen. > > The VFS doesn't expose a rcu variant and a non-rcu variant for itself so > we are absolutely not going to do that for outside stuff. > > It always does the try RCU first, then try to continue the walk by > falling back to REF walk (e.g., via try_to_unlazy()). If that doesn't > work then let the caller know and require them to decide whether to > abort or redo everything in ref-walk. That's indeed what is done by choose_mountpoint() (relying on choose_mountpoint_rcu() when possible), but this proposal is about doing a full path walk (i.e. multiple calls to path_walk_parent) within RCU. > > There's zero need in that scheme for the caller to see any of the > internals of the VFS and that's what you should aim for. Yes, but how could we detect if a full path walk is invalid (because of a rename or mount change)?