On 6/11/25 17:31, Song Liu wrote: > On Wed, Jun 11, 2025 at 8:42 AM Mickaël Salaün <mic@xxxxxxxxxxx> wrote: > [...] >>> We can probably call this __path_walk_parent() and make it static. >>> >>> Then we can add an exported path_walk_parent() that calls >>> __path_walk_parent() and adds extra logic. >>> >>> If this looks good to folks, I can draft v4 based on this idea. >> >> This looks good but it would be better if we could also do a full path >> walk within RCU when possible. > > I think we will need some callback mechanism for this. Something like: > > for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) { > if (!try_rcu) > goto ref_walk; > > __read_seqcount_begin(); > /* rcu walk parents, from starting_path until root */ > walk_rcu(starting_path, root, path) { > callback_fn(path, cb_data); > } > if (!read_seqcount_retry()) > return xxx; /* successful rcu walk */ > > ref_walk: > /* ref walk parents, from starting_path until root */ > walk(starting_path, root, path) { > callback_fn(path, cb_data); > } > return xxx; > } > > Personally, I don't like this version very much, because the callback > mechanism is not very flexible, and it is tricky to use it in BPF LSM. Aside from the "exposing mount seqcounts" problem, what do you think about the parent_iterator approach I suggested earlier? I feel that it is better than such a callback - more flexible, and also fits in right with the BPF API you already designed (i.e. with a callback you might then have to allow BPF to pass a callback?). There are some specifics that I can improve - Mickaël suggested some in our discussion: - Letting the caller take rcu_read_lock outside rather than doing it in path_walk_parent_start - Instead of always requiring a struct parent_iterator, allow passing in NULL for the iterator to path_walk_parent to do a reference walk without needing to call path_walk_parent_start - this way might be simpler and path_walk_parent_start/end can just be for rcu case. but what do you think about the overall shape of it? And while it is technically doing two separate things (rcu walk and reference walk), so is this callback to some extent. The pro of callback however is that the retry on ref walk failure is automatic, but the user still has to be aware and detect such cases. For example, landlock needs to re-initialize the layer masks previously collected if parent walk is restarting. (and of course, this also hides the seqcounts from non VFS code, but I'm wondering if there are other ways to make the seqcounts in the parent_iterator struct private, if that is the only issue with it?) Also, if the common logic with follow_dotdot is extracted out to __path_walk_parent, path_walk_parent might just defer to that for the non-rcu case, and so the complexity of that function is further reduced. > > Thanks, > Song