Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()

Tingmao Wang <m@xxxxxxxxxx> · Wed, 11 Jun 2025 18:50:24 +0100

On 6/11/25 17:31, Song Liu wrote:
> On Wed, Jun 11, 2025 at 8:42 AM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
> [...]
>>> We can probably call this __path_walk_parent() and make it static.
>>>
>>> Then we can add an exported path_walk_parent() that calls
>>> __path_walk_parent() and adds extra logic.
>>>
>>> If this looks good to folks, I can draft v4 based on this idea.
>>
>> This looks good but it would be better if we could also do a full path
>> walk within RCU when possible.
> 
> I think we will need some callback mechanism for this. Something like:
> 
> for_each_parents(starting_path, root, callback_fn, cb_data, bool try_rcu) {
>    if (!try_rcu)
>       goto ref_walk;
> 
>    __read_seqcount_begin();
>     /* rcu walk parents, from starting_path until root */
>    walk_rcu(starting_path, root, path) {
>     callback_fn(path, cb_data);
>   }
>   if (!read_seqcount_retry())
>     return xxx;  /* successful rcu walk */
> 
> ref_walk:
>   /* ref walk parents, from starting_path until root */
>    walk(starting_path, root, path) {
>     callback_fn(path, cb_data);
>   }
>   return xxx;
> }
> 
> Personally, I don't like this version very much, because the callback
> mechanism is not very flexible, and it is tricky to use it in BPF LSM.

Aside from the "exposing mount seqcounts" problem, what do you think about
the parent_iterator approach I suggested earlier?  I feel that it is
better than such a callback - more flexible, and also fits in right with
the BPF API you already designed (i.e. with a callback you might then have
to allow BPF to pass a callback?).  There are some specifics that I can
improve - Mickaël suggested some in our discussion:

- Letting the caller take rcu_read_lock outside rather than doing it in
path_walk_parent_start

- Instead of always requiring a struct parent_iterator, allow passing in
NULL for the iterator to path_walk_parent to do a reference walk without
needing to call path_walk_parent_start - this way might be simpler and
path_walk_parent_start/end can just be for rcu case.

but what do you think about the overall shape of it?

And while it is technically doing two separate things (rcu walk and
reference walk), so is this callback to some extent.  The pro of callback
however is that the retry on ref walk failure is automatic, but the user
still has to be aware and detect such cases.  For example, landlock needs
to re-initialize the layer masks previously collected if parent walk is
restarting.

(and of course, this also hides the seqcounts from non VFS code, but I'm
wondering if there are other ways to make the seqcounts in the
parent_iterator struct private, if that is the only issue with it?)

Also, if the common logic with follow_dotdot is extracted out to
__path_walk_parent, path_walk_parent might just defer to that for the
non-rcu case, and so the complexity of that function is further reduced.

> 
> Thanks,
> Song