Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 2 Jun 2025 11:30:13 +0200

On Fri, May 30, 2025 at 12:10:18AM +0100, Al Viro wrote:
> On Thu, May 29, 2025 at 03:13:10PM -0700, Song Liu wrote:
> 
> > Is it an issue if we only hold a reference to a MNT_LOCKED mount for
> > short period of time? "Short period" means it may get interrupted, page
> > faults, or wait for an IO (read xattr), but it won't hold a reference to the
> > mount and sleep indefinitely.
> 
> MNT_LOCKED mount itself is not a problem.  What shouldn't be done is
> looking around in the mountpoint it covers.  It depends upon the things
> you are going to do with that, but it's very easy to get an infoleak
> that way.
> 
> > > OTOH, there's a good cause for moving some of the flags, MNT_LOCKED
> > > included, out of ->mnt_flags and into a separate field in struct mount.
> > > However, that would conflict with any code using that to deal with
> > > your iterator safely.
> > >
> > > What's more, AFAICS in case of a stack of mounts each covering the root
> > > of parent mount, you stop in each of those.  The trouble is, umount(2)
> > > propagation logics assumes that intermediate mounts can be pulled out of
> > > such stack without causing trouble.  For pathname resolution that is
> > > true; it goes through the entire stack atomically wrt that stuff.
> > > For your API that's not the case; somebody who has no idea about an
> > > intermediate mount being there might get caught on it while it's getting
> > > pulled from the stack.
> > >
> > > What exactly do you need around the mountpoint crossing?
> > 
> > I thought about skipping intermediate mounts (that are hidden by
> > other mounts). AFAICT, not skipping them will not cause any issue.
> 
> It can.  Suppose e.g. that /mnt gets propagation from another namespace,
> but not the other way round and you mount something on /mnt.
> 
> Later, in that another namespace, somebody mounts something on wherever
> your /mnt gets propagation to.  A copy will be propagated _between_
> your /mnt and whatever you've mounted on top of it; it will be entirely
> invisible until you umount your /mnt.  At that point the propagated
> copy will show up there, same as if it had appeared just after your
> umount.  Prior to that it's entirely invisible.  If its original
> counterpart in another namespace gets unmounted first, the copy will
> be quietly pulled out.

Fwiw, I have explained these and similar issues at length multiple times.

> 
> Note that choose_mountpoint_rcu() callers (including choose_mountpoint())
> will have mount_lock seqcount sampled before the traversal _and_ recheck
> it after having reached the bottom of stack.  IOW, if you traverse ..
> on the way to root, you won't get caught on the sucker being pulled out.
> 
> Your iterator, OTOH, would stop in that intermediate mount - and get
> an unpleasant surprise when it comes back to do the next step (towards
> /mnt on root filesystem, that is) and finds that path->mnt points
> to something that is detached from everything - no way to get from
> it any further.  That - despite the fact that location you've started
> from is still mounted, still has the same pathname, etc. and nothing
> had been disrupted for it.

Same...