On Thu, May 29, 2025 at 05:42:16PM -0700, Song Liu wrote: > On Thu, May 29, 2025 at 4:10 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > On Thu, May 29, 2025 at 03:13:10PM -0700, Song Liu wrote: > > > > > Is it an issue if we only hold a reference to a MNT_LOCKED mount for > > > short period of time? "Short period" means it may get interrupted, page > > > faults, or wait for an IO (read xattr), but it won't hold a reference to the > > > mount and sleep indefinitely. > > > > MNT_LOCKED mount itself is not a problem. What shouldn't be done is > > looking around in the mountpoint it covers. It depends upon the things > > you are going to do with that, but it's very easy to get an infoleak > > that way. > > > > > > OTOH, there's a good cause for moving some of the flags, MNT_LOCKED > > > > included, out of ->mnt_flags and into a separate field in struct mount. > > > > However, that would conflict with any code using that to deal with > > > > your iterator safely. > > > > > > > > What's more, AFAICS in case of a stack of mounts each covering the root > > > > of parent mount, you stop in each of those. The trouble is, umount(2) > > > > propagation logics assumes that intermediate mounts can be pulled out of > > > > such stack without causing trouble. For pathname resolution that is > > > > true; it goes through the entire stack atomically wrt that stuff. > > > > For your API that's not the case; somebody who has no idea about an > > > > intermediate mount being there might get caught on it while it's getting > > > > pulled from the stack. > > > > > > > > What exactly do you need around the mountpoint crossing? > > > > > > I thought about skipping intermediate mounts (that are hidden by > > > other mounts). AFAICT, not skipping them will not cause any issue. > > > > It can. Suppose e.g. that /mnt gets propagation from another namespace, > > but not the other way round and you mount something on /mnt. > > > > Later, in that another namespace, somebody mounts something on wherever > > your /mnt gets propagation to. A copy will be propagated _between_ > > your /mnt and whatever you've mounted on top of it; it will be entirely > > invisible until you umount your /mnt. At that point the propagated > > copy will show up there, same as if it had appeared just after your > > umount. Prior to that it's entirely invisible. If its original > > counterpart in another namespace gets unmounted first, the copy will > > be quietly pulled out. > > Thanks for sharing this information! > > > Note that choose_mountpoint_rcu() callers (including choose_mountpoint()) > > will have mount_lock seqcount sampled before the traversal _and_ recheck > > it after having reached the bottom of stack. IOW, if you traverse .. > > on the way to root, you won't get caught on the sucker being pulled out. > > In some of our internal discussions, we talked about using > choose_mountpoint() instead of follow_up(). I didn't go that direction in this > version because it requires holding "root". But if it makes more sense > to use, choose_mountpoint(), we sure can hold "root". > > Alternatively, I think it is also OK to pass a zero'ed root to > choose_mountpoint(). > > > Your iterator, OTOH, would stop in that intermediate mount - and get > > an unpleasant surprise when it comes back to do the next step (towards > > /mnt on root filesystem, that is) and finds that path->mnt points > > to something that is detached from everything - no way to get from > > it any further. That - despite the fact that location you've started > > from is still mounted, still has the same pathname, etc. and nothing > > had been disrupted for it. > > > > And yes, landlock has a narrow race in the matching place. Needs to > > be fixed. At least it does ignore those as far as any decisions are > > concerned... Thanks for pointing this out. In the case of Landlock, walking to a disconnected mount point (because of this umount race condition) would deny the requested access whereas it may be allowed otherwise. This is not a security issue but still an issue because an event unrelated to the request (umount) can abort a path resolution, which should not be the case. Without access to mount_lock, what would be the best way to fix this Landlock issue while making it backportable? > > If we update path_parent in this patchset with choose_mountpoint(), > and use it in Landlock, we will close this race condition, right? choose_mountpoint() is currently private, but if we add a new filesystem helper, I think the right approach would be to expose follow_dotdot(), updating its arguments with public types. This way the intermediates mount points will not be exposed, RCU optimization will be leveraged, and usage of this new helper will be simplified. > > > > > Note, BTW, that it might be better off by doing that similar to > > d_path.c - without arseloads of dget_parent/dput et.al.; not sure > > how feasible it is, but if everything in it can be done under > > rcu_read_lock(), that's something to look into. > > I don't think we can do everything here inside rcu_read_lock(). > But d_path.c does have some code we can probably reuse or > learn from. Also, we probably need two variations of iterators, > one walk until absolute root, while the other walk until root of > current->fs, just like d_path() vs. d_absolute_path(). Does this > sound reasonable? Passing the root to a public follow_dotdot() helper should do the job. > > Thanks, > Song >