Re: [RFC] {do_,}lock_mount() behaviour wrt races and move_mount(2) with empty to_path (was Re: [PATCH] fs/namespace.c: fix mountpath handling in do_lock_mount())

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 19 Aug 2025 11:40:14 +0200

On Mon, Aug 18, 2025 at 09:56:06PM +0100, Al Viro wrote:
> On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:
> 
> > Alternative would be to treat these races as "act as if we'd won and
> > the other guy had overmounted ours", i.e. *NOT* follow mounts.  Again,
> > for old syscalls that's fine - if another thread has raced with us and
> > mounted something on top of the place we want to mount on, it could just
> > as easily have come *after* we'd completed mount(2) and mounted their
> > stuff on top of ours.  If userland is not fine with such outcome, it needs
> > to provide serialization between the callers.  For move_mount(2)... again,
> > the only real question is empty to_path case.
> > 
> > Comments?
> 
> Thinking about it a bit more...  Unfortunately, there's another corner
> case: "." as mountpoint.  That would affect the old syscalls as well
> and I'm not sure that there's no userland code that relies upon the
> current behaviour.
> 
> Background: pathname resolution does *NOT* follow mounts on the starting
> point and it does not follow mounts after "."
> 
> ; mkdir /tmp/foo
> ; mount -t tmpfs none /tmp/foo
> ; cd /tmp/foo
> ; echo under > a
> ; cat /tmp/foo/a
> under
> ; mount -t tmpfs none /tmp/foo
> ; cat a
> under
> ; cat /tmp/foo/a
> cat: /tmp/foo/a: no such file or directory
> ; echo under > b
> ; cat b
> under
> ; cat /tmp/foo/b
> cat: /tmp/foo/b: no such file or directory
> ;
> 
> It's been a bad decision (if it can be called that - it's been more
> of an accident, AFAICT), but it's decades too late to change it.
> And interaction with mount is also fun: mount(2) *DOES* follow mounts
> on the end of any pathname, no matter what.  So when we are
> standing in an overmounted directory, ls . will show the contents of
> that directory, but mount <something> . will mount on top of whatever's
> mounted there.
> 
> So the alternative I've mentioned above would change the behaviour of
> old syscalls in a corner case that just might be actually used in userland
> code - including the scripts run at the boot time, of all things ;-/
> 
> IOW, it probably falls under "can't touch that, no matter how much we'd
> like to" ;-/  Pity, that...
> 
> That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
> do we want a variant that would say "slide precisely under the opened
> directory I gave you, no matter what might overmount it"?

Afaict, right now MOVE_MOUNT_BENEATH will take the overmount into
account even for "." just like mount(2) will look up the topmost mount no
matter what. That is what userspace expects. I don't think we need a
variant where "." ignores overmounts for MOVE_MOUNT_BENEATH, certainly
not unless someone has a specific use-case for it. If it comes to that
we should probably add a new flag.

> 
> At the very least this corner case needs to be documented in move_mount(2)
> - behaviour of
> 	move_mount(_, _, dir_fd, "",
> 		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
> has two a priori reasonable variants ("slide right under the top of
> whatever pile there might be over dir_fd" and "slide right under dir_fd

Yes, the first variant is what's intended and documented. It's also what
I wrote in my commit messages and what the selftests should test for. I
specifically did not make it deviate from standard mount(2) behavior.

> itself, no matter what pile might be on top of that") and leaving it
> unspecified is not good, IMO...

Sure, Aleksa can pull that into his documentation patches.
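
For reference, the call under discussion could be sketched from userspace
roughly as below. This is only an illustration, not code from the patch or
the selftests: the helper names beneath_fd_flags() and slide_beneath() are
made up, the flag values are copied from include/uapi/linux/mount.h in case
the installed headers predate MOVE_MOUNT_BENEATH, and the fallback syscall
number 429 is an assumption that holds on most architectures but not all.
Going by the replies above, the call slides the source mount under the top
of whatever is stacked over dir_fd, not under dir_fd itself.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values as in include/uapi/linux/mount.h. */
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* empty from-path permitted */
#endif
#ifndef MOVE_MOUNT_T_EMPTY_PATH
#define MOVE_MOUNT_T_EMPTY_PATH 0x00000040 /* empty to-path permitted */
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH      0x00000200 /* mount beneath the top mount */
#endif
#ifndef SYS_move_mount
#define SYS_move_mount 429 /* assumption: generic syscall number */
#endif

/* Hypothetical helper: flags for "both paths empty, mount beneath". */
static unsigned int beneath_fd_flags(void)
{
	return MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH |
	       MOVE_MOUNT_BENEATH;
}

/*
 * Hypothetical helper: move the detached mount referred to by mnt_fd
 * (e.g. from fsmount(2) or open_tree(2) with OPEN_TREE_CLONE) beneath
 * the mount that is currently on top at the location dir_fd refers to.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int slide_beneath(int mnt_fd, int dir_fd)
{
	return (int)syscall(SYS_move_mount, mnt_fd, "", dir_fd, "",
			    beneath_fd_flags());
}
```

A caller would typically open the target with O_PATH | O_DIRECTORY, obtain
mnt_fd from the new mount API, and check errno on failure (EINVAL is what
kernels without MOVE_MOUNT_BENEATH support return for the unknown flag).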