On Sat, May 31, 2025 at 09:57:00PM +0100, Al Viro wrote:
> One possibility is to wrap the use of __lookup_mnt() into a sample-and-recheck
> loop there; for the call of path_overmounted() in finish_automount() it'll
> give the right behaviour.

OK, that's definitely the right thing to do, whatever we end up doing with
the checks in do_move_mount().

So the rules become:

Mount hash lookup (__lookup_mnt()) requires mount_lock - either holding its
spinlock component, or running a seqretry loop on its seqcount component.
If we are not holding the spinlock side of mount_lock, we must be under
rcu_read_lock() at least for the duration of the lookup.

The result is safe to dereference as long as
	1) mount_lock is still held, or
	2) rcu_read_lock() is still held, or
	3) namespace_sem has been held since before the lookup *AND* the
	   parent's refcount remains positive.
That covers only the continued safety of access to the result of the lookup;
the rules above must still have been satisfied for the lookup itself.

Acquiring a reference to the result is safe in cases (1) and (3); in case (2)
it must be done with __legitimize_mnt(result, seq), with seq being a value of
the mount_lock seqcount component sampled *BEFORE* the lookup.

That's pretty close to the rules for the rest of mount tree walking...

Complications wrt namespace_sem come from the dissolving of lazy-umounted
trees; stuck children get detached when the parent's refcount drops to zero.
That happens outside of namespace_sem and I don't see any sane way to change
that.
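
For illustration, a minimal sketch of the sample-and-recheck loop for
path_overmounted(), assuming the usual read_seqbegin()/read_seqretry()
pair on mount_lock and an RCU read-side section around the hash walk;
the exact signature and body are assumptions, not a quote of any actual
patch:

/*
 * Sketch only: redo the hash lookup if the seqcount side of mount_lock
 * changed while we were walking the hash, so a lookup that raced with
 * a topology change is never trusted.
 */
static bool path_overmounted(const struct path *path)
{
	unsigned seq;
	bool overmounted;

	rcu_read_lock();
	do {
		/* sample mount_lock's seqcount before the lookup */
		seq = read_seqbegin(&mount_lock);
		overmounted = __lookup_mnt(path->mnt, path->dentry) != NULL;
		/* ... and retry if it changed under us */
	} while (read_seqretry(&mount_lock, seq));
	rcu_read_unlock();

	return overmounted;
}

The answer is computed inside the loop, so nothing from a discarded
lookup escapes; a yes/no result is all finish_automount() needs here.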
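
As for taking a reference in case (2), a sketch assuming the
legitimize_mnt() wrapper around __legitimize_mnt() and a seq sampled
before the hash walk; the function name below is made up for the
example, but the shape is essentially what lookup_mnt() already does:

/*
 * Sketch of case (2): an RCU-only lookup that wants to keep the result.
 * The reference is taken only if the seqcount sampled *before* the
 * lookup is still current; otherwise the whole thing is retried.
 */
static struct vfsmount *lookup_child_mnt(const struct path *path)
{
	struct mount *child;
	struct vfsmount *m;
	unsigned seq;

	rcu_read_lock();
	do {
		seq = read_seqbegin(&mount_lock);	/* sample first */
		child = __lookup_mnt(path->mnt, path->dentry);
		m = child ? &child->mnt : NULL;
	} while (!legitimize_mnt(m, seq));		/* grab ref or retry */
	rcu_read_unlock();

	return m;
}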