On Tue, Jun 03, 2025 at 09:47:04PM +0100, Al Viro wrote: > Currently we use ->mnt_flags for all kinds of things, including > the ones only fs/{namespace,pnode}.c care about. > > That wouldn't be a problem if not for the locking. ->mnt_flags is > protected by mount_lock. All writers MUST grab that. Having lockless > readers is unsurprising - after all, for something like noexec we want > the current state of mount, whatever it happens to be. If userland > remounts something noexec and that races with execve(2), there's nothing > to be done - it is a race, but not the kernel one. > > However, for a bunch of flags we rely upon the fact that all > changes in one of those are always done under namespace_sem (exclusive). > It's not a lockless read - we do depend upon namespace_sem (shared) being > sufficient to stabilize those particular bits. MNT_SHARED is one such. > Note that this flag is a part of propagation graph representation and > namespace_sem is what protects the entire thing, so setting or clearing > it under mount_lock alone would be a very bad idea. > > Since MNT_SHARED sits in ->mnt_flags, we have to take mount_lock to set or > clear it, leading to places like this: > if (IS_MNT_SHARED(from)) { > to->mnt_group_id = from->mnt_group_id; > list_add(&to->mnt_share, &from->mnt_share); > lock_mount_hash(); > set_mnt_shared(to); > unlock_mount_hash(); > } > Non-empty ->mnt_share on a mount without MNT_SHARED would be a hard bug; > these changes belong together. Incidentally, lock_mount_hash() is an > overkill here - read_seqlock_excl(&mount_lock) would be better... > > The rules would be easier to follow if we took MNT_SHARED, MNT_UNBINDABLE, > MNT_MARKED and possibly MNT_LOCKED and MNT_LOCK_* to separate field, protected > by namespace_sem alone (MNT_LOCKED - depending upon how the path_parent() mess > settles). Agreed. > > Yes, it's 4 bytes added into struct mount. However, this > int mnt_id; /* mount identifier, reused */ > u64 mnt_id_unique; /* mount ID unique until reboot */ > int mnt_group_id; /* peer group identifier */ > int mnt_expiry_mark; /* true if marked for expiry */ > is preceded and followed by a pointer, so we already have a gap there, > _and_ there are other pending changes that kill ->mnt_umounting. So the > size of struct mount won't grow even on 32bit and would actually go down > on 64bit. I'm not at all worried about the size in this case. > > Objections? The thing I really want is clear locking rules for that > stuff. Reduced contention on mount_lock and fewer increments of its > seqcount component wouldn't hurt either, but that's secondary... Seems good to me! > PS: despite being strictly namespace.c-internal, two flags must stay in > ->mnt_flags - MNT_DOOMED and MNT_SYNC_UMOUNT - __legitimize_mnt() needs Sure.