Branch force-pushed into git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount (also visible as #v2.mount, #v1.mount being the previous version). Individual patches in followups. Still -rc3-based, seems to survive local beating. Please help with review and testing.

Note: no links in commits; I still don't understand what kind of use is expected in this situation.

Changes since v1 (aside from applied reviewed-by tags):

In #13, #14 and #15 scoped_guard replaced with guard. I don't like it, but I can live with it.

Between old #18 and #19: do_new_mount_fc() switched to using fc_mount(). The vfs_get_tree() call moved from the caller into the function itself; unlock + vfs_create_mount() were reordered to before the checks in there and, together with vfs_get_tree(), collapsed into a call of fc_mount(). Cleanup aside, that eliminates the mismatch between the lexical scope of mnt and the actual lifetime of that reference. This differs from the variant posted in https://lore.kernel.org/all/20250826182124.GV39973@ZenIV/ only by a fix for an obvious braino: fetching fc->root->d_sb should be done after a successful fc_mount(), not before it. That change modifies old #25 (now #26), "do_new_mount_rc(): use __free() to deal with dropping mnt on failure".

Added to the end of the queue: cleanup of populating a new namespace with a tree (open_detached_copy() and copy_mnt_ns()); both end up using guards, BTW.

5 commits, #54..#58:
* open_detached_copy(): don't bother with mount_lock_hash()
  It's useless there right now - namespace_excl is quite enough.
* open_detached_copy(): separate creation of namespace into helper
  Creation of the namespace and opening that FMODE_NEED_UNMOUNT file are better off separated - cleaner that way.
* mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list
  Currently it (and free_mnt_ns()) can't be used with a non-anonymous namespace before its insertion into mnt_ns_tree; it's very easy to make it work in that situation as well - in fact, the old "is it non-anonymous" check is not needed anymore.
* copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
  Use the previous patch to avoid weird open-coding of free_mnt_ns().
* copy_mnt_ns(): use guards
  ... and __free(mntput) for rootmnt/pwdmnt.

Added to the end of the queue: handling of ->s_mounts/->mnt_instance and mnt_hold_writers().

Each mount is associated with the same dentry (sub)tree of the same filesystem through its entire lifetime. Mounts are allocated empty, then (in the same function that called the allocator) attached to a dentry tree, and they stay attached all the way to the destructor (cleanup_mnt()).

Unfortunately, as soon as they are attached to a tree, they become reachable from shared data structures - we maintain the set of all mounts associated with a given superblock. Having to worry about that while we are still setting them up is inconvenient. Thankfully, the accesses via that set are *very* limited - only sb_prepare_remount_readonly() goes there, and the only things it does to a mount are setting/clearing MNT_WRITE_HOLD and checking the write count (guaranteed to be zero during setup, since there's nobody who could've asked for write access by that point).

It turns out to be easy to take MNT_WRITE_HOLD out of ->mnt_flags and move it into the same thing that links the mount into the per-superblock set of mounts. That isolates accesses via that set from the rest of struct mount; as far as we are concerned, this set is no longer a way to reach the mount from shared data structures, and the mount remains private to the caller until it is explicitly made reachable (by mounting, attaching to overlayfs as a layer, etc.).
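The guard() and __free(mntput) conversions mentioned above rely on the kernel's scope-based cleanup helpers (include/linux/cleanup.h), which are built on the compiler's cleanup attribute. What follows is a minimal userspace sketch of that idea, assuming GCC or Clang; fake_mount, fake_mntput(), FREE_MNTPUT, FAKE_GUARD and take_ptr() are hypothetical stand-ins, not the kernel's mntput(), guard() or no_free_ptr(). Every early return drops the reference and releases the "lock" automatically, while the success path hands the reference over to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted struct mount. */
struct fake_mount {
	int refcount;
};

static int live_mounts;	/* fake_mount objects allocated and not yet freed */
static int lock_depth;	/* hypothetical stand-in for a held lock such as namespace_excl */

static struct fake_mount *fake_mnt_alloc(void)
{
	struct fake_mount *m = malloc(sizeof(*m));

	if (m) {
		m->refcount = 1;
		live_mounts++;
	}
	return m;
}

/* Analogue of mntput(): drop a reference, free on the last one; NULL is a no-op. */
static void fake_mntput(struct fake_mount *m)
{
	if (!m)
		return;
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		free(m);
		live_mounts--;
	}
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void fake_mntput_cleanup(struct fake_mount **p)
{
	fake_mntput(*p);
}

static void fake_unlock(int *unused)
{
	(void)unused;
	lock_depth--;
}

/* Rough analogue of __free(mntput). */
#define FREE_MNTPUT __attribute__((cleanup(fake_mntput_cleanup)))

/* Rough analogue of guard(): "lock" now, "unlock" when the scope is left. */
#define FAKE_GUARD \
	int fake_guard_var __attribute__((cleanup(fake_unlock), unused)) = (lock_depth++, 0)

/* Analogue of no_free_ptr(): take the pointer out, disarming the cleanup. */
static struct fake_mount *take_ptr(struct fake_mount **p)
{
	struct fake_mount *m = *p;

	*p = NULL;
	return m;
}

/* fail_at selects which (hypothetical) check fails; 0 means success. */
static struct fake_mount *setup(int fail_at)
{
	FAKE_GUARD;
	struct fake_mount *mnt FREE_MNTPUT = fake_mnt_alloc();

	if (!mnt)
		return NULL;
	if (fail_at == 1)
		return NULL;	/* mnt is dropped, then the "lock" is released */
	if (fail_at == 2)
		return NULL;	/* same here, no explicit cleanup needed */
	return take_ptr(&mnt);	/* cleanup sees NULL; caller owns the reference */
}
```

Cleanups run in reverse order of declaration, so the reference is dropped before the "lock" is released. With guard() the protection covers the rest of the enclosing scope, whereas scoped_guard() limits it to an explicit block; the sketch only models the former.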
FWIW, I think we should get rid of the "empty" state of struct mount and have the allocator take the root dentry as an additional argument. I haven't done that yet; this series removes the need to delay attaching a partially set-up mount to the filesystem - we can do that from the very beginning now.

5 commits, #59..#63:
* setup_mnt(): primitive for connecting a mount to filesystem
  Identical logic in clone_mnt() and vfs_create_mount() => common helper.
* preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
  Change the representation of the set from a list_head list to something equivalent to an hlist, with forward linkage going to the entire struct mount rather than to an embedded hlist_node.
* struct mount: relocate MNT_WRITE_HOLD bit
  Steal the LSB of the back links in the set representation to store it. We only traverse the list forwards and all changes are done under mount_lock, same as for all mnt_hold_writers()/mnt_unhold_writers() pairs, so it's pretty uncomplicated.
* simplify the callers of mnt_unhold_writers()
* WRITE_HOLD machinery: no need for to bump mount_lock seqcount
  The last part is another group of "we only need mount_locked_reader" cases.

Diffstat:
 fs/ecryptfs/dentry.c          |  14 +-
 fs/ecryptfs/ecryptfs_kernel.h |  27 +-
 fs/ecryptfs/file.c            |  15 +-
 fs/ecryptfs/inode.c           |  19 +-
 fs/ecryptfs/main.c            |  24 +-
 fs/internal.h                 |   4 +-
 fs/mount.h                    |  16 +-
 fs/namespace.c                | 989 +++++++++++++++++++-----------------------
 fs/pnode.c                    |  75 +++-
 fs/pnode.h                    |   1 +
 fs/super.c                    |   3 +-
 include/linux/fs.h            |   2 +-
 include/linux/mount.h         |   7 +-
 kernel/audit_tree.c           |  12 +-
 14 files changed, 573 insertions(+), 635 deletions(-)
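The "struct mount: relocate MNT_WRITE_HOLD bit" step can be illustrated with a small userspace sketch of an hlist-style set whose forward links point at whole objects and whose back links carry the flag in bit 0. All names here (struct mnt, struct sb, WRITE_HOLD, sb_add(), sb_del(), prepare_ro() and so on) are hypothetical; locking is omitted (in the series all changes happen under mount_lock), and prepare_ro() is only loosely modeled on sb_prepare_remount_readonly(). The sketch assumes back links always point at pointer-aligned slots, so bit 0 of a valid address is zero and free to hold the flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_HOLD ((uintptr_t)1)	/* flag stored in bit 0 of the back link */

struct mnt {
	struct mnt *next;	/* forward link: points at the whole object */
	uintptr_t pprev;	/* address of the slot pointing at us, | WRITE_HOLD */
	int writers;		/* stand-in for the write count */
};

struct sb {
	struct mnt *first;	/* head of the per-superblock set */
};

static struct mnt **mnt_pprev(const struct mnt *m)
{
	return (struct mnt **)(m->pprev & ~WRITE_HOLD);
}

/* Change the back link while preserving the flag bit. */
static void set_pprev(struct mnt *m, struct mnt **pp)
{
	m->pprev = (uintptr_t)pp | (m->pprev & WRITE_HOLD);
}

static int write_held(const struct mnt *m)
{
	return (m->pprev & WRITE_HOLD) != 0;
}

static void hold_writers(struct mnt *m)   { m->pprev |= WRITE_HOLD; }
static void unhold_writers(struct mnt *m) { m->pprev &= ~WRITE_HOLD; }

/* Insert at the head; a freshly added entry starts with the flag clear. */
static void sb_add(struct sb *s, struct mnt *m)
{
	m->next = s->first;
	if (m->next)
		set_pprev(m->next, &m->next);
	s->first = m;
	m->pprev = (uintptr_t)&s->first;
}

static void sb_del(struct mnt *m)
{
	struct mnt **pp = mnt_pprev(m);

	*pp = m->next;
	if (m->next)
		set_pprev(m->next, pp);
}

/* Forward-only traversal, loosely modeled on sb_prepare_remount_readonly(). */
static int prepare_ro(struct sb *s)
{
	int busy = 0;

	for (struct mnt *m = s->first; m; m = m->next) {
		hold_writers(m);
		if (m->writers)
			busy = 1;
	}
	for (struct mnt *m = s->first; m; m = m->next)
		unhold_writers(m);
	return busy ? -1 : 0;
}
```

Since the flag lives in the linkage rather than in a flags word of the object, code that only walks the set never touches anything else in the entry, and relinking neighbours (set_pprev()) carries the flag along untouched.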