Updated variant (-rc4-based) force-pushed to
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount;
individual patches in followups.  It seems to survive testing here, but
more testing and review would be very welcome.  Again, that is not all -
there's more stuff coming...

Folks, please review - if nobody objects, it goes into #for-next in a
day or two.

Changes since v2:
* Fixes went into mainline.
* Added change_mnt_propagation() stuff: cleanups and getting rid of
  potentially O(N^2) work in umount() - when a long slave list gets moved
  from one doomed mount to another, with O(list length) work on each move.
  In the same area, mnt_slave_list/mnt_slave turned into hlist.
* Added propagate_mnt() series - refactoring instead of brute-force
  "pass a structure around instead of playing with globals".
* Added a few ->mnt_group_id-related cleanups.

New: ##32--44,46--48
Slight changes in #16 (Rewrite of propagate_umount()) and #30 (mount:
separate the flags accessed only under namespace_sem).

Rough overview:

Part 1: getting rid of mount hash conflicts for good
1) attach_mnt(): expand in attach_recursive_mnt(), then lose the flag argument
2) get rid of mnt_set_mountpoint_beneath()
3) prevent mount hash conflicts

Part 2: trivial cleanups and helpers:
4) copy_tree(): don't set ->mnt_mountpoint on the root of copy
5) constify mnt_has_parent()
6) pnode: lift peers() into pnode.h
7) new predicate: mount_is_ancestor()
8) constify is_local_mountpoint()
9) new predicate: anon_ns_root(mount)
10) dissolve_on_fput(): use anon_ns_root()
11) __attach_mnt(): lose the second argument
12) don't set MNT_LOCKED on parentless mounts
13) clone_mnt(): simplify the propagation-related logics
14) do_umount(): simplify the "is it still mounted" checks

Part 3: (somewhat of a side story) restore the machinery for long-term
mounts from accumulated bitrot.
15) sanitize handling of long-term internal mounts
	Still unchanged; might end up moved on top of #work.fs_context
	with its change of vfs_fs_parse_string() calling conventions.

Part 4: propagate_umount() rewrite (posted last cycle)
16) Rewrite of propagate_umount()

Part 5: untangling do_move_mount()/attach_recursive_mnt().
17) make commit_tree() usable in same-namespace move case
18) attach_recursive_mnt(): unify the mnt_change_mountpoint() logics
19) attach_recursive_mnt(): pass destination mount in all cases
20) attach_recursive_mnt(): get rid of flags entirely
21) do_move_mount(): take dropping the old mountpoint into attach_recursive_mnt()
22) do_move_mount(): get rid of 'attached' flag

Part 6: change locking for expiry lists.
23) attach_recursive_mnt(): remove from expiry list on move
24) take ->mnt_expire handling under mount_lock [read_seqlock_excl]

Part 7: struct mountpoint massage.
25) pivot_root(): reorder tree surgeries, collapse unhash_mnt() and put_mountpoint()
26) combine __put_mountpoint() with unhash_mnt()
27) get rid of mountpoint->m_count

Part 8: regularize mount refcounting a bit
28) don't have mounts pin their parents

Part 9: propagate_mnt() massage
29) mount: separate the flags accessed only under namespace_sem
30) propagate_one(): get rid of dest_master
31) propagate_mnt(): handle all peer groups in the same loop
32) propagate_one(): separate the "do we need secondary here?" logics
33) propagate_one(): separate the "what should be the master for this copy" part
34) propagate_one(): fold into the sole caller
35) fs/pnode.c: get rid of globals
36) propagate_mnt(): get rid of last_dest
37) propagate_mnt(): fix comment and convert to kernel-doc, while we are at it

Part 10: change_mnt_propagation() massage
38) change_mnt_propagation() cleanups, step 1
39) change_mnt_propagation(): do_make_slave() is a no-op unless IS_MNT_SHARED()
	These two are preliminary massage, getting do_make_slave() into
	shape for
40) do_make_slave(): choose new master sanely
	... getting rid of excessive work on umount().  The thing is, when
	a mount stops propagating events (e.g. when it gets taken out), we
	need to transfer its slave list to its peer (if one exists) or to
	its master.  If there's neither, we need to dissolve that slave
	list.  Each member of the slave list needs at least to have
	->mnt_master switched to the new value.  Unfortunately, if the
	chosen new master is itself getting taken out on the same
	umount(2), the entire thing needs to be repeated there, etc., and
	it doesn't take much to construct a situation where we have 2N
	mounts and umount(2) taking out half of them will end up moving
	the slave list (consisting of the other half) through all of
	those, resulting in N^2 reassignments of ->mnt_master alone.
	Not hard to avoid - we just need to figure out where the thing
	will settle and transfer it there from the very beginning.
41) turn do_make_slave() into transfer_propagation()
	Cleanup, getting the things into convenient shape for...
42) mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node
	What it says on the can.
43) change_mnt_propagation(): move ->mnt_master assignment into MS_SLAVE case
	Finishing touches on the cleanups series.

Part 11: misc stuff, will grow...
44) copy_tree(): don't link the mounts via mnt_list
45) take freeing of emptied mnt_namespace to namespace_unlock()
46) get rid of CL_SHARE_TO_SLAVE
47) invent_group_ids(): zero ->mnt_group_id always implies !IS_MNT_SHARED()
48) statmount_mnt_basic(): simplify the logics for group id

Diffstat:
 Documentation/filesystems/propagate_umount.txt | 484 +++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gemfs.c          |  21 +-
 drivers/gpu/drm/v3d/v3d_gemfs.c                |  21 +-
 fs/hugetlbfs/inode.c                           |   2 +-
 fs/mount.h                                     |  40 +-
 fs/namespace.c                                 | 711 ++++++++++---------------
 fs/pnode.c                                     | 697 ++++++++++++------------
 fs/pnode.h                                     |  27 +-
 include/linux/mount.h                          |  18 +-
 ipc/mqueue.c                                   |   2 +-
 10 files changed, 1216 insertions(+), 807 deletions(-)
 create mode 100644 Documentation/filesystems/propagate_umount.txt