Most of this pile is basically an attempt to see how well do cleanup.h-style mechanisms apply in mount handling. That stuff lives in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount Rebased to -rc3 (used to be a bit past -rc2, branched at mount fixes merge) Individual patches in followups. Please, help with review and testing. It seems to survive the local beating and code generation seems to be OK, but more testing would be a good thing and I would really like to see comments on that stuff. This is not all I've got around mount handling, but I'd rather get that thing out for review before starting to sort out other local mount-related branches. Series overview: Part 1: guards. This part starts with infrastructure, followed by one-by-one conversions to the guard/scoped_guard in some of the places that fit that well enough. Note that one of those places turned out to be taking mount_lock for no reason whatsoever; I already see places where we do write_seqlock when read_seqlock_excl would suffice, etc. Folks, _please_ don't do any bulk conversions in that area. IMO one area where RAII becomes dangerous is locking; usually it's not a big deal to delay freeing some object a bit, but delay dropping a lock and you risk introducing deadlocks that will be bloody hard to spot. It _has_ to be done carefully; we had trouble in that area several times over the last year or so in fs/namespace.c alone. Another fun problem is that quite a few comments regarding the locking in there are stale. We still have the comments that talk about mount lock as if it had been an rwlock-like thing. It hadn't been that for more than a decade now. It needs to be documented sanely; so do the access rules to the data structures involved. I hope to get some of that into the tree this cycle, but it's still in progress. 1/52) fs/namespace.c: fix the namespace_sem guard mess New guards: namespace_excl and namespace_shared. The former implies the latter, as for anything rwsem-like. No inode locks, no dropping the final references, no opening files, etc. in scope of those. 2/52) introduced guards for mount_lock New guards: mount_writer, mount_locked_reader. That's write_seqlock and read_seqlock_excl on mount_lock; obviously, nothing blocking should be done in scope of those. 3/52) fs/namespace.c: allow to drop vfsmount references via __free(mntput) Missing DEFINE_FREE (for mntput()); local in fs/namespace.c, to be used only for keeping shit out of namespace_... and mount_... scopes. 4/52) __detach_mounts(): use guards 5/52) __is_local_mountpoint(): use guards 6/52) do_change_type(): use guards 7/52) do_set_group(): use guards 8/52) mark_mounts_for_expiry(): use guards 9/52) put_mnt_ns(): use guards 10/52) mnt_already_visible(): use guards a bunch of clear-cut conversions, with explanations of the reasons why this or that guard is needed. 11/52) check_for_nsfs_mounts(): no need to take locks ... and here we have one where it turns out that locking had been excessive. Iterating through a subtree in mount_locked_reader scope is safe, all right, but (1) mount_writer is not needed here at all and (2) namespace_shared + a reference held to the root of subtree is also enough. All callers had (2) already. Documented the locking requirements for function, removed {,un}lock_mount_hash() in it... 12/52) propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() This one is interesting - existing code had been equivalent to scoped_guard(mount_locked_reader), and it's right for that call. However, mnt_set_mountpoint() generally requires mount_writer - the only reason we get away with that here is that the mount in question never had been reachable from the mounts visible to other threads. 13/52) has_locked_children(): use guards 14/52) mnt_set_expiry(): use guards 15/52) path_is_under(): use guards more clear-cut conversions with explanations. 16/52) current_chrooted(): don't bother with follow_down_one() 17/52) current_chrooted(): use guards this pair might be better off with #16 taken to the beginning of the series (or to a separate branch merge into this one); no better reason to do as I had than wanting to keep the guard infrastructure in the very beginning. Part 2: turning unlock_mount() into __cleanup. Environment for mounting something on given location consists of: 1) namespace_excl scope 2) parent mount - the one we'll be attaching things to. 3) mountpoint to be, protected from disappearing under us. 4) inode of that mountpoint's dentry held exclusive. Unfortunately, we can't take inode locks in namespace_excl scopes. And we want to cope with the possibility that somebody has managed to mount something on that place while we'd been taking locks. "Cope" part is simple for finish_automount() ("drop our mount and go away quietly; somebody triggered it before we did"), but for everything else it's trickier - "use whatever's overmounting that place now (with the right locks, please)". lock_mount() does all of that (do_lock_mount(), actually), with unlock_mount() closing the scope. And it's definitely a good candidate for __cleanup()-based approach, except that * the damn thing can return an error and conditional variants of that infrastructure are too revolting. * parent mount is returned in a fucking awful way - we modify the struct path passed to us as location to mount on and then its ->mnt is the parent to be... except for the "beneath" variant where we play convoluted games with "no, here we want the parent of that". Implementation is also vulnerable to umount propagtion races. * the structure we set up (everything except the parent) is inserted into a linked list by lock_mount(). That excludes DEFINE_CLASS() - it wants the value formed and then copied to the variable we are defining. * it contains an implicit namespace_excl scope, so path_put() and its ilk *must* be done after the unlock_mount(). And most of the users have gotos past that. The first two problems are solved by adding an explicit pointer to parent mount into struct pinned_mountpoint. Having lock_mount() failure reported by setting it to ERR_PTR(-E...) allows to avoid the problem with expressing the constructor failure. The third one is dealt with by defining local macros to be used instead of CLASS - I went with LOCK_MOUNT(mp, path) which defines struct pinned_mountpoint mp with __cleanup(unlock_mount) and sets it up. If anybody has better suggestions, I'll be glad to hear those. The last one is dealt with by massaging the users to form that would have all post-unlock_mount() stuff done by __free(). First, several trivial cleanups: 18/52) do_move_mount(): trim local variables 19/52) do_move_mount(): deal with the checks on old_path early 20/52) move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() 21/52) finish_automount(): simplify the ELOOP check Getting rid of post-unlock_mount() stuff: 22/52) do_loopback(): use __free(path_put) to deal with old_path 23/52) pivot_root(2): use __free() to deal with struct path in it 24/52) finish_automount(): take the lock_mount() analogue into a helper this one turns the open-coded logics into lock_mount_exact() with the same kind of calling conventions as lock_mount() and do_lock_mount() 25/52) do_new_mount_rc(): use __free() to deal with dropping mnt on failure 26/52) finish_automount(): use __free() to deal with dropping mnt on failure This is the main part: 27/52) change calling conventions for lock_mount() et.al. Followups, cleaning up the games with parent mount in the user: 28/52) do_move_mount(): use the parent mount returned by do_lock_mount() 29/52) do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path 30/52) graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Part 3: getting rid of mutating struct path there. do_lock_mount() is still playing silly buggers with struct path it had been given - the logics in that thing hadn't changed. It's not a pretty function and it's racy as well; the thing is, by this point its users have almost no use for the changed contents of struct path - dentry can be derived from struct mountpoint, parent mount to use is provided directly and we want that a lot more than modified path->mnt. There's only one place (in can_move_mount_beneath()) where we still want that and it's not hard to reconstruct the value by *original* path->mnt value + parent mount to be used. Getting rid of ->dentry uses. 31/52) pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry 32/52) don't bother passing new_path->dentry to can_move_mount_beneath() A helper, already open-coded in a couple of places; carved out of the next patch to keep it reasonably small 33/52) new helper: topmost_overmount() Rewrite of do_lock_mount() to keep path constant + trivial change in do_move_mount() to adjust the argument it passes to can_move_mount_beneath(): 34/52) do_lock_mount(): don't modify path. Part 5: a bunch of trivial cleanups (mostly constifications) 35/52) constify check_mnt() 36/52) do_mount_setattr(): constify path argument 37/52) do_set_group(): constify path arguments 38/52) drop_collected_paths(): constify arguments 39/52) collect_paths(): constify the return value 40/52) do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) 41/52) mnt_warn_timestamp_expiry(): constify struct path argument 42/52) do_new_mount{,_fc}(): constify struct path argument 43/52) do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument 44/52) path_mount(): constify struct path argument 45/52) may_copy_tree(), __do_loopback(): constify struct path argument 46/52) path_umount(): constify struct path argument 47/52) constify can_move_mount_beneath() arguments 48/52) do_move_mount_old(): use __free(path_put) 49/52) do_mount(): use __free(path_put) Part 6: assorted stuff, will grow. 50/52) umount_tree(): take all victims out of propagation graph at once [had been earlier] For each removed mount we need to calculate where the slaves will end up. To avoid duplicating that work, do it for all mounts to be removed at once, taking the mounts themselves out of propagation graph as we go, then do all transfers; the duplicate work on finding destinations is avoided since if we run into a mount that already had destination found, we don't need to trace the rest of the way. That's guaranteed O(removed mounts) for finding destinations and removing from propagation graph and O(surviving mounts that have master removed) for transfers. 51/52) ecryptfs: get rid of pointless mount references in ecryptfs dentries ->lower_path.mnt has the same value for all dentries on given ecryptfs instance and if somebody goes for mountpoint-crossing variant where that would not be true, we can deal with that when it happens (and _not_ with duplicating these reference into each dentry). As it is, we are better off just sticking a reference into ecryptfs-private part of superblock and keeping it pinned until ->kill_sb(). That way we can stick a reference to underlying dentry right into ->d_fsdata of ecryptfs one, getting rid of indirection through struct ecryptfs_dentry_info, along with the entire struct ecryptfs_dentry_info machinery. 52/52) fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Comments regarding "shadow mounts" were stale - no such thing anymore. Document the locking requirements for __lookup_mnt()... FWIW, the current diffstat: fs/ecryptfs/dentry.c | 14 +- fs/ecryptfs/ecryptfs_kernel.h | 27 +- fs/ecryptfs/file.c | 15 +- fs/ecryptfs/inode.c | 19 +- fs/ecryptfs/main.c | 24 +- fs/internal.h | 4 +- fs/mount.h | 12 + fs/namespace.c | 775 +++++++++++++++++++----------------------- fs/pnode.c | 75 ++-- fs/pnode.h | 1 + include/linux/mount.h | 4 +- kernel/audit_tree.c | 12 +- 12 files changed, 464 insertions(+), 518 deletions(-)