These patches are still under development. In particular, some proper documentation is needed. They are sufficient to demonstrate my design.

They add an alternate mechanism for providing the locking that the VFS needs for directory operations. This includes:
 - only one operation per name at a time
 - no operations in a directory being removed
 - no concurrent cross-directory renames which might result in an ancestor loop

I had originally hoped to push the locking of i_rw_sem down into the filesystems and have the new locking on top of that. This turned out to be impractical. This series leaves the i_rw_sem locking where it is, introduces new locking that happens while the directory is locked, and gives the filesystem the option of disabling (most of) the i_rw_sem locking. Once all filesystems are converted the i_rw_sem locking can be removed.

A shared lock on i_rw_sem is still used for readdir and simple lookup, to exclude them while rmdir is happening.

The problem with pushing i_rw_sem down is that I still want to use it to exclude readdir while rmdir is happening. Some readdir implementations use the result to prime the dcache, which means creating d_in_lookup() dentries in the directory. If we can do this while holding i_rw_sem, then it is not safe to take i_rw_sem while holding a d_in_lookup() dentry. So i_rw_sem CANNOT be taken after a lookup has been performed - it must be before, or never.

Another issue is that after taking i_rw_sem in rmdir() I need to wait for any dentries that are still locked. Waiting for the dentry lock while holding i_rw_sem means we cannot take i_rw_sem after getting a dentry lock.

So we take i_rw_sem for filesystems that still require it (initially all) but still do the other locking, which will be uncontended. This exercises the code to help ensure it is ready when we remove the i_rw_sem requirement for any given filesystem.

The central feature is a per-dentry lock implemented with a couple of d_flags and wait_var_event/wake_up_var.
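
As a rough illustration of the idea only, and not code from the patches, a flag-based lock of this shape can be sketched in userspace C. Everything here is an assumption made for the sketch: struct demo_dentry, the DEMO_* flag names and the demo_dentry_*() helpers are hypothetical, a pthread mutex stands in for d_lock, and a condition variable stands in for wait_var_event/wake_up_var.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real patches use their own d_flags values. */
#define DEMO_LOCKED  0x1u	/* the per-dentry lock is held */
#define DEMO_WAITERS 0x2u	/* at least one thread is waiting for it */

/* Hypothetical userspace stand-in for a dentry, not the kernel struct. */
struct demo_dentry {
	pthread_mutex_t mtx;	/* stands in for d_lock */
	pthread_cond_t cv;	/* stands in for wait_var_event/wake_up_var */
	unsigned int flags;	/* stands in for d_flags */
};

void demo_dentry_init(struct demo_dentry *d)
{
	pthread_mutex_init(&d->mtx, NULL);
	pthread_cond_init(&d->cv, NULL);
	d->flags = 0;
}

/* Take the lock only if it is free; never sleeps waiting for the holder. */
bool demo_dentry_trylock(struct demo_dentry *d)
{
	bool ok = false;

	pthread_mutex_lock(&d->mtx);
	if (!(d->flags & DEMO_LOCKED)) {
		d->flags |= DEMO_LOCKED;
		ok = true;
	}
	pthread_mutex_unlock(&d->mtx);
	return ok;
}

/* Take the lock, sleeping until the current holder releases it. */
void demo_dentry_lock(struct demo_dentry *d)
{
	pthread_mutex_lock(&d->mtx);
	while (d->flags & DEMO_LOCKED) {
		d->flags |= DEMO_WAITERS;	/* ask the holder for a wakeup */
		pthread_cond_wait(&d->cv, &d->mtx);
	}
	d->flags |= DEMO_LOCKED;
	pthread_mutex_unlock(&d->mtx);
}

/* Release the lock and wake waiters only if someone asked to be woken. */
void demo_dentry_unlock(struct demo_dentry *d)
{
	bool wake;

	pthread_mutex_lock(&d->mtx);
	assert(d->flags & DEMO_LOCKED);	/* caller must hold the lock */
	wake = (d->flags & DEMO_WAITERS) != 0;
	d->flags &= ~(DEMO_LOCKED | DEMO_WAITERS);
	pthread_mutex_unlock(&d->mtx);
	if (wake)
		pthread_cond_broadcast(&d->cv);
}
```

The second flag is there so that the uncontended unlock path can skip the wakeup entirely, which matters if most of these locks are expected to be uncontended.
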
A single thread can take 1, sometimes 2, occasionally 3 locks on different dentries.

A second lock is needed for rename - we lock the two dentries in address order after confirming there is no hierarchical relationship. It is also needed for silly-rename as part of unlink. In this case the plan is for the second dentry to always be a d_in_lookup dentry so the lock is guaranteed to be uncontended. I'm not sure I got that finished yet. The three-dentry case is a rename which results in a silly-rename of the target.

For rmdir we introduce S_DYING so that marking a directory as S_DEAD is two-stage. We mark it S_DYING, which will prevent more dentry locks being taken, then we wait for the locks that were already taken, then set S_DEAD.

For rename ... maybe just read the patch. I tried to explain it thoroughly.

The goal is to perform create/remove/rename without any mutex/semaphore held by the VFS. This will allow concurrent operations in a directory and prepare the way for async operation so that e.g. io_uring could be given a list of many names in a directory to unlink and it could unlink them in parallel. We probably need to make changes to the locking on the inode being removed before this can be fully achieved - I haven't explored that in detail yet.

Thanks,
NeilBrown