On Mon, Jun 09, 2025 at 05:34:10PM +1000, NeilBrown wrote: > Once we support directory operations (e.g. create) without requiring the > parent to be locked, the current practice locking a directory while > processing rmdir() or similar will not be sufficient to wait for > operations to complete and to block further operations. > > This patch introduced a new inode flag S_DYING. It indicates that > a rmdir or similar is being processed and new directory operations must > not commence in the directory. They should not abort either as the > rmdir might fail - instead they should block. They can do this by > waiting for a lock on the inode. > > A new interface rmdir_lock() locks the inode, sets this flag, and waits > for any children with DCACHE_LOCK set to complete their operation, and > for any d_in_lookup() children to complete the lookup. It should be > called before attempted to delete the directory or set S_DEAD. Matching > rmdir_unlock() clears the flag and unlocks the inode. > > dentry_lock() and d_alloc_parallel() are changed to block while this > flag it set and to fail if the parent IS_DEADDIR(), though dentry_lock() > doesn't block for d_in_lookup() dentries. > diff --git a/fs/namei.c b/fs/namei.c > index 4ad76df21677..c590f25d0d49 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1770,8 +1770,11 @@ static bool __dentry_lock(struct dentry *dentry, > struct dentry *base, const struct qstr *last, > unsigned int subclass, int state) > { > + struct dentry *parent; > + struct inode *dir; > int err; > > +retry: > lock_acquire_exclusive(&dentry->dentry_map, subclass, 0, NULL, _THIS_IP_); > spin_lock(&dentry->d_lock); > err = wait_var_event_any_lock(&dentry->d_flags, > @@ -1782,10 +1785,43 @@ static bool __dentry_lock(struct dentry *dentry, > spin_unlock(&dentry->d_lock); > return false; > } > - > - dentry->d_flags |= DCACHE_LOCK; > + parent = dentry->d_parent; Why will it stay the parent? Matter of fact, why will it stay positive? > + dir = igrab(parent->d_inode); ... and not oops right here? > + lock_map_release(&dentry->dentry_map); > spin_unlock(&dentry->d_lock); > - return true; > + > + if (state == TASK_KILLABLE) { > + err = down_write_killable(&dir->i_rwsem); > + if (err) { > + iput(dir); > + return false; > + } > + } else > + inode_lock(dir); > + /* S_DYING much be clear now */ > + inode_unlock(dir); > + iput(dir); > + goto retry; OK, I'm really confused now. Is it allowed to call dentry_lock() while holding ->i_rwsem of the parent? Where does your dentry lock nest wrt ->i_rwsem? As a bonus (well, malus, I guess) question, where does it nest wrt parent *and* child inodes' ->i_rwsem for rmdir and rename? Tangentially connected question: which locks are held for ->unlink() in your scheme? You do need *something* on the victim inode to protect ->i_nlink modifications, and anything on dentries of victim or their parent directories is not going to give that.