Re: [PATCH 5/8] Introduce S_DYING which warns that S_DEAD might follow.

"NeilBrown" <neil@xxxxxxxxxx> · Wed, 11 Jun 2025 12:49:21 +1000

On Wed, 11 Jun 2025, Al Viro wrote:
> On Wed, Jun 11, 2025 at 11:00:06AM +1000, NeilBrown wrote:
> 
> > Yes.
> > 
> > > 
> > > Where does your dentry lock nest wrt ->i_rwsem?  As a bonus (well, malus, I guess)
> > > question, where does it nest wrt parent *and* child inodes' ->i_rwsem for rmdir
> > > and rename?
> > 
> > Between inode of parent of the dentry and inode of the dentry.
> 
> That's... going to be fun to prove the deadlock avoidance.
> Looking forward to such proof...

I do hope to write something along those lines, but I'd rather do it
after getting feedback on the design.  It's already changed several
times and while I have growing confidence that I understand the key
issues there is still room for change as discussed below.

> 
> Look, the reason why I'm sceptical is that we had quite a few interesting
> problems with directory locking schemes; fun scenarios are easy to
> miss and I've fucked up more than a few times in that area.  Fixing it
> afterwards can be a real bitch, especially if we get filesystem-specific
> parts in the picture.

That's part of why I like bringing all the locking into namei.c using
lookup_and_lock() etc.  Having everything in one place won't make it
easy but might make it more tractable.

> 
> So let's sort it out _before_ we go there.  And I mean proof - verifiable
> statements about the functions, etc.

I was hoping to get some of the refactoring - which I think it useful in
any case - in before needing a complete tidy solution.  Partly this is
because I thought the review discussion of the locking would be more
effective when the code was clean and centralised....

> 
> Incidentally, what was the problem with having dentry locked before
> the parent?  At least that way we would have a reasonable lock ordering...
> It would require some preliminary work, but last time I looked at the
> area (not very deeply) it felt like a plausible direction...  I wonder
> which obstacle have I missed...
> 

If we are to put the parent locking after the dentry locking it must
also be after the d_alloc_parallel() locking.  As some readdir
implementations (quite sensibly) use d_alloc_parallel() to prime the
dcache with readdir results, we would not be able to hold the parent
locked across readdir.  So we would need some other exclusion with
rmdir.

Put another way: if we want to push the parent locking down into the
filesystem for anything, we really need to do it for everything.
For rmdir we would need the "set S_DYING and wait for locked dentries"
to happen before taking the lock, and we cannot use parent lock for
readdir.

Maybe we could do that, but we can't use spare d_flags or i_flags for
the readdir locking - we need a counter.  Maybe i_writecount or
i_dio_count could provide the space we need as they aren't used on
directories.

I thought that going down that path added more complexity than I wanted,
but it does have the advantage of making the purpose and scope of the
different locks more explicit.

... or maybe we could change those readdir routines to do a try-lock.
i.e.  have a d_alloc_parallel() variant which returned -EWOULDBLOCK
rather than waiting for the dentry to be ready.  Maybe that is a
sensible approach anyway...

Another (unrelated) option that I've considered but not yet explored is
using the same locking scheme for in_lookup dentries and for the targets
of operations.  There would be just two DCACHE flags (locked, and
waiter-exists) and the in_lookup dentries would be recognised by having
d_inode being $RESERVED_ADDRESS.  Then dentries would always be created
locked and then unlocked when they are ready (like inodes).  I think I
like this, but I also wonder if it is worth the churn.

NeilBrown