On Wed, 11 Jun 2025, Al Viro wrote: > On Wed, Jun 11, 2025 at 11:00:06AM +1000, NeilBrown wrote: > > > Yes. > > > > > > > > Where does your dentry lock nest wrt ->i_rwsem? As a bonus (well, malus, I guess) > > > question, where does it nest wrt parent *and* child inodes' ->i_rwsem for rmdir > > > and rename? > > > > Between inode of parent of the dentry and inode of the dentry. > > That's... going to be fun to prove the deadlock avoidance. > Looking forward to such proof... I do hope to write something along those lines, but I'd rather do it after getting feedback on the design. It's already changed several times and while I have growing confidence that I understand the key issues there is still room for change as discussed below. > > Look, the reason why I'm sceptical is that we had quite a few interesting > problems with directory locking schemes; fun scenarios are easy to > miss and I've fucked up more than a few times in that area. Fixing it > afterwards can be a real bitch, especially if we get filesystem-specific > parts in the picture. That's part of why I like bringing all the locking into namei.c using lookup_and_lock() etc. Having everything in one place won't make it easy but might make it more tractable. > > So let's sort it out _before_ we go there. And I mean proof - verifiable > statements about the functions, etc. I was hoping to get some of the refactoring - which I think it useful in any case - in before needing a complete tidy solution. Partly this is because I thought the review discussion of the locking would be more effective when the code was clean and centralised.... > > Incidentally, what was the problem with having dentry locked before > the parent? At least that way we would have a reasonable lock ordering... > It would require some preliminary work, but last time I looked at the > area (not very deeply) it felt like a plausible direction... I wonder > which obstacle have I missed... > If we are to put the parent locking after the dentry locking it must also be after the d_alloc_parallel() locking. As some readdir implementations (quite sensibly) use d_alloc_parallel() to prime the dcache with readdir results, we would not be able to hold the parent locked across readdir. So we would need some other exclusion with rmdir. Put another way: if we want to push the parent locking down into the filesystem for anything, we really need to do it for everything. For rmdir we would need the "set S_DYING and wait for locked dentries" to happen before taking the lock, and we cannot use parent lock for readdir. Maybe we could do that, but we can't use spare d_flags or i_flags for the readdir locking - we need a counter. Maybe i_writecount or i_dio_count could provide the space we need as they aren't used on directories. I thought that going down that path added more complexity than I wanted, but it does have the advantage of making the purpose and scope of the different locks more explicit. ... or maybe we could change those readdir routines to do a try-lock. i.e. have a d_alloc_parallel() variant which returned -EWOULDBLOCK rather than waiting for the dentry to be ready. Maybe that is a sensible approach anyway... Another (unrelated) option that I've considered but not yet explored is using the same locking scheme for in_lookup dentries and for the targets of operations. There would be just two DCACHE flags (locked, and waiter-exists) and the in_lookup dentries would be recognised by having d_inode being $RESERVED_ADDRESS. Then dentries would always be created locked and then unlocked when they are ready (like inodes). I think I like this, but I also wonder if it is worth the churn. NeilBrown