On Mon, Sep 08, 2025 at 04:25:26PM +1000, NeilBrown wrote: > Might the locking rules be too complex? Are they even documented? For ->d_name and ->d_parent? Very poorly. That's the absolute worst part of struct dentry locking rules. _That_ is the place where attempts to write a coherent documentation stall every time, often diverting into yet another side story from hell. ->d_revalidate() calling convention change came from one of those; at least that got dealt with for good now - and took a couple of years in the making, what with getting sidetracked to other stuff. Keep in mind that there are weird filesystems that manage to get away with playing very odd games. Or do not manage, really - apparmorfs is the most recent weird thing. Locking of their own, *interspersed* between the directory locks. With ->mkdir() and ->rmdir() both unlocking and relocking the parent. At least apparmor folks have finally admitted that there's a problem... I don't believe that any of us can get away with imposing changes that would break configfs (another weird horror). Or apparmor. Or autofs, for that matter... Anyway, the current rules are: * only __d_move() ever changes name/parent of live dentry (there's d_mark_tmpfile(), but that's done to a dentry that is invisible to anyone other than caller - it flips name from "/" to "#<inumber>" just before attaching the inode to it; part of ->tmpfile()) * any change happens under rename_lock. * any change happens under ->d_lock on everything involved (parent(s) included). * move from one parent to another happens only under ->s_vfs_rename_mutex. * exclusive lock on directory is enough to prevent moves to or from that directory. * only positive dentries ever get moved/renamed * d_splice_alias() may move an existing directory dentry, but no non-directory is going to be touched by it. * with a couple of exceptions (told you it's messy) d_move() and d_exchange() are called in locking conditions equivalent to vfs_rename() - ->s_vfs_rename_mutex if cross-directory, parent(s) exclusive, etc. Exceptions are vfat_lookup() and exfat_lookup() - both would be in trouble if ->link() was supported there. What it boils down to for filesystems is * any dentry passed to directory-modifying operation has stable ->d_name and ->d_parent. That also applies to ->atomic_open() even without O_CREAT in flags, except for the mess with directories in some cases (without O_CREAT if you are given an in-lookup dentry and it turns out to be a directory, from that point on you have no promise that it won't be reparented by d_splice_alias(); whether it can actually happens depends upon the filesystem, but instances tend to just call finish_no_open() in that case and bugger off). * any dentry passed to ->lookup() has stable ->d_name/->d_parent until it had been passed to d_splice_alias() (which is normally the last time we look at it). * ->d_revalidate() and ->d_compare() get stable name passed as argument; they should not look at dentry->d_name at all. > As you know I want to change directory locking so that a ->d_flags bit > locks a dentry in much the same way that locking the parent directory > currently does. > > I had wondered why vfs_link() locks the inode being linked and thought it was > only to protect ->i_nlink. If it is needed to protect against rename > too, that could usefully be documented - or we could use the same > ->d_flags bit to ensure that lock. We could. FWIW, I *like* the notion of "dentry that is held in place and won't go away/get renamed/dropped until we say so". That's the major reason why I'm interested in your patchset, actually. > I guess I'm a bit concerned that your goal here is to transition from > "lots of auditing" to "much less auditing" and I would rather it was "no > auditing needed" - one day you won't want to (or be able to) audit any > more. > > Fudging some type-state with C may well be useful but I suspect it is at > most part of a solution. Simplification, documentation, run-time checks > might also be important parts. As the type-state flag-day is a big > thing, maybe it shouldn't be first. All of that requires being able to answer questions about what's there in the existing filesystems. Which is pretty much the same problem as those audits, obviously. And static annotations are way easier to reason about. Anyway, I'm about to fall asleep (it's nearly 5am here); I've put an initial sketch of infrastructure and of symlink calling conventions change to #experimenta.stable_dentry, but it's very raw at the moment - and lacks the followups that would make use of that stuff. I'll continue tomorrow (right now I've got ->create() half-done as well), but by now one thing I'm certain about is that it will be trivial to reorder wrt your stuff - in either direction. So that worry can be discarded - change might or might not be a good idea, but as flagdays go it's trivial.