On Mon, 08 Sep 2025, Al Viro wrote: > On Mon, Sep 08, 2025 at 04:25:26PM +1000, NeilBrown wrote: > > > Might the locking rules be too complex? Are they even documented? > > For ->d_name and ->d_parent? Very poorly. That's the absolute worst > part of struct dentry locking rules. _That_ is the place where attempts > to write a coherent documentation stall every time, often diverting > into yet another side story from hell. ->d_revalidate() calling convention > change came from one of those; at least that got dealt with for good > now - and took a couple of years in the making, what with getting sidetracked > to other stuff. > > Keep in mind that there are weird filesystems that manage to get away with > playing very odd games. Or do not manage, really - apparmorfs is the > most recent weird thing. Locking of their own, *interspersed* between > the directory locks. With ->mkdir() and ->rmdir() both unlocking and > relocking the parent. At least apparmor folks have finally admitted that > there's a problem... > > I don't believe that any of us can get away with imposing changes that > would break configfs (another weird horror). Or apparmor. Or autofs, > for that matter... > > Anyway, the current rules are: > * only __d_move() ever changes name/parent of live dentry (there's > d_mark_tmpfile(), but that's done to a dentry that is invisible to anyone > other than caller - it flips name from "/" to "#<inumber>" just before > attaching the inode to it; part of ->tmpfile()) > * any change happens under rename_lock. > * any change happens under ->d_lock on everything involved (parent(s) > included). > * move from one parent to another happens only under ->s_vfs_rename_mutex. > * exclusive lock on directory is enough to prevent moves to or from > that directory. > * only positive dentries ever get moved/renamed > * d_splice_alias() may move an existing directory dentry, but no > non-directory is going to be touched by it. > * with a couple of exceptions (told you it's messy) d_move() and d_exchange() > are called in locking conditions equivalent to vfs_rename() - ->s_vfs_rename_mutex > if cross-directory, parent(s) exclusive, etc. Exceptions are vfat_lookup() > and exfat_lookup() - both would be in trouble if ->link() was supported there. > > What it boils down to for filesystems is > * any dentry passed to directory-modifying operation has stable ->d_name > and ->d_parent. That also applies to ->atomic_open() even without O_CREAT in > flags, except for the mess with directories in some cases (without O_CREAT > if you are given an in-lookup dentry and it turns out to be a directory, > from that point on you have no promise that it won't be reparented by d_splice_alias(); > whether it can actually happens depends upon the filesystem, but instances tend to > just call finish_no_open() in that case and bugger off). > * any dentry passed to ->lookup() has stable ->d_name/->d_parent until it > had been passed to d_splice_alias() (which is normally the last time we look at > it). > * ->d_revalidate() and ->d_compare() get stable name passed as argument; > they should not look at dentry->d_name at all. So if we were to provide kdoc documentation for claim_stability() it could say something like: It is only safe to claim_stability() for a dentry if: - ->d_parent->d_inode is locked exclusively This is guaranteed for target dentries passed to directory-modifying inode_operations, but not e.g. the old_dentry passed to ->link. It is also guaranteed for dentry passed to ->atomic_open when create it set. - ->d_inode is locked exclusively. This is guaranteed for the old_dentry passed to ->link. - DCACHE_PAR_LOOKUP is set. Dentries passed to ->lookup will either have this flag or the parent will be locked exclusively - dentry is negative and a shared lock is held on parent inode. This is guaranteed for dentries passed to ->atomic_open when create is NOT set. - the ->d_lock is held on the dentry or its ->d_parent All uses of claim_stability() should be preceded by a comment identifying which condition is claimed to be met. > > > As you know I want to change directory locking so that a ->d_flags bit > > locks a dentry in much the same way that locking the parent directory > > currently does. > > > > I had wondered why vfs_link() locks the inode being linked and thought it was > > only to protect ->i_nlink. If it is needed to protect against rename > > too, that could usefully be documented - or we could use the same > > ->d_flags bit to ensure that lock. > > We could. FWIW, I *like* the notion of "dentry that is held in place and won't > go away/get renamed/dropped until we say so". That's the major reason why I'm > interested in your patchset, actually. > > > I guess I'm a bit concerned that your goal here is to transition from > > "lots of auditing" to "much less auditing" and I would rather it was "no > > auditing needed" - one day you won't want to (or be able to) audit any > > more. > > > > Fudging some type-state with C may well be useful but I suspect it is at > > most part of a solution. Simplification, documentation, run-time checks > > might also be important parts. As the type-state flag-day is a big > > thing, maybe it shouldn't be first. > > All of that requires being able to answer questions about what's there in > the existing filesystems. Which is pretty much the same problem as > those audits, obviously. And static annotations are way easier to > reason about. And they are good places to encourage people to put comments. I'm liking this idea more because it provides a focus for documenting a non-trivial dependency. > > Anyway, I'm about to fall asleep (it's nearly 5am here); I've put an > initial sketch of infrastructure and of symlink calling conventions change > to #experimenta.stable_dentry, but it's very raw at the moment - and lacks > the followups that would make use of that stuff. I'll continue tomorrow > (right now I've got ->create() half-done as well), but by now one thing > I'm certain about is that it will be trivial to reorder wrt your stuff - > in either direction. So that worry can be discarded - change might or > might not be a good idea, but as flagdays go it's trivial. > One thing I don't like is the name "unwrap_dentry()". It says what is done rather than what it means or what the purpose is. Maybe "access_dentry()" (a bit like rcu_access_pointer()). Maybe "dentry_of()" - then we would want to call stable dentries "stable_foo" or similar. So: static int afs_symlink(struct mnt_idmap *idmap, struct inode *dir, struct stable_dentry stable_child, const char *content) { struct dentry *dentry = dentry_of(stable_child); Passing around a struct that contains a single pointer feels strange and anything we can do to give it a more natural feel and make it easy to read would likely be a win. Thanks, NeilBrown