Re: [RFC] a possible way of reducing the PITA of ->d_name audits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 08 Sep 2025, Al Viro wrote:
> On Mon, Sep 08, 2025 at 04:25:26PM +1000, NeilBrown wrote:
> 
> > Might the locking rules be too complex?  Are they even documented?
> 
> For ->d_name and ->d_parent?  Very poorly.  That's the absolute worst
> part of struct dentry locking rules.  _That_ is the place where attempts
> to write a coherent documentation stall every time, often diverting
> into yet another side story from hell.  ->d_revalidate() calling convention
> change came from one of those; at least that got dealt with for good
> now - and took a couple of years in the making, what with getting sidetracked
> to other stuff.
> 
> Keep in mind that there are weird filesystems that manage to get away with
> playing very odd games.  Or do not manage, really - apparmorfs is the
> most recent weird thing.  Locking of their own, *interspersed* between
> the directory locks.  With ->mkdir() and ->rmdir() both unlocking and
> relocking the parent.  At least apparmor folks have finally admitted that
> there's a problem...
> 
> I don't believe that any of us can get away with imposing changes that
> would break configfs (another weird horror).  Or apparmor.  Or autofs,
> for that matter...
> 
> Anyway, the current rules are:
> 	* only __d_move() ever changes name/parent of live dentry (there's
> d_mark_tmpfile(), but that's done to a dentry that is invisible to anyone
> other than caller - it flips name from "/" to "#<inumber>" just before
> attaching the inode to it; part of ->tmpfile())
> 	* any change happens under rename_lock.
> 	* any change happens under ->d_lock on everything involved (parent(s)
> included).
> 	* move from one parent to another happens only under ->s_vfs_rename_mutex.
> 	* exclusive lock on directory is enough to prevent moves to or from
> that directory.
> 	* only positive dentries ever get moved/renamed
> 	* d_splice_alias() may move an existing directory dentry, but no
> non-directory is going to be touched by it.
> 	* with a couple of exceptions (told you it's messy) d_move() and d_exchange()
> are called in locking conditions equivalent to vfs_rename() - ->s_vfs_rename_mutex
> if cross-directory, parent(s) exclusive, etc.  Exceptions are vfat_lookup()
> and exfat_lookup() - both would be in trouble if ->link() was supported there.
> 
> What it boils down to for filesystems is
> 	* any dentry passed to directory-modifying operation has stable ->d_name
> and ->d_parent.  That also applies to ->atomic_open() even without O_CREAT in
> flags, except for the mess with directories in some cases (without O_CREAT
> if you are given an in-lookup dentry and it turns out to be a directory,
> from that point on you have no promise that it won't be reparented by d_splice_alias();
> whether it can actually happens depends upon the filesystem, but instances tend to
> just call finish_no_open() in that case and bugger off).
> 	* any dentry passed to ->lookup() has stable ->d_name/->d_parent until it
> had been passed to d_splice_alias() (which is normally the last time we look at
> it).
> 	* ->d_revalidate() and ->d_compare() get stable name passed as argument;
> they should not look at dentry->d_name at all.

So if we were to provide kdoc documentation for claim_stability() it
could say something like:

 It is only safe to claim_stability() for a dentry if:
   - ->d_parent->d_inode is locked exclusively
      This is guaranteed for target dentries passed to directory-modifying
      inode_operations, but not e.g. the old_dentry passed to ->link.
      It is also guaranteed for dentry passed to ->atomic_open when create it set.
   - ->d_inode is locked exclusively. This is guaranteed for the
       old_dentry passed to ->link.
   - DCACHE_PAR_LOOKUP is set.  Dentries passed to ->lookup will
     either have this flag or the parent will be locked exclusively
   - dentry is negative and a shared lock is held on parent inode.
     This is guaranteed for dentries passed to ->atomic_open when create
     is NOT set.
   - the ->d_lock is held on the dentry or its ->d_parent

 All uses of claim_stability() should be preceded by a comment
 identifying which condition is claimed to be met.

> 
> > As you know I want to change directory locking so that a ->d_flags bit
> > locks a dentry in much the same way that locking the parent directory
> > currently does.
> > 
> > I had wondered why vfs_link() locks the inode being linked and thought it was
> > only to protect ->i_nlink.  If it is needed to protect against rename
> > too, that could usefully be documented - or we could use the same
> > ->d_flags bit to ensure that lock.
> 
> We could.  FWIW, I *like* the notion of "dentry that is held in place and won't
> go away/get renamed/dropped until we say so".  That's the major reason why I'm
> interested in your patchset, actually.
> 
> > I guess I'm a bit concerned that your goal here is to transition from
> > "lots of auditing" to "much less auditing" and I would rather it was "no
> > auditing needed" - one day you won't want to (or be able to) audit any
> > more.
> > 
> > Fudging some type-state with C may well be useful but I suspect it is at
> > most part of a solution.  Simplification, documentation, run-time checks
> > might also be important parts.  As the type-state flag-day is a big
> > thing, maybe it shouldn't be first.
> 
> All of that requires being able to answer questions about what's there in
> the existing filesystems.  Which is pretty much the same problem as
> those audits, obviously.  And static annotations are way easier to
> reason about.

And they are good places to encourage people to put comments.  I'm
liking this idea more because it provides a focus for documenting a
non-trivial dependency.

> 
> Anyway, I'm about to fall asleep (it's nearly 5am here); I've put an
> initial sketch of infrastructure and of symlink calling conventions change
> to #experimenta.stable_dentry, but it's very raw at the moment - and lacks
> the followups that would make use of that stuff.  I'll continue tomorrow
> (right now I've got ->create() half-done as well), but by now one thing
> I'm certain about is that it will be trivial to reorder wrt your stuff -
> in either direction.  So that worry can be discarded - change might or
> might not be a good idea, but as flagdays go it's trivial.
> 

One thing I don't like is the name "unwrap_dentry()".  It says what is
done rather than what it means or what the purpose is.
Maybe "access_dentry()" (a bit like rcu_access_pointer()).
Maybe "dentry_of()" - then we would want to call stable dentries
"stable_foo" or similar.  So:

 static int afs_symlink(struct mnt_idmap *idmap, struct inode *dir,
                      struct stable_dentry stable_child, const char *content)
 {
       struct dentry *dentry = dentry_of(stable_child);


Passing around a struct that contains a single pointer feels strange and
anything we can do to give it a more natural feel and make it easy to
read would likely be a win.

Thanks,
NeilBrown





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux