Re: [RFC] a possible way of reducing the PITA of ->d_name audits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 08, 2025 at 04:25:26PM +1000, NeilBrown wrote:

> Might the locking rules be too complex?  Are they even documented?

For ->d_name and ->d_parent?  Very poorly.  That's the absolute worst
part of struct dentry locking rules.  _That_ is the place where attempts
to write a coherent documentation stall every time, often diverting
into yet another side story from hell.  ->d_revalidate() calling convention
change came from one of those; at least that got dealt with for good
now - and took a couple of years in the making, what with getting sidetracked
to other stuff.

Keep in mind that there are weird filesystems that manage to get away with
playing very odd games.  Or do not manage, really - apparmorfs is the
most recent weird thing.  Locking of their own, *interspersed* between
the directory locks.  With ->mkdir() and ->rmdir() both unlocking and
relocking the parent.  At least apparmor folks have finally admitted that
there's a problem...

I don't believe that any of us can get away with imposing changes that
would break configfs (another weird horror).  Or apparmor.  Or autofs,
for that matter...

Anyway, the current rules are:
	* only __d_move() ever changes name/parent of live dentry (there's
d_mark_tmpfile(), but that's done to a dentry that is invisible to anyone
other than caller - it flips name from "/" to "#<inumber>" just before
attaching the inode to it; part of ->tmpfile())
	* any change happens under rename_lock.
	* any change happens under ->d_lock on everything involved (parent(s)
included).
	* move from one parent to another happens only under ->s_vfs_rename_mutex.
	* exclusive lock on directory is enough to prevent moves to or from
that directory.
	* only positive dentries ever get moved/renamed
	* d_splice_alias() may move an existing directory dentry, but no
non-directory is going to be touched by it.
	* with a couple of exceptions (told you it's messy) d_move() and d_exchange()
are called in locking conditions equivalent to vfs_rename() - ->s_vfs_rename_mutex
if cross-directory, parent(s) exclusive, etc.  Exceptions are vfat_lookup()
and exfat_lookup() - both would be in trouble if ->link() was supported there.

What it boils down to for filesystems is
	* any dentry passed to directory-modifying operation has stable ->d_name
and ->d_parent.  That also applies to ->atomic_open() even without O_CREAT in
flags, except for the mess with directories in some cases (without O_CREAT
if you are given an in-lookup dentry and it turns out to be a directory,
from that point on you have no promise that it won't be reparented by d_splice_alias();
whether it can actually happens depends upon the filesystem, but instances tend to
just call finish_no_open() in that case and bugger off).
	* any dentry passed to ->lookup() has stable ->d_name/->d_parent until it
had been passed to d_splice_alias() (which is normally the last time we look at
it).
	* ->d_revalidate() and ->d_compare() get stable name passed as argument;
they should not look at dentry->d_name at all.

> As you know I want to change directory locking so that a ->d_flags bit
> locks a dentry in much the same way that locking the parent directory
> currently does.
> 
> I had wondered why vfs_link() locks the inode being linked and thought it was
> only to protect ->i_nlink.  If it is needed to protect against rename
> too, that could usefully be documented - or we could use the same
> ->d_flags bit to ensure that lock.

We could.  FWIW, I *like* the notion of "dentry that is held in place and won't
go away/get renamed/dropped until we say so".  That's the major reason why I'm
interested in your patchset, actually.

> I guess I'm a bit concerned that your goal here is to transition from
> "lots of auditing" to "much less auditing" and I would rather it was "no
> auditing needed" - one day you won't want to (or be able to) audit any
> more.
> 
> Fudging some type-state with C may well be useful but I suspect it is at
> most part of a solution.  Simplification, documentation, run-time checks
> might also be important parts.  As the type-state flag-day is a big
> thing, maybe it shouldn't be first.

All of that requires being able to answer questions about what's there in
the existing filesystems.  Which is pretty much the same problem as
those audits, obviously.  And static annotations are way easier to
reason about.

Anyway, I'm about to fall asleep (it's nearly 5am here); I've put an
initial sketch of infrastructure and of symlink calling conventions change
to #experimenta.stable_dentry, but it's very raw at the moment - and lacks
the followups that would make use of that stuff.  I'll continue tomorrow
(right now I've got ->create() half-done as well), but by now one thing
I'm certain about is that it will be trivial to reorder wrt your stuff -
in either direction.  So that worry can be discarded - change might or
might not be a good idea, but as flagdays go it's trivial.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux