Re: bad things when too many negative dentries in a directory

Andreas Dilger <adilger@xxxxxxxxx> · Tue, 22 Apr 2025 00:57:26 -0600

On Apr 16, 2025, at 9:26 AM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Tue, 2025-04-15 at 11:22 -0600, Andreas Dilger wrote:
> [...]
>> Negative dentries are only useful if there are fewer than the number
>> of entries in that directory.
> 
> I agree with this, yes.
> 
>> If the negative dentry count exceeds the actual entry count,
> 
> Yes, but finding this number is going to be hard.  We can't iterate a
> directory to count them in the fast path and a directory i_size is
> extremely filesystem and format dependent.

This depends.  Some filesystems will store the actual number of entries
in the directory, or it can be estimated based on the number of blocks
in the directory.

> However, since we only need a rough count, perhaps having the filesystem
> export its average directory entry size and simply dividing by that
> would give a good enough approximation to the number?

I would suggest to add an inode method that can be called on the directory
to request the (estimated) number of entries in a directory.  If the fs
has a good idea of this it can return that number, or it can estimate
based on allocated blocks.  It does not need to be exact, but provides
an upper bound on the useful number of negative dcache entries to keep.

>> it would be more efficient to just cache all of the positive dentries
>> and mark the directory with a "full dentry list" flag that indicates
>> all of the names are already present in dcache and any miss is
>> authoritative. In essence that gives an "infinite" negative lookup
>> cache instead of explicitly storing all of the possible negative
>> entries.
> 
> Practically, I think directories with that flag would probably
> automatically retain positive child dentries as an addition to our
> retain_dentry() logic and automatically kill negative ones.
> 
> This behaviour, though, would remove them from the shrinkers, so
> probably there would have to be a global count of the number of
> unshrinkable children this gives us and have that factor into the
> superblock shrinkers somehow.  Probably add the parent to the lru list
> but make dentry_lru_isolate() always skip until the tipping point for
> shrinking filled directories is reached?

It's true that this flag would (generally) remove the directory and its
immediate children from the dcache shrinkers.  However, the point of a
shrinker is to reduce memory usage, and if the directory can no longer
guarantee that all positive dentries are cached (so no negative dentries
are needed) would generally *increase* memory usage in the end.

I could imagine that such directories would eventually be reaped, but
it should be much harder to do so.  For example, every negative lookup
in such a directory should refresh it in the LRU since the parent dentry
avoided a negative entry from being added to the dcache.

>> For directories like ~/bin, /usr/bin, /usr/lib64, etc. (or any
>> directory) where negative lookups are frequent, it should be possible
>> to determine this threshold automatically.  Once the negative dentry
>> count exceeds the size of the directory by some factor (e.g.
>> directory size / 16, or the actual entry count if the filesystem
>> knows this, it doesn't have to be exactly correct) then a readdir
>> could load all of the names to fully populate the dcache and set the
>> "full dentry list" flag on the directory would allow dropping all
>> negative dentries in that directory.
> 
> All this supposes we have some per directory count of the negative
> dentries.  I think there'd be push back on adding this to struct dentry
> and making it an exact count in the fast path.  The next logical place
> to evaluate it would be the shrinkers but then that wouldn't solve
> Matthew's use case where the shrinkers don't get activated.  I suppose
> some flag that userspace could add to directories it identifies as hot
> might be the next best thing?

No. Kernel memory management shouldn't be dependent on userspace doing
the right thing, and no userspace would ever be taught to consistently
set such a flag.

Again, the numbers don't have to be exact, but if negative dcache is
2x the number of dir entries (or e.g. 1000 more as a directory gets
larger) then it is time to change to caching only positive entries.

Having the negative dcache be directly linked to the parent would be
fine too.  It doesn't make sense to cache negative dentries longer than
the parent, and if there is an upper bound on how many negative entries
can exist on a directory avoids the need to shrink them independently.
If there is lots of memory pressure on the dcache then directories with
inactive negative dentries would eventually be reaped, and even "full
dentry list" directories would eventually come around for shrinking if
they were inactive for a long time.

>> The VFS/VM should avoid dropping directories/dentries from cache in
>> this case, since it is saving more memory (and avoiding filesystem
>> IO) to keep them pinned rather than dropping them from cache.  There
>> might need to be a matching "part of full dentry list" flag on the
>> positive dentries to avoid dcache shrinking of those entries (which
>> would invalidate the premise that the parent holds all of the
>> possible entries in that directory), if checking the parent's flag is
>> too expensive.
> 
> As I said above, I think simply checking the parent flags in
> retain_dentry should do.  Since you don't need it to be exact and the
> parent should have a positive refcount, it should be possible to do a
> READ_ONCE rather than locking.

Cheers, Andreas

Attachment:
signature.asc

Description: Message signed with OpenPGP