Re: [PATCH v3 20/21] __dentry_kill(): new locking scheme

Max Kellermann <max.kellermann@xxxxxxxxx> · Mon, 7 Jul 2025 23:47:04 +0200

On Mon, Jul 7, 2025 at 11:32 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> The second d_walk() does not have the if (!data.found) break; after it.
> So if your point is that we should ignore these and bail out as soon as we
> reach that state, we are not getting any closer to it.

Not quite. My point is that you shouldn't be busy-waiting, and
whatever it is that leads to busy-waiting should be fixed.

I don't know how the dcache works, so whatever solution I suggest
would not be well-founded. I still don't even know why you added that
"<0" check.

> The second d_walk() is specifically about the stuff already in some other
> thread's shrink list.  If it finds more than that, all the better, but the
> primary goal is to make some progress in case if there's something in
> another thread's shrink list they are yet to get around to evicting.
>
> Again, what would you have it do?  The requirement is to take out everything
> that has no busy descendents.

A descendant that is dying (i.e. d_lockref.count<0 but still linked in
its parent because Ceph is waiting for an I/O completion), is that
"busy" or "not busy"? What was your idea of handling such a dentry
when you wrote this patch?

> BTW, is that the same dentry all along in your reproducer?  Or does it switch
> to a different dentry after a while?

I'm hunting a Ceph bug that causes an I/O completion wait to never
finish while reconnecting to the Ceph MDS; that is why it's always the
same dentry. But that's only an extreme example. The general problem
with busy looping in the dentry cache is always there, even when the
request finishes after a few milliseconds: you'll be busy-waiting for
those milliseconds, which is still a bad idea. It wastes CPU cycles
for no reason and drains the battery / accelerates climate change.
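
As a rough illustration of the difference between busy-waiting and
sleeping until completion, here is a user-space sketch using pthreads.
It is not dcache or Ceph code; struct demo_completion, wait_busy(),
wait_blocking() and run() are hypothetical names made up for this
example, and the 5 ms delay merely stands in for "the request finishes
after a few milliseconds".

```c
/* User-space illustration only; this is NOT dcache or Ceph code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for "the I/O has completed". */
struct demo_completion {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            done;
	unsigned long   iterations;	/* loop iterations done by the waiter */
};

/* Completer: pretends the I/O takes about 5 ms, then signals. */
static void *complete_later(void *arg)
{
	struct demo_completion *c = arg;
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 5 * 1000 * 1000 };

	nanosleep(&delay, NULL);
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Busy-waiting: re-check the flag in a loop, burning CPU until it is set. */
static unsigned long wait_busy(struct demo_completion *c)
{
	bool done;

	do {
		pthread_mutex_lock(&c->lock);
		done = c->done;
		c->iterations++;
		pthread_mutex_unlock(&c->lock);
	} while (!done);
	return c->iterations;
}

/* Blocking: sleep on the condition variable until the completer signals. */
static unsigned long wait_blocking(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		c->iterations++;	/* counts how often we went to sleep */
		pthread_cond_wait(&c->cond, &c->lock);
	}
	pthread_mutex_unlock(&c->lock);
	return c->iterations;
}

/* Run one waiter against a fresh completion; return its iteration count. */
static unsigned long run(unsigned long (*waiter)(struct demo_completion *))
{
	struct demo_completion c = { .done = false, .iterations = 0 };
	pthread_t t;
	unsigned long n;
	int rc;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.cond, NULL);
	rc = pthread_create(&t, NULL, complete_later, &c);
	assert(rc == 0);
	n = waiter(&c);
	pthread_join(t, NULL);
	pthread_cond_destroy(&c.cond);
	pthread_mutex_destroy(&c.lock);
	return n;
}
```

Calling run(wait_busy) and run(wait_blocking) from a main() and
printing the results shows how many times each waiter went around its
loop. The busy variant keeps a CPU occupied for the whole delay, while
the blocking variant sleeps until it is signalled. Whether the dcache
has a suitable place to sleep like this is a separate question that
this sketch does not answer.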