On Mon, Jul 7, 2025 at 11:32 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > The second d_walk() does not have the if (!data.found) break; after it. > So if your point is that we should ignore these and bail out as soon as we > reach that state, we are not getting any closer to it. Not quite. My point is that you shouldn't be busy-waiting. And whatever it is that leads to busy-waiting, it should be fixed I don't know how the dcache works, and whatever solution I suggest, it's not well-founded. I still don't even know why you added that "<0" check. > The second d_walk() is specifically about the stuff already in some other > thread's shrink list. If it finds more than that, all the better, but the > primary goal is to make some progress in case if there's something in > another thread's shrink list they are yet to get around to evicting. > > Again, what would you have it do? The requirement is to take out everything > that has no busy descendents. A descendant that is dying (i.e. d_lockref.count<0 but still linked in its parent because Ceph is waiting for an I/O completion), is that "busy" or "not busy"? What was your idea of handling such a dentry when you wrote this patch? > BTW, is that the same dentry all along in your reproducer? Or does it switch > to a different dentry after a while? I'm hunting a Ceph bug that causes I/O completion wait to never finish while reconnecting to the Ceph MDS, therefore it's always the same dentry. But that's only an extreme example - the general problem with busy looping in the dentry cache is always there, even when the request finishes after a few milliseconds - that means you'll be busy-waiting for these milliseconds, which is still a bad idea. It wastes CPU cycles for no reason and drains the battery / accelerates climate change.