On Thu, 15 May 2025 at 17:15, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Thu, 15 May 2025 at 16:57, Jan Kara <jack@xxxxxxx> wrote:
> >
> > Hello,
> >
> > we have a customer who is mounting over NFS a directory (let's call it
> > hugedir) with many files (there are several million dentries on the
> > d_children list). Now when they do 'mv hugedir hugedir.bak; mkdir hugedir'
> > on the server, which invalidates the NFS cache of this directory, NFS
> > clients get stuck in d_invalidate() for hours (until the customer lost
> > patience).
> >
> > Now I don't want to discuss here the sanity or efficiency of this
> > application architecture, but I'm of the opinion that it shouldn't take
> > hours to invalidate a couple of million dentries. Analysis of the crashdump
> > revealed that d_invalidate() can have O(n^2) complexity in the number of
> > dentries it is invalidating, which leads to impractical times to invalidate
> > large numbers of dentries. What happens is the following:
> >
> > There are several processes accessing the hugedir directory, about 16 in
> > the case I was inspecting. When the directory changes on the server all
> > these 16 processes quickly enter d_invalidate() -> shrink_dcache_parent()
>
> The first thing d_invalidate() does is check whether the dentry is
> unhashed and return if so, or unhash it otherwise. So only the
> d_invalidate() call that won the race for d_lock is going to invoke
> shrink_dcache_parent(); the others will return immediately.
>
> What am I missing?

If it's an old kernel (<4.18), it might be missing commit ff17fa561a04
("d_invalidate(): unhash immediately").

Thanks,
Miklos
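The "check unhashed, else unhash, all under d_lock" argument above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: it uses a pthread mutex in place of the d_lock spinlock, a bool in place of the real hashed state, and the names toy_dentry, toy_invalidate(), toy_shrink_children() and run_demo() are all hypothetical. It shows why, once the unhash happens immediately under the lock, only one of many concurrent callers goes on to do the expensive child walk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for struct dentry, keeping only what the sketch needs. */
struct toy_dentry {
    pthread_mutex_t d_lock;  /* models the per-dentry d_lock spinlock */
    bool hashed;             /* models "dentry is still hashed" */
};

static struct toy_dentry hugedir = { PTHREAD_MUTEX_INITIALIZER, true };
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int shrink_calls;

/* Hypothetical stand-in for shrink_dcache_parent(): it only counts calls. */
static void toy_shrink_children(struct toy_dentry *d)
{
    (void)d;
    pthread_mutex_lock(&count_lock);
    shrink_calls++;
    pthread_mutex_unlock(&count_lock);
}

/* Check-and-unhash under d_lock; only the caller that unhashes goes on to shrink. */
static void toy_invalidate(struct toy_dentry *d)
{
    pthread_mutex_lock(&d->d_lock);
    if (!d->hashed) {
        /* Lost the race: another caller already unhashed it. */
        pthread_mutex_unlock(&d->d_lock);
        return;
    }
    d->hashed = false;       /* unhash immediately, still holding the lock */
    pthread_mutex_unlock(&d->d_lock);
    toy_shrink_children(d);  /* the expensive walk runs in one caller only */
}

static void *worker(void *arg)
{
    toy_invalidate(arg);
    return NULL;
}

/* Race nthreads callers against one dentry; returns how many did the shrink. */
int run_demo(int nthreads)
{
    pthread_t tids[MAX_THREADS];

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    hugedir.hashed = true;
    shrink_calls = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &hugedir);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return shrink_calls;
}
```

Calling run_demo(16), mirroring the ~16 processes in the report, returns 1 regardless of scheduling, because the hashed flag is tested and cleared in the same critical section. A model where each caller drops the lock between the check and the unhash, or never unhashes before walking the children, would let several callers perform the walk concurrently.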