On Tue, Aug 26, 2025 at 11:39:17AM -0400, Josef Bacik wrote: > We can end up with an inode on the LRU list or the cached list, then at > some point in the future go to unlink that inode and then still have an > elevated i_count reference for that inode because it is on one of these > lists. > > The more common case is the cached list. We open a file, write to it, > truncate some of it which triggers the inode_add_lru code in the > pagecache, adding it to the cached LRU. Then we unlink this inode, and > it exists until writeback or reclaim kicks in and removes the inode. > > To handle this case, delete the inode from the LRU list when it is > unlinked, so we have the best case scenario for immediately freeing the > inode. > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > --- > fs/namei.c | 30 +++++++++++++++++++++++++----- > 1 file changed, 25 insertions(+), 5 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 138a693c2346..e56dcb5747e4 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -4438,6 +4438,7 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode) > int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > struct dentry *dentry) > { > + struct inode *inode = dentry->d_inode; > int error = may_delete(idmap, dir, dentry, 1); > > if (error) > @@ -4447,11 +4448,11 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > return -EPERM; > > dget(dentry); > - inode_lock(dentry->d_inode); > + inode_lock(inode); > > error = -EBUSY; > if (is_local_mountpoint(dentry) || > - (dentry->d_inode->i_flags & S_KERNEL_FILE)) > + (inode->i_flags & S_KERNEL_FILE)) > goto out; > > error = security_inode_rmdir(dir, dentry); > @@ -4463,12 +4464,21 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > goto out; > > shrink_dcache_parent(dentry); > - dentry->d_inode->i_flags |= S_DEAD; > + inode->i_flags |= S_DEAD; > dont_mount(dentry); > detach_mounts(dentry); > > out: > - inode_unlock(dentry->d_inode); > + /* > + * The inode may be on the LRU list, so delete it from the LRU at this > + * point in order to make sure that the inode is freed as soon as > + * possible. > + */ > + spin_lock(&inode->i_lock); > + inode_lru_list_del(inode); > + spin_unlock(&inode->i_lock); > + > + inode_unlock(inode); > dput(dentry); > if (!error) > d_delete_notify(dir, dentry); > @@ -4653,8 +4663,18 @@ int do_unlinkat(int dfd, struct filename *name) > dput(dentry); Why are you doing that in do_unlinkat() instead of vfs_unlink() (as you're doing it in vfs_rmdir() and not do_rmdir())? Doing it in do_unlinkat() means any stacking filesystem such as overlayfs will end up skipping the LRU list removal as they use vfs_unlink() directly. And does btrfs subvolume/snapshot deletion special treatment as well for this as it's semantically equivalent to an rmdir? > } > inode_unlock(path.dentry->d_inode); > - if (inode) > + if (inode) { > + /* > + * The LRU may be holding a reference, remove the inode from the > + * LRU here before dropping our hopefully final reference on the > + * inode. > + */ > + spin_lock(&inode->i_lock); > + inode_lru_list_del(inode); > + spin_unlock(&inode->i_lock); > + > iput(inode); /* truncate the inode here */ > + } > inode = NULL; > if (delegated_inode) { > error = break_deleg_wait(&delegated_inode); > -- > 2.49.0 >