On Thu, Aug 28, 2025 at 11:06:37AM +0200, Christian Brauner wrote: > On Tue, Aug 26, 2025 at 11:39:17AM -0400, Josef Bacik wrote: > > We can end up with an inode on the LRU list or the cached list, then at > > some point in the future go to unlink that inode and then still have an > > elevated i_count reference for that inode because it is on one of these > > lists. > > > > The more common case is the cached list. We open a file, write to it, > > truncate some of it which triggers the inode_add_lru code in the > > pagecache, adding it to the cached LRU. Then we unlink this inode, and > > it exists until writeback or reclaim kicks in and removes the inode. > > > > To handle this case, delete the inode from the LRU list when it is > > unlinked, so we have the best case scenario for immediately freeing the > > inode. > > > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > > --- > > fs/namei.c | 30 +++++++++++++++++++++++++----- > > 1 file changed, 25 insertions(+), 5 deletions(-) > > > > diff --git a/fs/namei.c b/fs/namei.c > > index 138a693c2346..e56dcb5747e4 100644 > > --- a/fs/namei.c > > +++ b/fs/namei.c > > @@ -4438,6 +4438,7 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode) > > int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > > struct dentry *dentry) > > { > > + struct inode *inode = dentry->d_inode; > > int error = may_delete(idmap, dir, dentry, 1); > > > > if (error) > > @@ -4447,11 +4448,11 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > > return -EPERM; > > > > dget(dentry); > > - inode_lock(dentry->d_inode); > > + inode_lock(inode); > > > > error = -EBUSY; > > if (is_local_mountpoint(dentry) || > > - (dentry->d_inode->i_flags & S_KERNEL_FILE)) > > + (inode->i_flags & S_KERNEL_FILE)) > > goto out; > > > > error = security_inode_rmdir(dir, dentry); > > @@ -4463,12 +4464,21 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode *dir, > > goto out; > > > > shrink_dcache_parent(dentry); > > - dentry->d_inode->i_flags |= S_DEAD; > > + inode->i_flags |= S_DEAD; > > dont_mount(dentry); > > detach_mounts(dentry); > > > > out: > > - inode_unlock(dentry->d_inode); > > + /* > > + * The inode may be on the LRU list, so delete it from the LRU at this > > + * point in order to make sure that the inode is freed as soon as > > + * possible. > > + */ > > + spin_lock(&inode->i_lock); > > + inode_lru_list_del(inode); > > + spin_unlock(&inode->i_lock); > > + > > + inode_unlock(inode); > > dput(dentry); > > if (!error) > > d_delete_notify(dir, dentry); > > @@ -4653,8 +4663,18 @@ int do_unlinkat(int dfd, struct filename *name) > > dput(dentry); > > Why are you doing that in do_unlinkat() instead of vfs_unlink() (as > you're doing it in vfs_rmdir() and not do_rmdir())? > > Doing it in do_unlinkat() means any stacking filesystem such as > overlayfs will end up skipping the LRU list removal as they use > vfs_unlink() directly. > > And does btrfs subvolume/snapshot deletion special treatment as well for > this as it's semantically equivalent to an rmdir? Another thing. If this is unlinking an inode with multiple hardlinks then vfs_unlink() might not remove the inode if it's not the last hardlink. Do you really want to remove the inode from the LRU list in that case as well? > > } > > inode_unlock(path.dentry->d_inode); > > - if (inode) > > + if (inode) { > > + /* > > + * The LRU may be holding a reference, remove the inode from the > > + * LRU here before dropping our hopefully final reference on the > > + * inode. > > + */ > > + spin_lock(&inode->i_lock); > > + inode_lru_list_del(inode); > > + spin_unlock(&inode->i_lock); > > + > > iput(inode); /* truncate the inode here */ > > + } > > inode = NULL; > > if (delegated_inode) { > > error = break_deleg_wait(&delegated_inode); > > -- > > 2.49.0 > >