On Mon, Aug 25, 2025 at 03:35:02PM -0400, Josef Bacik wrote: > On Mon, Aug 25, 2025 at 11:07:55AM +0200, Christian Brauner wrote: > > On Thu, Aug 21, 2025 at 04:18:27PM -0400, Josef Bacik wrote: > > > At evict_inodes() time, we no longer have SB_ACTIVE set, so we can > > > easily go through the normal iput path to clear any inodes. Update > > > > I'm a bit lost why SB_ACTIVE is used here as a justification to call > > iput(). I think it's because iput_final() would somehow add it back to > > the LRU if SB_ACTIVE was still set and the filesystem somehow would > > indicate it wouldn't want to drop the inode. > > > > I'm confused where that would even happen. IOW, which filesystem would > > indicate "don't drop the inode" even though it's about to vanish. But > > anyway, that's probably not important because... > > > > > dispose_list() to check how we need to free the inode, and then grab a > > > full reference to the inode while we're looping through the remaining > > > inodes, and simply iput them at the end. > > > > > > Since we're just calling iput we don't really care about the i_count on > > > the inode at the current time. Remove the i_count checks and just call > > > iput on every inode we find. > > > > > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > > > --- > > > fs/inode.c | 26 +++++++++++--------------- > > > 1 file changed, 11 insertions(+), 15 deletions(-) > > > > > > diff --git a/fs/inode.c b/fs/inode.c > > > index 72981b890ec6..80ad327746a7 100644 > > > --- a/fs/inode.c > > > +++ b/fs/inode.c > > > @@ -933,7 +933,7 @@ static void evict(struct inode *inode) > > > * Dispose-list gets a local list with local inodes in it, so it doesn't > > > * need to worry about list corruption and SMP locks. > > > */ > > > -static void dispose_list(struct list_head *head) > > > +static void dispose_list(struct list_head *head, bool for_lru) > > > { > > > while (!list_empty(head)) { > > > struct inode *inode; > > > @@ -941,8 +941,12 @@ static void dispose_list(struct list_head *head) > > > inode = list_first_entry(head, struct inode, i_lru); > > > list_del_init(&inode->i_lru); > > > > > > - evict(inode); > > > - iobj_put(inode); > > > + if (for_lru) { > > > + evict(inode); > > > + iobj_put(inode); > > > + } else { > > > + iput(inode); > > > + } > > > > ... Afaict, if we end up in dispose_list() we came from one of two > > locations: > > > > (1) prune_icache_sb() > > In which case inode_lru_isolate() will have only returned inodes > > that prior to your changes would have inode->i_count zero. > > > > (2) evict_inodes() > > Similar story, this only hits inodes with inode->i_count zero. > > > > With your change you're adding an increment from zero for (2) via > > __iget() so that you always end up with a full refcount, and that is > > backing your changes to dispose_list() later. > > > > I don't see the same done for (1) though and so your later call to > > iput() drops the reference below zero? It's accidently benign because > > iiuc atomic_dec_and_test() will simply tell you that reference count > > didn't go to zero and so iput() will back off. But still this should be > > fixed if I'm right. > > Because (1) at this point doesn't have a full reference, it only has an > i_obj_count reference. The next patch converts this, and removes this bit. I did > it this way to clearly mark the change in behavior. Ah, right, I forgot to take the the boolean argument into account. You're passing that as true in (1). Sure and then sorry about the noise. > prune_icache_sb() will call dispose_list(&list, true), which will do the > evict(inode) and iobj_put(inode). This is correct because the inodes on the list > from prune_icache_sb() will have an i_count == and have I_WILL_FREE set, so it > will never have it's i_count increased to 1. > > The change here is to change evict_inodes() to simply call iput(), as it calls > dispose_list(&list, false). We will increase the i_count to 1 from zero via > __iget(), which at this point in the series is completely correct behavior. Then > we will call iput() which will drop the i_count back to zero, and then call > iput_final, and since SB_ACTIVE is not set, it will call evict(inode) and clean > everything up properly. > > > > > The conversion to iput() is introducing a lot of subtlety in the middle > > of the series. If I'm right then the iput() is a always a nop because in > > all cases it was an increment from zero. But it isn't really a nop > > because we still do stuff like call ->drop_inode() again. Maybe it's > > fine because no filesystem would have issues with this but I wouldn't > > count on it and also it feels rather unclean to do it this way. > > So I'm definitely introducing another call to ->drop_inode() here, but > ->drop_inode() has always been a "do we want to keep this inode on the LRU" > call, calling it again doesn't really change anything. Right, the first time the inode reference count drops to zero we call iput_final() and then ->drop_inode() which tells us to put it on the LRU. And we leave it otherwise in tact. This is the part I forgot. Once you add it to the LRU you'll have incremented i_obj_count either when you added it to the cached LRU and moving it from there to the "actual" LRU or you'll take a new one. And then dispose_list() for (1) will just need to put the i_obj_count. Thanks! > That being said it is a subtle functional change. I put it here specifically > because it is a functional change. If it bites us in the ass in some unforseen > way we'll be able to bisect it down to here and then we can all laugh at Josef > because he missed something. Yeah, it's good. It's just very confusing because for a brief time we have to understand two reference counts that are first coupled almost 1:1 and then slowly decoupled which makes this particularly nasty... > > So, under the assumption, that after the increment from zero you did, we > > really only have a blatant zombie inode on our hands and we only need to > > get rid of the i_count we took make that explicit and do: > > > > if (for_lru) { > > evict(inode); > > iobj_put(inode); > > } else { > > /* This inode was always incremented from zero. > > * Get rid of that reference without doing anything else. > > */ > > WARN_ON_ONCE(!atomic_dec_and_test(&inode->i_count)); > > } > > We still need the evict() to actually free the inode. We're just getting there > via iput_final() now instead of directly calling evict(). Right, thanks! > > > > > Btw, for the iobj_put() above, I assume that we're not guaranteed that > > i_obj_count == 1? > > Right, it's purely dropping the LRU list i_obj_count reference. Thanks, Thanks for the comments!