On Tue, Aug 26, 2025 at 11:39:03AM -0400, Josef Bacik wrote:
> Currently, if we are the last iput, and we have the I_DIRTY_TIME bit
> set, we will grab a reference on the inode again and then mark it dirty
> and then redo the put. This is to make sure we delay the time update
> for as long as possible.
>
> We can rework this logic to simply dec i_count if it is not 1, and if it
> is do the time update while still holding the i_count reference.
>
> Then we can replace the atomic_dec_and_lock with locking the ->i_lock
> and doing atomic_dec_and_test, since we did the atomic_add_unless above.
>
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ---
>  fs/inode.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index a3673e1ed157..13e80b434323 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1911,16 +1911,21 @@ void iput(struct inode *inode)
>  	if (!inode)
>  		return;
>  	BUG_ON(inode->i_state & I_CLEAR);
> -retry:
> -	if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
> -		if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
> -			atomic_inc(&inode->i_count);
> -			spin_unlock(&inode->i_lock);
> -			trace_writeback_lazytime_iput(inode);
> -			mark_inode_dirty_sync(inode);
> -			goto retry;
> -		}
> +
> +	if (atomic_add_unless(&inode->i_count, -1, 1))
> +		return;
> +
> +	if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
> +		trace_writeback_lazytime_iput(inode);
> +		mark_inode_dirty_sync(inode);
> +	}
> +
> +	spin_lock(&inode->i_lock);
> +	if (atomic_dec_and_test(&inode->i_count)) {
> +		/* iput_final() drops i_lock */
>  		iput_final(inode);
> +	} else {
> +		spin_unlock(&inode->i_lock);
>  	}
>  }
>  EXPORT_SYMBOL(iput);
> --
> 2.49.0
>

This changes semantics though. In the stock kernel the I_DIRTY_TIME
business is guaranteed to be sorted out before the call to iput_final().
In principle the flag may reappear after mark_inode_dirty_sync() returns
and before the retried atomic_dec_and_lock succeeds, in which case it
will get cleared again. With your change the flag is only handled once,
and should it reappear before you take the ->i_lock, it will stay there.

I agree the stock handling is pretty crap though.

Your change should test the flag again after taking the spin lock but
before messing with the refcount, and if need be unlock + retry. It
would not hurt to assert in iput_final() that the spin lock is held and
that this flag is not set.

Here is my diff on top of yours to illustrate, plus a cosmetic change;
not even compile-tested:

diff --git a/fs/inode.c b/fs/inode.c
index 421e248b690f..a9ae0c790b5d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1911,7 +1911,7 @@ void iput(struct inode *inode)
 	if (!inode)
 		return;
 	BUG_ON(inode->i_state & I_CLEAR);
-
+retry:
 	if (atomic_add_unless(&inode->i_count, -1, 1))
 		return;

@@ -1921,12 +1921,19 @@ void iput(struct inode *inode)
 	}

 	spin_lock(&inode->i_lock);
+
+	if (inode->i_count == 1 && inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
+		spin_unlock(&inode->i_lock);
+		goto retry;
+	}
+
 	if (atomic_dec_and_test(&inode->i_count)) {
-		/* iput_final() drops i_lock */
-		iput_final(inode);
-	} else {
 		spin_unlock(&inode->i_lock);
+		return;
 	}
+
+	/* iput_final() drops i_lock */
+	iput_final(inode);
 }
 EXPORT_SYMBOL(iput);