Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/29/25 7:22 PM, Trond Myklebust wrote:
> On Tue, 2025-04-29 at 12:14 -0400, Chuck Lever wrote:
>> On 4/27/25 7:01 PM, trondmy@xxxxxxxxxx wrote:
>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>>
>>> The Linux client assumes that all filehandles are non-volatile for
>>> renames within the same directory (otherwise sillyrename cannot
>>> work).
>>> However, the existence of the Linux 'subtree_check' export option
>>> has
>>> meant that nfs_rename() has always assumed it needs to flush writes
>>> before attempting to rename.
>>>
>>> Since NFSv4 does allow the client to query whether or not the
>>> server
>>> exhibits this behaviour, and since knfsd does actually set the
>>> appropriate flag when 'subtree_check' is enabled on an export, it
>>> should be OK to optimise away the write flushing behaviour in the
>>> cases
>>> where it is clearly not needed.
>>
>> No objection to the high level goal, seems like a sensible change.
>>
>> So NFSv3 still has the flushing requirement, but NFSv4 can disable
>> that
>> requirement as long as the server in question asserts
>> FH4_VOLATILE_ANY
>> and FH4_VOL_RENAME, correct?
>>
>> I'm wondering how confident we are that other server implementations
>> behave reasonably. Yes, they are broken if they don't behave, but
>> there
>> is still risk of introducing a regression.
> 
> I'd prefer to deal with that through other means if it turns out that
> we have to. The problem we have right now is that we are forcing a
> writeback of the entire file while holding several directory mutexes
> (as well as the filesystem-global s_vfs_rename_mutex). That's terrible
> for performance and scalability.
> 
>>
>>
>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>> ---
>>>  fs/nfs/client.c           |  2 ++
>>>  fs/nfs/dir.c              | 15 ++++++++++++++-
>>>  include/linux/nfs_fs_sb.h |  6 ++++++
>>>  3 files changed, 22 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
>>> index 2115c1189c2d..6d63b958c4bb 100644
>>> --- a/fs/nfs/client.c
>>> +++ b/fs/nfs/client.c
>>> @@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct
>>> fs_context *fc)
>>>  		if (server->namelen == 0 || server->namelen >
>>> NFS2_MAXNAMLEN)
>>>  			server->namelen = NFS2_MAXNAMLEN;
>>>  	}
>>> +	/* Linux 'subtree_check' borkenness mandates this setting
>>> */
>>
>> Assuming you are going to respin this patch to deal with the build
>> bot
>> warnings, can you make this comment more useful? "borkenness" sounds
>> like you are dealing with a server bug here, but that's not really
>> the case. subtree_check is working as designed: it requires the extra
>> flushing. The no_subtree_check case does not, IIUC.
> 
> subtree checking is working as designed, but that doesn't change the
> fact that it is a violation of the NFSv3 protocol. You can't to change
> the filehandle in mid flight in any stateless protocol, because that
> will break applications.

My point is that the comment you add in this patch, although whimsical,
is not terribly illuminating. A more expansive comment (with the detail
you provide above) would be helpful.

But if subtree_check is not compliant with RFC 1813, perhaps we should
consider finally deprecating and removing it. At least exports(5) should
mention the spec compliance issue, but my copy of exports(5) does not.


>> It would be better to explain this need strictly in terms of file
>> handle
>> volatility, no?
>>
>>
>>> +	server->fh_expire_type = NFS_FH_VOL_RENAME;
>>>  
>>>  	if (!(fattr->valid & NFS_ATTR_FATTR)) {
>>>  		error = ctx->nfs_mod->rpc_ops->getattr(server,
>>> ctx->mntfh,
>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>>> index bd23fc736b39..d0e0b435a843 100644
>>> --- a/fs/nfs/dir.c
>>> +++ b/fs/nfs/dir.c
>>> @@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task,
>>> struct nfs_renamedata *data)
>>>  	unblock_revalidate(new_dentry);
>>>  }
>>>  
>>> +static bool nfs_rename_is_unsafe_cross_dir(struct dentry
>>> *old_dentry,
>>> +					   struct dentry
>>> *new_dentry)
>>> +{
>>> +	struct nfs_server *server = NFS_SB(old_dentry->d_sb);
>>> +
>>> +	if (old_dentry->d_parent != new_dentry->d_parent)
>>> +		return false;
>>> +	if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
>>> +		return !(server->fh_expire_type &
>>> NFS_FH_NOEXPIRE_WITH_OPEN);
>>> +	return true;
>>> +}
>>> +
>>>  /*
>>>   * RENAME
>>>   * FIXME: Some nfsds, like the Linux user space nfsd, may generate
>>> a
>>> @@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap,
>>> struct inode *old_dir,
>>>  
>>>  	}
>>>  
>>> -	if (S_ISREG(old_inode->i_mode))
>>> +	if (S_ISREG(old_inode->i_mode) &&
>>> +	    nfs_rename_is_unsafe_cross_dir(old_dentry,
>>> new_dentry))
>>>  		nfs_sync_inode(old_inode);
>>>  	task = nfs_async_rename(old_dir, new_dir, old_dentry,
>>> new_dentry,
>>>  				must_unblock ? nfs_unblock_rename
>>> : NULL);
>>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>>> index 71319637a84e..6732c7e04d19 100644
>>> --- a/include/linux/nfs_fs_sb.h
>>> +++ b/include/linux/nfs_fs_sb.h
>>> @@ -236,6 +236,12 @@ struct nfs_server {
>>>  	u32			acl_bitmask;	/* V4 bitmask
>>> representing the ACEs
>>>  						   that are
>>> supported on this
>>>  						   filesystem */
>>> +	/* The following #defines numerically match the NFSv4
>>> equivalents */
>>> +#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1)
>>> +#define NFS_FH_VOLATILE_ANY (0x2)
>>> +#define NFS_FH_VOL_MIGRATION (0x4)
>>> +#define NFS_FH_VOL_RENAME (0x8)
>>> +#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY |
>>> NFS_FH_VOL_RENAME)
>>>  	u32			fh_expire_type;	/* V4
>>> bitmask representing file
>>>  						   handle
>>> volatility type for
>>>  						   this filesystem
>>> */
>>
> 


-- 
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux