On Tue, 2025-04-29 at 12:14 -0400, Chuck Lever wrote: > On 4/27/25 7:01 PM, trondmy@xxxxxxxxxx wrote: > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > The Linux client assumes that all filehandles are non-volatile for > > renames within the same directory (otherwise sillyrename cannot > > work). > > However, the existence of the Linux 'subtree_check' export option > > has > > meant that nfs_rename() has always assumed it needs to flush writes > > before attempting to rename. > > > > Since NFSv4 does allow the client to query whether or not the > > server > > exhibits this behaviour, and since knfsd does actually set the > > appropriate flag when 'subtree_check' is enabled on an export, it > > should be OK to optimise away the write flushing behaviour in the > > cases > > where it is clearly not needed. > > No objection to the high level goal, seems like a sensible change. > > So NFSv3 still has the flushing requirement, but NFSv4 can disable > that > requirement as long as the server in question asserts > FH4_VOLATILE_ANY > and FH4_VOL_RENAME, correct? > > I'm wondering how confident we are that other server implementations > behave reasonably. Yes, they are broken if they don't behave, but > there > is still risk of introducing a regression. I'd prefer to deal with that through other means if it turns out that we have to. The problem we have right now is that we are forcing a writeback of the entire file while holding several directory mutexes (as well as the filesystem-global s_vfs_rename_mutex). That's terrible for performance and scalability. > > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > --- > > fs/nfs/client.c | 2 ++ > > fs/nfs/dir.c | 15 ++++++++++++++- > > include/linux/nfs_fs_sb.h | 6 ++++++ > > 3 files changed, 22 insertions(+), 1 deletion(-) > > > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c > > index 2115c1189c2d..6d63b958c4bb 100644 > > --- a/fs/nfs/client.c > > +++ b/fs/nfs/client.c > > @@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct > > fs_context *fc) > > if (server->namelen == 0 || server->namelen > > > NFS2_MAXNAMLEN) > > server->namelen = NFS2_MAXNAMLEN; > > } > > + /* Linux 'subtree_check' borkenness mandates this setting > > */ > > Assuming you are going to respin this patch to deal with the build > bot > warnings, can you make this comment more useful? "borkenness" sounds > like you are dealing with a server bug here, but that's not really > the case. subtree_check is working as designed: it requires the extra > flushing. The no_subtree_check case does not, IIUC. subtree checking is working as designed, but that doesn't change the fact that it is a violation of the NFSv3 protocol. You can't to change the filehandle in mid flight in any stateless protocol, because that will break applications. > It would be better to explain this need strictly in terms of file > handle > volatility, no? > > > > + server->fh_expire_type = NFS_FH_VOL_RENAME; > > > > if (!(fattr->valid & NFS_ATTR_FATTR)) { > > error = ctx->nfs_mod->rpc_ops->getattr(server, > > ctx->mntfh, > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > index bd23fc736b39..d0e0b435a843 100644 > > --- a/fs/nfs/dir.c > > +++ b/fs/nfs/dir.c > > @@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task, > > struct nfs_renamedata *data) > > unblock_revalidate(new_dentry); > > } > > > > +static bool nfs_rename_is_unsafe_cross_dir(struct dentry > > *old_dentry, > > + struct dentry > > *new_dentry) > > +{ > > + struct nfs_server *server = NFS_SB(old_dentry->d_sb); > > + > > + if (old_dentry->d_parent != new_dentry->d_parent) > > + return false; > > + if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE) > > + return !(server->fh_expire_type & > > NFS_FH_NOEXPIRE_WITH_OPEN); > > + return true; > > +} > > + > > /* > > * RENAME > > * FIXME: Some nfsds, like the Linux user space nfsd, may generate > > a > > @@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap, > > struct inode *old_dir, > > > > } > > > > - if (S_ISREG(old_inode->i_mode)) > > + if (S_ISREG(old_inode->i_mode) && > > + nfs_rename_is_unsafe_cross_dir(old_dentry, > > new_dentry)) > > nfs_sync_inode(old_inode); > > task = nfs_async_rename(old_dir, new_dir, old_dentry, > > new_dentry, > > must_unblock ? nfs_unblock_rename > > : NULL); > > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h > > index 71319637a84e..6732c7e04d19 100644 > > --- a/include/linux/nfs_fs_sb.h > > +++ b/include/linux/nfs_fs_sb.h > > @@ -236,6 +236,12 @@ struct nfs_server { > > u32 acl_bitmask; /* V4 bitmask > > representing the ACEs > > that are > > supported on this > > filesystem */ > > + /* The following #defines numerically match the NFSv4 > > equivalents */ > > +#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1) > > +#define NFS_FH_VOLATILE_ANY (0x2) > > +#define NFS_FH_VOL_MIGRATION (0x4) > > +#define NFS_FH_VOL_RENAME (0x8) > > +#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY | > > NFS_FH_VOL_RENAME) > > u32 fh_expire_type; /* V4 > > bitmask representing file > > handle > > volatility type for > > this filesystem > > */ > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx