On Wed, 2025-07-30 at 15:19 +0000, Alex Markuze wrote: > When the parent directory's i_rwsem is not locked, req->r_parent may become > stale due to concurrent operations (e.g. rename) between dentry lookup and > message creation. Validate that r_parent matches the encoded parent inode > and update to the correct inode if a mismatch is detected. Could we share any description of crash or workload misbehavior that can illustrate the symptoms of the issue? > --- > fs/ceph/inode.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 42 insertions(+), 2 deletions(-) > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c > index 814f9e9656a0..49fb1e3a02e8 100644 > --- a/fs/ceph/inode.c > +++ b/fs/ceph/inode.c > @@ -56,6 +56,46 @@ static int ceph_set_ino_cb(struct inode *inode, void *data) > return 0; > } > > +/* > + * Validate that the directory inode referenced by @req->r_parent matches the > + * inode number and snapshot id contained in the reply's directory record. If > + * they do not match – which can theoretically happen if the parent dentry was > + * moved between the time the request was issued and the reply arrived – fall > + * back to looking up the correct inode in the inode cache. > + * > + * A reference is *always* returned. Callers that receive a different inode > + * than the original @parent are responsible for dropping the extra reference > + * once the reply has been processed. > + */ > +static struct inode *ceph_get_reply_dir(struct super_block *sb, > + struct inode *parent, > + struct ceph_mds_reply_info_parsed *rinfo) > +{ > + struct ceph_vino vino; > + > + if (unlikely(!rinfo->diri.in)) > + return parent; /* nothing to compare against */ If we could receive parent == NULL, then is it possible to receive rinfo == NULL? > + > + /* If we didn't have a cached parent inode to begin with, just bail out. */ > + if (!parent) > + return NULL; Is it normal workflow that parent == NULL? Should we complain about it here? > + > + vino.ino = le64_to_cpu(rinfo->diri.in->ino); > + vino.snap = le64_to_cpu(rinfo->diri.in->snapid); > + > + if (likely(parent && ceph_ino(parent) == vino.ino && ceph_snap(parent) == vino.snap)) It looks like long line. Could we introduce a inline static function with good name here? We already checked that parent is not NULL above. Does it makes sense to have this duplicated check here? > + return parent; /* matches – use the original reference */ > + > + /* Mismatch – this should be rare. Emit a WARN and obtain the correct inode. */ > + WARN(1, "ceph: reply dir mismatch (parent %s %llx.%llx reply %llx.%llx)\n", > + parent ? "valid" : "NULL", How parent can be NULL here? We already checked this pointer. And this construction looks pretty complicated. > + parent ? ceph_ino(parent) : 0ULL, > + parent ? ceph_snap(parent) : 0ULL, > + vino.ino, vino.snap); > + > + return ceph_get_inode(sb, vino, NULL); > +} > + > /** > * ceph_new_inode - allocate a new inode in advance of an expected create > * @dir: parent directory for new inode > @@ -1548,8 +1588,8 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) > } > > if (rinfo->head->is_dentry) { > - struct inode *dir = req->r_parent; > - > + /* r_parent may be stale, in cases when R_PARENT_LOCKED is not set, so we need to get the correct inode */ Comment is too long for one line. We could have multi-line comment here. If R_PARENT_LOCKED is not set, then, is it normal execution flow or not? Should we fix the use-case of not setting R_PARENT_LOCKED? Thanks, Slava. > + struct inode *dir = ceph_get_reply_dir(sb, req->r_parent, rinfo); > if (dir) { > err = ceph_fill_inode(dir, NULL, &rinfo->diri, > rinfo->dirfrag, session, -1,