Regards,
Haihua Yang

On Thu, Aug 7, 2025 at 9:33 AM Mkrtchyan, Tigran <tigran.mkrtchyan@xxxxxxx> wrote:
>
>
>
> ----- Original Message -----
> > From: "Haihua Yang" <yanghh@xxxxxxxxx>
> > To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> > Sent: Thursday, 7 August, 2025 18:14:57
> > Subject: LAYOUTCOMMIT Failure After CB_LAYOUTRECALL in pNFS Filelayout Scenario
>
> > I'm observing a consistent failure of LAYOUTCOMMIT when the NFS client
> > accesses a pNFS share using filelayout. Below is the sequence of
> > events:
> > 1, The client opens a file for writing and successfully receives a
> > layout (stateid with seqid = 1).
> > 2, The client writes data to the data server (DS) successfully.
> > 3, The NFS server sends a CB_LAYOUTRECALL (stateid with seqid = 2)
> > due to some change on the server side.
> > 4, The client sends a LAYOUTCOMMIT (still with seqid = 1), followed
> > by a LAYOUTRETURN (with seqid = 2).
> > 5, The server responds to LAYOUTCOMMIT with NFS4ERR_OLD_STATEID.
> > 6, The server responds to LAYOUTRETURN with NFS4_OK.
> > 7, The client retries LAYOUTCOMMIT (still using seqid = 1).
> > 8, The server replies with NFS4ERR_BAD_STATEID because the state was
> > already removed when processing the LAYOUTRETURN.
> >
> > It seems there may be two issues with the Linux NFS client's behavior:
> > 1, The client should not send LAYOUTRETURN before receiving a
> > non-retryable response to LAYOUTCOMMIT.
> > 2, After receiving a CB_LAYOUTRECALL, the client should not continue
> > using the old seqid.
>
> I think this question should go to the NFSv4 IETF working group list.
> Nonetheless, rfc8881 says:
>
>   For CB_LAYOUTRECALL arguments, the client MUST send a response to the recall before using the seqid.
>
> So it sounds like, as long as the client hasn't responded to CB_LAYOUTRECALL, the 'valid' seqid is 1. Thus,
> LAYOUTCOMMIT seqid=1, LAYOUTRETURN seqid=2 looks correct.
>
> See: https://datatracker.ietf.org/doc/html/rfc8881#layout_stateid
>
> Best regards,
> Tigran.
>
> >
> > Would you consider this a bug in the client? Or is there something I
> > may have misunderstood in the protocol behavior?
> >
> > Thanks,
> > Haihua Yang
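
For reference, the reported exchange can be replayed against a toy model of the server side. This is only a sketch under stated assumptions: ToyLayoutServer and replay_reported_sequence are hypothetical names, the seqid checks are inferred from the replies listed in the report (older seqid gives NFS4ERR_OLD_STATEID, missing state gives NFS4ERR_BAD_STATEID), and none of it is taken from the Linux client, any real server, or RFC 8881 itself.

```python
from typing import List, Optional, Tuple

NFS4_OK = "NFS4_OK"
NFS4ERR_OLD_STATEID = "NFS4ERR_OLD_STATEID"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


class ToyLayoutServer:
    """Hypothetical server that tracks the seqid of a single layout stateid."""

    def __init__(self) -> None:
        # None means the server holds no layout state for the client.
        self.seqid: Optional[int] = None

    def layoutget(self) -> int:
        # Step 1 of the report: the layout is granted with seqid = 1.
        self.seqid = 1
        return self.seqid

    def cb_layoutrecall(self) -> int:
        # Step 3: the recall carries the next seqid (2 in the report).
        assert self.seqid is not None, "no layout to recall"
        self.seqid += 1
        return self.seqid

    def _check(self, seqid: int) -> str:
        # Assumed checks, inferred from the replies in the report.
        if self.seqid is None:
            return NFS4ERR_BAD_STATEID
        if seqid < self.seqid:
            return NFS4ERR_OLD_STATEID
        return NFS4_OK

    def layoutcommit(self, seqid: int) -> str:
        return self._check(seqid)

    def layoutreturn(self, seqid: int) -> str:
        status = self._check(seqid)
        if status == NFS4_OK:
            # A successful return drops the layout state (see step 8).
            self.seqid = None
        return status


def replay_reported_sequence() -> List[Tuple[str, int, str]]:
    """Replay steps 4-8 of the report; returns (operation, seqid, status)."""
    server = ToyLayoutServer()
    granted = server.layoutget()
    recalled = server.cb_layoutrecall()
    # List elements are evaluated left to right, so the order matches the report.
    return [
        ("LAYOUTCOMMIT", granted, server.layoutcommit(granted)),
        ("LAYOUTRETURN", recalled, server.layoutreturn(recalled)),
        ("LAYOUTCOMMIT", granted, server.layoutcommit(granted)),
    ]
```

Calling replay_reported_sequence() and printing each tuple shows the three replies from steps 5, 6 and 8 in order. The model says nothing about which side is at fault; it only makes explicit that, once the LAYOUTRETURN is processed, a retried LAYOUTCOMMIT has no state left to match against.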