Hi Tigran and Trond,

The Linux client calls nfs4_layout_refresh_old_stateid when the server
returns NFS4ERR_OLD_STATEID in response to a LAYOUTRETURN, but it
doesn’t do the same for LAYOUTCOMMIT. Is there a specific reason for
this difference?

Thanks,
Haihua Yang

On Thu, Aug 7, 2025 at 10:22 AM Haihua Yang <yanghh@xxxxxxxxx> wrote:
>
> Tigran,
> I forgot to mention in the previous email: after step 4, the client
> also sends a reply to the CB_LAYOUTRECALL. But when retrying the
> LAYOUTCOMMIT afterward, it still uses seqid 1.
> From what I observed in the Linux implementation, the retry logic
> doesn’t update the request arguments, so the client ends up resending
> the same LAYOUTCOMMIT with the old seqid.
>
> Regards,
> Haihua Yang
>
>
> On Thu, Aug 7, 2025 at 9:33 AM Mkrtchyan, Tigran
> <tigran.mkrtchyan@xxxxxxx> wrote:
> >
> > ----- Original Message -----
> > > From: "Haihua Yang" <yanghh@xxxxxxxxx>
> > > To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> > > Sent: Thursday, 7 August, 2025 18:14:57
> > > Subject: LAYOUTCOMMIT Failure After CB_LAYOUTRECALL in pNFS Filelayout Scenario
> >
> > > I'm observing a consistent failure of LAYOUTCOMMIT when the NFS client
> > > accesses a pNFS share using filelayout. Below is the sequence of
> > > events:
> > > 1, The client opens a file for writing and successfully receives a
> > > layout (stateid with seqid = 1).
> > > 2, The client writes data to the data server (DS) successfully.
> > > 3, The NFS server sends a CB_LAYOUTRECALL (stateid with seqid = 2)
> > > due to some change on the server side.
> > > 4, The client sends a LAYOUTCOMMIT (still with seqid = 1), followed
> > > by a LAYOUTRETURN (with seqid = 2).
> > > 5, The server responds to LAYOUTCOMMIT with NFS4ERR_OLD_STATEID.
> > > 6, The server responds to LAYOUTRETURN with NFS4_OK.
> > > 7, The client retries LAYOUTCOMMIT (still using seqid = 1).
> > > 8, The server replies with NFS4ERR_BAD_STATEID because the state was
> > > already removed when processing the LAYOUTRETURN.
> > >
> > > It seems there may be two issues with the Linux NFS client’s behavior:
> > > 1, The client should not send LAYOUTRETURN before receiving a
> > > non-retryable response to LAYOUTCOMMIT.
> > > 2, After receiving a CB_LAYOUTRECALL, the client should not continue
> > > using the old seqid.
> >
> > I think this question should go to the NFSv4 IETF working group list.
> > Nonetheless, RFC 8881 says:
> >
> > For CB_LAYOUTRECALL arguments, the client MUST send a response to
> > the recall before using the seqid.
> >
> > So it sounds like, as long as the client hasn't responded to
> > CB_LAYOUTRECALL, the 'valid' seqid is 1. Thus,
> > LAYOUTCOMMIT seqid=1, LAYOUTRETURN seqid=2 looks correct.
> >
> > See: https://datatracker.ietf.org/doc/html/rfc8881#layout_stateid
> >
> > Best regards,
> > Tigran.
> >
> > >
> > > Would you consider this a bug in the client? Or is there something I
> > > may have misunderstood in the protocol behavior?
> > >
> > > Thanks,
> > > Haihua Yang
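
For reference, the eight-step sequence in the original report can be
replayed against a toy model. This is only a sketch: ToyServer,
replay_reported and replay_commit_first are hypothetical names, the
server tracks nothing but a single seqid, and accepting only an exact
seqid match is an assumption of the model, not a description of any
real server or of the Linux client code.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_OLD_STATEID = "NFS4ERR_OLD_STATEID"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


class ToyServer:
    """Hypothetical server that tracks one layout stateid by seqid only."""

    def __init__(self):
        self.seqid = None  # None means no layout state exists

    def layoutget(self):
        self.seqid = 1
        return self.seqid

    def cb_layoutrecall(self):
        # Model assumption: the recall bumps the server's seqid by one.
        self.seqid += 1
        return self.seqid

    def _check(self, seqid):
        if self.seqid is None or seqid > self.seqid:
            return NFS4ERR_BAD_STATEID
        if seqid < self.seqid:
            return NFS4ERR_OLD_STATEID
        return NFS4_OK

    def layoutcommit(self, seqid):
        return self._check(seqid)

    def layoutreturn(self, seqid):
        status = self._check(seqid)
        if status == NFS4_OK:
            self.seqid = None  # layout state is removed on return
        return status


def replay_reported():
    """Steps 1-8 as reported: the retry reuses seqid 1 after the return."""
    server = ToyServer()
    held = server.layoutget()            # step 1
    recalled = server.cb_layoutrecall()  # step 3
    return [
        server.layoutcommit(held),       # steps 4-5
        server.layoutreturn(recalled),   # steps 4, 6
        server.layoutcommit(held),       # steps 7-8
    ]


def replay_commit_first():
    """Variant: retry with the newer seqid before sending LAYOUTRETURN."""
    server = ToyServer()
    held = server.layoutget()
    recalled = server.cb_layoutrecall()
    statuses = [server.layoutcommit(held)]
    if statuses[0] == NFS4ERR_OLD_STATEID:
        statuses.append(server.layoutcommit(recalled))
    statuses.append(server.layoutreturn(recalled))
    return statuses


if __name__ == "__main__":
    print(replay_reported())
    print(replay_commit_first())
```

In this model the reported ordering ends in NFS4ERR_BAD_STATEID, while
the variant only succeeds because the retry both uses the newer seqid
and happens before the LAYOUTRETURN removes the state. Whether a real
server would accept seqid 2 at that point is exactly the open protocol
question in this thread.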