Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant

On 6/11/25 2:50 PM, Nikhil Jha wrote:
> On Thu, Mar 20, 2025 at 09:16:15AM -0400, Chuck Lever wrote:
>> On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
>>> When the client retransmits an operation (for example, because the
>>> server is slow to respond), a new GSS sequence number is associated with
>>> the XID. In the current kernel code the original sequence number is
>>> discarded. Subsequently, if a response to the original request is
>>> received there will be a GSS sequence number mismatch. A mismatch will
>>> trigger another retransmit, possibly repeating the cycle, and after some
>>> number of failed retries EACCES is returned.
>>>
>>> RFC2203, section 5.3.3.1 suggests a possible solution... "cache the
>>> RPCSEC_GSS sequence number of each request it sends" and "compute the
>>> checksum of each sequence number in the cache to try to match the
>>> checksum in the reply's verifier." This is what FreeBSD's implementation
>>> does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
>>>
>>> However, even with this cache, retransmits directly caused by a seqno
>>> mismatch can still cause a bad message interleaving that results in this
>>> bug. The RFC already suggests ignoring incorrect seqnos on the server
>>> side, and this seems symmetric, so this patchset also applies that
>>> behavior to the client.
>>>
>>> These two patches are *not* dependent on each other. I tested them by
>>> delaying packets with a Python script hooked up to NFQUEUE. If it would
>>> be helpful I can send this script along as well.
>>>
>>> Signed-off-by: Nikhil Jha <njha@xxxxxxxxxxxxxx>
>>> ---
>>> Changes since v1:
>>>  * Maintain the invariant that the first seqno is always first in
>>>    rq_seqnos, so that it doesn't need to be stored twice.
>>>  * Minor formatting, and resending with proper mailing-list headers so the
>>>    patches are easier to work with.
>>>
>>> ---
>>> Nikhil Jha (2):
>>>       sunrpc: implement rfc2203 rpcsec_gss seqnum cache
>>>       sunrpc: don't immediately retransmit on seqno miss
>>>
>>>  include/linux/sunrpc/xprt.h    | 17 +++++++++++-
>>>  include/trace/events/rpcgss.h  |  4 +--
>>>  include/trace/events/sunrpc.h  |  2 +-
>>>  net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
>>>  net/sunrpc/clnt.c              |  9 +++++--
>>>  net/sunrpc/xprt.c              |  3 ++-
>>>  6 files changed, 64 insertions(+), 30 deletions(-)
>>> ---
>>> base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
>>> change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
>>>
>>> Best regards,
>>
>> This seems like a sensible thing to do to me.
>>
>> Acked-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>
>> -- 
>> Chuck Lever
> 
> Hi,
> 
> We've been running this patch for a while now and noticed a (very silly
> in hindsight) bug.
> 
> maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);
> 
> needs to be
> 
> maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len);
> 
> Or the kernel gets stuck in a loop when you have more than two retries.
> I can resend this patch but I noticed it's already made its way into
> quite a few trees. Should this be a separate patch instead?

The course of action depends on what trees you found the patch in.


-- 
Chuck Lever
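
For readers following the thread, here is a minimal userspace sketch of the
cache scan that RFC 2203 section 5.3.3.1 describes, and of why the index has
to advance on every attempt. It is not the kernel code: toy_mic(),
struct toy_rqst, toy_match_verifier() and SEQNO_CACHE_SIZE are hypothetical
names made up for illustration, and the "checksum" is a plain integer mix
rather than a real GSS MIC.

```c
#include <assert.h>
#include <stdint.h>

#define SEQNO_CACHE_SIZE 3	/* assumed cache depth, for illustration only */

/* Toy stand-in for the GSS MIC over a sequence number; NOT real crypto. */
static uint32_t toy_mic(uint32_t seqno)
{
	return (seqno * 2654435761u) ^ 0x5bd1e995u;
}

/* Hypothetical request record: seqnos[0] is the first transmission. */
struct toy_rqst {
	uint32_t seqnos[SEQNO_CACHE_SIZE];
	unsigned int seqno_count;
};

/*
 * Try each cached seqno against the verifier from the reply.
 * Returns the index of the matching entry, or -1 if none matches.
 */
static int toy_match_verifier(const struct toy_rqst *req, uint32_t verifier)
{
	unsigned int i = 0;

	assert(req->seqno_count <= SEQNO_CACHE_SIZE);

	while (i < req->seqno_count) {
		/*
		 * The post-increment moves the scan to the next cached seqno.
		 * Without it, a miss re-tests the same entry forever.
		 */
		if (toy_mic(req->seqnos[i++]) == verifier)
			return (int)(i - 1);
	}
	return -1;
}
```

With three cached seqnos, a reply whose verifier was computed over the third
one is found at index 2, and a verifier that matches none of them makes the
scan stop with -1 once the cache is exhausted.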



