On Wed, Jul 30, 2025 at 4:59 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > > On 7/30/25 4:08 PM, Olga Kornievskaia wrote: > > Scott Mayhew discovered a security exploit in NFS over TLS in > > tls_alert_recv() due to its assumption it can read data from > > the msg iterator's kvec.. > > > > kTLS implementation splits TLS non-data record payload between > > the control message buffer (which includes the type such as TLS > > aler or TLS cipher change) and the rest of the payload (say TLS > > alert's level/description) which goes into the msg payload buffer. > > > > This patch proposes to rework how control messages are setup and > > used by sock_recvmsg(). > > > > If no control message structure is setup, kTLS layer will read and > > process TLS data record types. As soon as it encounters a TLS control > > message, it would return an error. At that point, NFS can setup a > > kvec backed msg buffer and read in the control message such as a > > TLS alert. Scott found that msg iterator can advance the kvec > > pointer as a part of the copy process thus we need to revert the > > iterator before calling into the tls_alert_recv. > > > > Reported-by: Scott Mayhew <smayhew@xxxxxxxxxx> > > Fixes: 5e052dda121e ("SUNRPC: Recognize control messages in server-side TCP socket code") > > Suggested-by: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> > > Suggested-by: Scott Mayhew <smayhew@xxxxxxxxxx> > > Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx> > > --- > > net/sunrpc/svcsock.c | 43 +++++++++++++++++++++++++++++++++++-------- > > 1 file changed, 35 insertions(+), 8 deletions(-) > > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > > index 46c156b121db..e2c5e0e626f9 100644 > > --- a/net/sunrpc/svcsock.c > > +++ b/net/sunrpc/svcsock.c > > @@ -257,20 +257,47 @@ svc_tcp_sock_process_cmsg(struct socket *sock, struct msghdr *msg, > > } > > > > static int > > -svc_tcp_sock_recv_cmsg(struct svc_sock *svsk, struct msghdr *msg) > > +svc_tcp_sock_recv_cmsg(struct socket *sock, unsigned int *msg_flags) > > { > > union { > > struct cmsghdr cmsg; > > u8 buf[CMSG_SPACE(sizeof(u8))]; > > } u; > > - struct socket *sock = svsk->sk_sock; > > + u8 alert[2]; > > + struct kvec alert_kvec = { > > + .iov_base = alert, > > + .iov_len = sizeof(alert), > > + }; > > + struct msghdr msg = { > > + .msg_flags = *msg_flags, > > + .msg_control = &u, > > + .msg_controllen = sizeof(u), > > + }; > > + int ret; > > + > > + iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec, 1, > > + alert_kvec.iov_len); > > + ret = sock_recvmsg(sock, &msg, MSG_DONTWAIT); > > + if (ret > 0 && > > + tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) { > > + iov_iter_revert(&msg.msg_iter, ret); > > + ret = svc_tcp_sock_process_cmsg(sock, &msg, &u.cmsg, -EAGAIN); > > + } > > + return ret; > > +} > > + > > +static int > > +svc_tcp_sock_recvmsg(struct svc_sock *svsk, struct msghdr *msg) > > +{ > > int ret; > > + struct socket *sock = svsk->sk_sock; > > > > - msg->msg_control = &u; > > - msg->msg_controllen = sizeof(u); > > ret = sock_recvmsg(sock, msg, MSG_DONTWAIT); > > - if (unlikely(msg->msg_controllen != sizeof(u))) > > - ret = svc_tcp_sock_process_cmsg(sock, msg, &u.cmsg, ret); > > + if (msg->msg_flags & MSG_CTRUNC) { > > + msg->msg_flags &= ~(MSG_CTRUNC | MSG_EOR); > > + if (ret == 0 || ret == -EIO) > > + ret = svc_tcp_sock_recv_cmsg(sock, &msg->msg_flags); > > + } > > return ret; > > } > > > > @@ -321,7 +348,7 @@ static ssize_t svc_tcp_read_msg(struct svc_rqst *rqstp, size_t buflen, > > iov_iter_advance(&msg.msg_iter, seek); > > buflen -= seek; > > } > > - len = svc_tcp_sock_recv_cmsg(svsk, &msg); > > + len = svc_tcp_sock_recvmsg(svsk, &msg); > > if (len > 0) > > svc_flush_bvec(bvec, len, seek); > > > > @@ -1018,7 +1045,7 @@ static ssize_t svc_tcp_read_marker(struct svc_sock *svsk, > > iov.iov_base = ((char *)&svsk->sk_marker) + svsk->sk_tcplen; > > iov.iov_len = want; > > iov_iter_kvec(&msg.msg_iter, ITER_DEST, &iov, 1, want); > > - len = svc_tcp_sock_recv_cmsg(svsk, &msg); > > + len = svc_tcp_sock_recvmsg(svsk, &msg); > > if (len < 0) > > return len; > > svsk->sk_tcplen += len; > > Fwiw, I've already pulled 1/4 into nfsd-testing. But 4/4 might not apply > until the others are in the tree. We might want these to go through a > single tree. > > Or, can we delay 4/4 until 6.18 ? Delaying 4/4 until 6.18 sounds fine to me. > > > -- > Chuck Lever >