On Wed, Sep 10, 2025 at 8:16 PM Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote: > > On Wed, 2025-09-10 at 12:50 +0200, Ilya Dryomov wrote: > > There is a place where generic code in messenger.c is reading and > > another place where it is writing to con->v1 union member without > > checking that the union member is active (i.e. msgr1 is in use). > > > > On 64-bit systems, con->v1.auth_retry overlaps with con->v2.out_iter, > > so such a read is almost guaranteed to return a bogus value instead of > > 0 when msgr2 is in use. This ends up being fairly benign because the > > side effect is just the invalidation of the authorizer and successive > > fetching of new tickets. > > > > con->v1.connect_seq overlaps with con->v2.conn_bufs and the fact that > > it's being written to can cause more serious consequences, but luckily > > it's not something that happens often. > > > > Cc: stable@xxxxxxxxxxxxxxx > > Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") > > Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx> > > --- > > net/ceph/messenger.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > > index d1b5705dc0c6..9f6d860411cb 100644 > > --- a/net/ceph/messenger.c > > +++ b/net/ceph/messenger.c > > @@ -1524,7 +1524,7 @@ static void con_fault_finish(struct ceph_connection *con) > > * in case we faulted due to authentication, invalidate our > > * current tickets so that we can get new ones. > > */ > > - if (con->v1.auth_retry) { > > + if (!ceph_msgr2(from_msgr(con->msgr)) && con->v1.auth_retry) { > > Frankly speaking, this check implementation looks pretty not obvious :). > > static inline bool ceph_msgr2(struct ceph_client *client) > { > return client->options->con_modes[0] != CEPH_CON_MODE_UNKNOWN; > } > > It's strange that struct ceph_connection_v1_info and struct > ceph_connection_v2_info don't start with version field. Because, it will be much > cleaner if these structures simply start like this: > > struct ceph_connection_v1_info { > u8 version; > <skipped> > }; > > struct ceph_connection_v2_info { > u8 version; > <skipped> > }; > > But now it's too complicated to change this. > > > dout("auth_retry %d, invalidating\n", con->v1.auth_retry); > > if (con->ops->invalidate_authorizer) > > con->ops->invalidate_authorizer(con); > > @@ -1714,9 +1714,10 @@ static void clear_standby(struct ceph_connection *con) > > { > > /* come back from STANDBY? */ > > if (con->state == CEPH_CON_S_STANDBY) { > > - dout("clear_standby %p and ++connect_seq\n", con); > > This comment "++connect_seq" makes sense to delete or to rework even without the > fix. :) > > > + dout("clear_standby %p\n", con); > > con->state = CEPH_CON_S_PREOPEN; > > - con->v1.connect_seq++; > > + if (!ceph_msgr2(from_msgr(con->msgr))) > > + con->v1.connect_seq++; > > By the way, we have connect_seq field in struct ceph_connection_v1_info and in > struct ceph_connection_v2_info. Why do we increment this field only for version > 1? Version 2 doesn't need this increment? Hi Slava, Yes, v2 doesn't need it. This increment is specific to v1, but there wasn't a not-too-verbose way to move it into the protocol-specific file and so it remained in the generic messenger.c as one of a couple of abstraction leaks from v1-only era. Thanks, Ilya