On (25/03/28 00:03), Sergey Senozhatsky wrote: > Some of our devices crash in tb_cfg_request_dequeue(): > > general protection fault, probably for non-canonical address 0xdead000000000122 > > CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65 > RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0 > Call Trace: > <TASK> > ? tb_cfg_request_dequeue+0x2d/0xa0 > tb_cfg_request_work+0x33/0x80 > worker_thread+0x386/0x8f0 > kthread+0xed/0x110 > ret_from_fork+0x38/0x50 > ret_from_fork_asm+0x1b/0x30 > > The circumstances are unclear, however, the theory is that > tb_cfg_request_work() can be scheduled twice for a request: > first time via frame.callback from ring_work() and second > time from tb_cfg_request(). Both times kworkers will execute > tb_cfg_request_dequeue(), which results in double list_del() > from the ctl->request_queue (the list poison deference hints > at it: 0xdead000000000122). > > Do not dequeue requests that don't have TB_CFG_REQUEST_ACTIVE > bit set. Mika, as was discussed in [1] thread we rolled out the fix to our fleet and we don't see the crashes anymore. So it's tested and verified. [1] https://lore.kernel.org/linux-kernel/20250327145543.GC3152277@xxxxxxxxxxxxxxxxxx > --- > > v3: tweaked commit message > > drivers/thunderbolt/ctl.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c > index cd15e84c47f4..1db2e951b53f 100644 > --- a/drivers/thunderbolt/ctl.c > +++ b/drivers/thunderbolt/ctl.c > @@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req) > struct tb_ctl *ctl = req->ctl; > > mutex_lock(&ctl->request_queue_lock); > + if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) { > + mutex_unlock(&ctl->request_queue_lock); > + return; > + } > + > list_del(&req->list); > clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags); > if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags)) > -- > 2.49.0.395.g12beb8f557-goog >