On Thu, May 15, 2025 at 9:26 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote: > > Commit f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd") > adds request state check in ublk_cancel_cmd(), and if the request is > started, skip canceling this uring_cmd. > > However, the current uring_cmd may be in ACTIVE state, without block > request coming to the uring command. Meantime, the cached request in > tag_set.tags[tag] is recycled and has been delivered to ublk server, > then this uring_cmd can't be canceled. To check my understanding, the scenario you're describing is that the request has started but also has already been completed by the ublk server? And this can happen because tag_set.tags[q_id]->rqs[tag] still points to the old request even after it has completed? Reading through blk-mq.c, that does seem possible since rqs[tag] is set in blk_mq_start_request() but not cleared in __blk_mq_end_request(). > > ublk requests are aborted in ublk char device release handler, which > depends on canceling all ACTIVE uring_cmd. So cause dead loop. Do you mean "deadlock"? > > Fix this issue by not taking stale request into account when canceling > uring_cmd in ublk_cancel_cmd(). > > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> > Closes: https://lore.kernel.org/linux-block/mruqwpf4tqenkbtgezv5oxwq7ngyq24jzeyqy4ixzvivatbbxv@4oh2wzz4e6qn/ > Fixes: f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd") > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > drivers/block/ublk_drv.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c > index f9032076bc06..dc104c025cd5 100644 > --- a/drivers/block/ublk_drv.c > +++ b/drivers/block/ublk_drv.c > @@ -1708,7 +1708,7 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, unsigned tag, > * that ublk_dispatch_req() is always called > */ > req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); > - if (req && blk_mq_request_started(req)) > + if (req && blk_mq_request_started(req) && req->tag == tag) Is it possible that req now belongs to a different hctx (q_id)? From a quick reading of blk-mq.c, it looks like the hctx's requests are always allocated from the hctx's static_rqs, so I don't think that should be a problem. Reading req->tag here is probably racy though, as it's written by blk_mq_rq_ctx_init(), called from *blk_mq_alloc_request*() on the submitting task. How about checking blk_mq_rq_state(req) == MQ_RQ_IN_FLIGHT instead of blk_mq_request_started(req) && req->tag == tag? Best, Caleb > return; > > spin_lock(&ubq->cancel_lock); > -- > 2.47.1 >