On Wed, Apr 23, 2025 at 08:08:17AM -0700, Caleb Sander Mateos wrote:
> On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
> > we may have scheduled task work via io_uring_cmd_complete_in_task() for
> > dispatching request, then kernel crash can be triggered.
> >
> > Fix it by not trying to cancel the command if a ublk block request is
> > coming to this slot.
> >
> > Reported-by: Jared Holzman <jholzman@xxxxxxxxxx>
> > Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@xxxxxxxxxx/
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> >  drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> >  1 file changed, 31 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index c4d4be4f6fbd..fbfb5b815c8d 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> >         if (res != BLK_STS_OK)
> >                 return res;
> >
> > +       /*
> > +        * Order writing to rq->state in blk_mq_start_request() and
> > +        * reading ubq->canceling, see comment in ublk_cancel_command()
> > +        * wrt. the pair barrier.
> > +        */
> > +       smp_mb();
>
> Adding an mfence to every ublk I/O would be really unfortunate. Memory
> barriers are very expensive in a system with a lot of CPUs. Why can't

I believe the perf effect of this little smp_mb() may not be observable.
Per my last profiling, the main contributors to ublk overhead are:

- security_uring_cmd()

  With security_uring_cmd() removed, ublk/loop over a fast NVMe device is
  close to the kernel loop driver.

- bio allocation & freeing

  The ublk bio is allocated on one CPU and usually freed on another CPU.

- generic io_uring and block layer handling, which should be the same as
  for any other io_uring application

And the ublk-specific cost is usually pretty small compared with the above
when running workloads with batched IOs.

> we rely on blk_mq_quiesce_queue() to prevent new requests from being
> queued? Is the bug that ublk_uring_cmd_cancel_fn() calls
> ublk_start_cancel() (which calls blk_mq_quiesce_queue()), but
> ublk_cancel_dev() does not?

I guess it is because we only mark ->canceling for one ubq while the queue
is quiesced. If all queues' ->canceling were set in ublk_start_cancel(),
the issue might be avoided too, without this change.

Thanks,
Ming
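
For reference, below is a minimal user-space sketch of the store/load pairing
the patch relies on, modeled with C11 atomics in place of the kernel's
smp_mb(). The names (slot_busy, canceling, try_dispatch, try_cancel) are
simplified placeholders, not the actual ublk_drv.c symbols. The intent is
that with both fences in place the dangerous interleaving is ruled out:
either the queue_rq side observes ->canceling and does not schedule task
work, or the cancel side observes the started request and does not complete
the uring_cmd.

/*
 * Sketch of the Dekker-style smp_mb() pairing discussed above,
 * using standard C11 atomics (simplified placeholder names).
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool slot_busy;   /* models rq->state set by blk_mq_start_request() */
static atomic_bool canceling;   /* models ubq->canceling */

/* queue_rq side: publish "request started", then check for cancellation */
static bool try_dispatch(void)
{
        atomic_store_explicit(&slot_busy, true, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* plays the role of smp_mb() */
        if (atomic_load_explicit(&canceling, memory_order_relaxed))
                return false;   /* cancel side won; do not schedule task work */
        return true;            /* safe to dispatch via task work */
}

/* cancel side: publish "canceling", then check whether a request arrived */
static bool try_cancel(void)
{
        atomic_store_explicit(&canceling, true, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* pairs with the fence above */
        if (atomic_load_explicit(&slot_busy, memory_order_relaxed))
                return false;   /* a request owns the slot; skip completing the command */
        return true;            /* safe to complete the uring_cmd */
}

int main(void)
{
        /*
         * Single-threaded demo: dispatch runs first, so cancel observes the
         * busy slot and backs off. In the kernel the two sides run on
         * different CPUs; the fences guarantee both can never return true.
         */
        printf("dispatch: %d, cancel: %d\n", try_dispatch(), try_cancel());
        return 0;
}

Both sides may still back off (each sees the other's flag), which is the
safe outcome; only the "both proceed" case is excluded by the barriers.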