Re: [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd

On Wed, Apr 23, 2025 at 6:47 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Wed, Apr 23, 2025 at 09:48:55AM -0700, Caleb Sander Mateos wrote:
> > On Wed, Apr 23, 2025 at 8:39 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 23, 2025 at 08:08:17AM -0700, Caleb Sander Mateos wrote:
> > > > On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > > > >
> > > > > ublk_cancel_cmd() calls io_uring_cmd_done() to complete the uring_cmd,
> > > > > but we may already have scheduled task work via
> > > > > io_uring_cmd_complete_in_task() for dispatching the request, and then
> > > > > a kernel crash can be triggered.
> > > > >
> > > > > Fix it by not trying to cancel the command if a ublk block request is
> > > > > coming to this slot.
> > > > >
> > > > > Reported-by: Jared Holzman <jholzman@xxxxxxxxxx>
> > > > > Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@xxxxxxxxxx/
> > > > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > > > > ---
> > > > >  drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> > > > >  1 file changed, 31 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > index c4d4be4f6fbd..fbfb5b815c8d 100644
> > > > > --- a/drivers/block/ublk_drv.c
> > > > > +++ b/drivers/block/ublk_drv.c
> > > > > @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > > >         if (res != BLK_STS_OK)
> > > > >                 return res;
> > > > >
> > > > > +       /*
> > > > > +        * Order writing to rq->state in blk_mq_start_request() and
> > > > > +        * reading ubq->canceling, see comment in ublk_cancel_cmd()
> > > > > +        * wrt. the pair barrier.
> > > > > +        */
> > > > > +       smp_mb();
> > > >
> > > > Adding an mfence to every ublk I/O would be really unfortunate. Memory
> > > > barriers are very expensive in a system with a lot of CPUs. Why can't
> > >
> > > I believe the perf effect of this single smp_mb() may not be observable;
> > > per my last profiling, ublk performance is dominated by several other,
> > > larger contributors.
> >
> > I have seen a single mfence instruction in the I/O path account for
> > multiple percent of the CPU profile in the past. The cost of a fence
> > scales superlinearly with the number of CPUs, so it can be a real
> > parallelism bottleneck. I'm not opposed to the memory barrier if it's
> > truly necessary for correctness, but I would love to consider any
> > alternatives.
>
> Thinking about it further, the added barrier is actually unnecessary, because:
>
> - for any request arriving after ublk_start_cancel(), ubq->canceling is
>   always observed
>
> - this patch only addresses requests whose task work (TW) was scheduled
>   before or during quiesce but has not had a chance to run yet
>
> The single added check of `req && blk_mq_request_started(req)` should be
> enough because:
>
> - either the started request is aborted via __ublk_abort_rq(), so the
>   uring_cmd is canceled on the next cancelation attempt,
>
> or
>
> - the uring_cmd is completed in the TW function ublk_dispatch_req(),
>   because io_uring guarantees that the task work is called

Your logic seems reasonable to me. I'm all for eliminating the barrier!
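
For concreteness, here is a rough sketch of the check as I understand it,
based on my reading of drivers/block/ublk_drv.c rather than your actual
patch, so it may not match your final code:

static void ublk_cancel_cmd(struct ublk_queue *ubq, unsigned tag,
		unsigned int issue_flags)
{
	struct ublk_io *io = &ubq->ios[tag];
	struct request *req;
	bool done;

	if (!(io->flags & UBLK_IO_FLAG_ACTIVE))
		return;

	/*
	 * Skip cancelation once a block request has been started for this
	 * tag: either __ublk_abort_rq() aborts it and the uring_cmd gets
	 * canceled on the next cancelation attempt, or the scheduled task
	 * work (ublk_dispatch_req()) is guaranteed to run and complete it.
	 */
	req = blk_mq_tag_to_rq(ubq->dev->tag_set.tags[ubq->q_id], tag);
	if (req && blk_mq_request_started(req))
		return;

	spin_lock(&ubq->cancel_lock);
	done = !!(io->flags & UBLK_IO_FLAG_CANCELED);
	if (!done)
		io->flags |= UBLK_IO_FLAG_CANCELED;
	spin_unlock(&ubq->cancel_lock);

	if (!done)
		io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
}

That avoids both the smp_mb() in ublk_queue_rq() and the race with the
pending task work.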

>
> >
> > I have been looking at ublk zero-copy CPU profiles recently, and there
> > the most expensive instructions there are the atomic
> > reference-counting in ublk_get_req_ref()/ublk_put_req_ref(). I have
> > some ideas to reduce that overhead.
>
> The two can be observed in perf, but IMO they are still a small part
> compared with the other contributors; I am not sure you can get an
> obvious IOPS boost in this area, especially since the atomic reference
> is hard to avoid.
>
> I am preparing a patch to relax the context limit for the
> register_io/un_register_io commands, which should be allowed to run
> from non-ublk-queue contexts; it just needs one offload-style ublk
> server implementation.

Good guess! I was hoping to rely on the fact that buffers can only be
registered by the ubq_daemon task, so that the reference-count increment
wouldn't need to be atomic. But if you plan to add support for
registering ublk request buffers from other tasks, then that won't work.
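
For the record, the (now moot) idea in its simplest form was roughly the
following hypothetical sketch (not driver code: the struct and helper
names are illustrative, with only ubq->ubq_daemon taken from the driver,
and it assumes both gets and puts stay confined to the daemon task):

struct ublk_req_ref {
	unsigned int count;	/* touched only by the ubq_daemon task */
};

static inline void ublk_req_ref_get(struct ublk_queue *ubq,
		struct ublk_req_ref *ref)
{
	/* single-task discipline replaces the atomic RMW of kref_get() */
	WARN_ON_ONCE(current != ubq->ubq_daemon);
	ref->count++;
}

static inline bool ublk_req_ref_put(struct ublk_queue *ubq,
		struct ublk_req_ref *ref)
{
	WARN_ON_ONCE(current != ubq->ubq_daemon);
	return --ref->count == 0;	/* true: caller completes the request */
}

With registration (and hence gets/puts) allowed from arbitrary tasks,
that single-task discipline no longer holds, so the atomic refcount has
to stay.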

Thanks,
Caleb




