On Wed, Jul 30, 2025 at 04:22:02PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>
> Changes from v1:
>  - the ioc changes are sent separately;
>  - change the patch 1-3 order as suggested by Damien;
>
> Currently, both mq-deadline and bfq have a global spin lock that will be
> grabbed inside elevator methods like dispatch_request, insert_requests,
> and bio_merge. And the global lock is the main reason mq-deadline and
> bfq can't scale very well.
>
> For the dispatch_request method, the current behavior is dispatching one
> request at a time. In the case of multiple dispatching contexts, this
> behavior, on the one hand, introduces intense lock contention:
>
> t1:                     t2:                     t3:
> lock                    lock                    lock
> // grab lock
> ops.dispatch_request
> unlock
>                         // grab lock
>                         ops.dispatch_request
>                         unlock
>                                                 // grab lock
>                                                 ops.dispatch_request
>                                                 unlock
>
> on the other hand, it messes up the request dispatching order:
>
> t1:
> lock
> rq1 = ops.dispatch_request
> unlock
>                         t2:
>                         lock
>                         rq2 = ops.dispatch_request
>                         unlock
>
> lock
> rq3 = ops.dispatch_request
> unlock
>
>                         lock
>                         rq4 = ops.dispatch_request
>                         unlock
>
> // rq1, rq3 issue to disk
>                         // rq2, rq4 issue to disk
>
> In this case, the elevator dispatch order is rq 1-2-3-4, however,
> the order on disk is rq 1-3-2-4; the order of rq2 and rq3 is inverted.
>
> While dispatching a request, blk_mq_get_dispatch_budget() and
> blk_mq_get_driver_tag() must be called, and they are not ready to be
> called inside elevator methods, hence introducing a new method like
> dispatch_requests is not possible.
>
> In conclusion, this set factors the global lock out of the
> dispatch_request method, and supports request batch dispatch by calling
> the method multiple times while holding the lock.
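The reordering described in the quoted cover letter can be reproduced with a toy model. This is a minimal sketch, not kernel code: the `simulate` function, its parameters, and the fixed round-robin lock schedule are all assumptions made for illustration. It only shows how taking the lock once per request interleaves the elevator's order across contexts, while dispatching a batch under one lock hold keeps consecutive requests together.

```python
from collections import deque


def simulate(num_requests, contexts, batch):
    """Return the order in which requests reach the disk (toy model).

    Assumptions (hypothetical, for illustration only):
    - the elevator hands out rq1..rqN strictly in order;
    - contexts take the lock round-robin;
    - each lock hold dispatches up to `batch` requests into the
      context's local list;
    - afterwards each context issues its local list to the disk,
      one context after another.
    """
    elevator = deque(range(1, num_requests + 1))
    local = [[] for _ in range(contexts)]
    turn = 0
    while elevator:
        # One lock/unlock section for the context whose turn it is.
        for _ in range(batch):
            if not elevator:
                break
            local[turn].append(elevator.popleft())
        turn = (turn + 1) % contexts

    disk_order = []
    for rqs in local:
        disk_order.extend(rqs)
    return disk_order


per_request = simulate(4, contexts=2, batch=1)  # one request per lock hold
batched = simulate(4, contexts=2, batch=2)      # two requests per lock hold
print(per_request)  # [1, 3, 2, 4]
print(batched)      # [1, 2, 3, 4]
```

With `batch=1` the model reproduces the 1-3-2-4 order from the example. It says nothing about merge behavior or real device timing.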
>
> nullblk setup:
> modprobe null_blk nr_devices=0 &&
>     udevadm settle &&
>     cd /sys/kernel/config/nullb &&
>     mkdir nullb0 &&
>     cd nullb0 &&
>     echo 0 > completion_nsec &&
>     echo 512 > blocksize &&
>     echo 0 > home_node &&
>     echo 0 > irqmode &&
>     echo 128 > submit_queues &&
>     echo 1024 > hw_queue_depth &&
>     echo 1024 > size &&
>     echo 0 > memory_backed &&
>     echo 2 > queue_mode &&
>     echo 1 > power ||
>     exit $?
>
> Test script:
> fio -filename=/dev/$disk -name=test -rw=randwrite -bs=4k -iodepth=32 \
>   -numjobs=16 --iodepth_batch_submit=8 --iodepth_batch_complete=8 \
>   -direct=1 -ioengine=io_uring -group_reporting -time_based -runtime=30
>
> Test result: iops
>
> |                 | deadline | bfq      |
> | --------------- | -------- | -------- |
> | before this set | 263k     | 124k     |
> | after this set  | 475k     | 292k     |

Batch dispatch may hurt I/O merge performance, which is important for an
elevator, so please provide test data on real HDDs and SSDs rather than
null_blk only. It would be ideal if a merge-sensitive workload were
evaluated as well.

Thanks,
Ming