From: Yu Kuai <yukuai3@xxxxxxxxxx>

Changes from v1:
 - the ioc changes are sent separately;
 - change the order of patches 1-3 as suggested by Damien;

Currently, both mq-deadline and bfq have a global spin lock that is
grabbed inside elevator methods like dispatch_request, insert_requests,
and bio_merge. This global lock is the main reason mq-deadline and
bfq can't scale very well.

For the dispatch_request method, the current behavior is to dispatch one
request at a time. With multiple dispatching contexts, this behavior, on
the one hand, introduces intense lock contention:

t1:                     t2:                     t3:
lock                    lock                    lock
// grab lock
ops.dispatch_request
unlock
                        // grab lock
                        ops.dispatch_request
                        unlock
                                                // grab lock
                                                ops.dispatch_request
                                                unlock

and, on the other hand, messes up the request dispatching order:

t1:
lock
rq1 = ops.dispatch_request
unlock
                        t2:
                        lock
                        rq2 = ops.dispatch_request
                        unlock

lock
rq3 = ops.dispatch_request
unlock

                        lock
                        rq4 = ops.dispatch_request
                        unlock

// rq1, rq3 issue to disk
                        // rq2, rq4 issue to disk

In this case, the elevator dispatch order is rq 1-2-3-4, but the order
seen by the disk is rq 1-3-2-4: rq2 and rq3 are inverted.

While dispatching a request, blk_mq_get_dispatch_budget() and
blk_mq_get_driver_tag() must be called, and they are not ready to be
called inside elevator methods, hence introducing a new method like
dispatch_requests is not possible.

In conclusion, this set factors the global lock out of the
dispatch_request method, and supports request batch dispatch by calling
the method multiple times while holding the lock.

nullblk setup:
modprobe null_blk nr_devices=0 &&
    udevadm settle &&
    cd /sys/kernel/config/nullb &&
    mkdir nullb0 &&
    cd nullb0 &&
    echo 0 > completion_nsec &&
    echo 512 > blocksize &&
    echo 0 > home_node &&
    echo 0 > irqmode &&
    echo 128 > submit_queues &&
    echo 1024 > hw_queue_depth &&
    echo 1024 > size &&
    echo 0 > memory_backed &&
    echo 2 > queue_mode &&
    echo 1 > power || exit $?
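
The numbers in this letter compare two schedulers on the same device, so
the scheduler has to be switched between runs. A minimal sketch of one way
to do that through the standard sysfs queue/scheduler attribute follows.
The helper name set_iosched and its optional third argument (an alternate
sysfs root) are illustrative assumptions and not part of this set; that the
device shows up as nullb0 is also assumed from the configfs directory name.

```shell
# Hypothetical helper, not part of the patch set: select the I/O scheduler
# for a block device by writing to its sysfs queue/scheduler attribute.
# $1: disk name (e.g. nullb0, assumed), $2: scheduler name,
# $3: optional sysfs root, defaults to /sys/block (illustrative only).
set_iosched() {
    disk=$1
    sched=$2
    root=${3:-/sys/block}
    echo "$sched" > "$root/$disk/queue/scheduler"
}
```

For example, "set_iosched nullb0 mq-deadline" before one run and
"set_iosched nullb0 bfq" before the next; the latter only works if bfq is
built in or its module is loaded.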

Test script:
fio -filename=/dev/$disk -name=test -rw=randwrite -bs=4k -iodepth=32 \
    -numjobs=16 --iodepth_batch_submit=8 --iodepth_batch_complete=8 \
    -direct=1 -ioengine=io_uring -group_reporting -time_based -runtime=30

Test result: iops

|                 | deadline | bfq      |
| --------------- | -------- | -------- |
| before this set | 263k     | 124k     |
| after this set  | 475k     | 292k     |

Yu Kuai (5):
  blk-mq-sched: introduce high level elevator lock
  mq-deadline: switch to use elevator lock
  block, bfq: switch to use elevator lock
  blk-mq-sched: refactor __blk_mq_do_dispatch_sched()
  blk-mq-sched: support request batch dispatching for sq elevator

 block/bfq-cgroup.c   |   4 +-
 block/bfq-iosched.c  |  49 +++++----
 block/bfq-iosched.h  |   2 +-
 block/blk-mq-sched.c | 241 ++++++++++++++++++++++++++++++-------------
 block/blk-mq.h       |  21 ++++
 block/elevator.c     |   1 +
 block/elevator.h     |   4 +-
 block/mq-deadline.c  |  58 +++++------
 8 files changed, 248 insertions(+), 132 deletions(-)

-- 
2.39.2