Re: [PATCH 3/3] scsi: core: Improve IOPS in case of host-wide tags

Bart Van Assche <bvanassche@xxxxxxx> · Thu, 11 Sep 2025 08:45:57 -0700

On 9/10/25 11:40 PM, Hannes Reinecke wrote:
> That is actually a valid point.
> There are devices which set 'cmd_per_lun' to the same value
> as 'can_queue', rendering the budget map a bit pointless.
> But calling blk_mq_all_tag_iter() is more expensive than a simple
> sbitmap_weight(), so the improvement isn't _that_ big
> (as demonstrated by just 1% performance increase).

Hi Hannes,

In the test I ran, blk_mq_all_tag_iter() was not called at all from the
hot path. More generally, I think that blk_mq_all_tag_iter() should
never be called from the command processing path.

The performance improvement in my test was only 1% because the UFS
device in my test setup supports only about 100 K IOPS. The number of
IOPS supported by UFS devices is expected to increase significantly in
the near future. The faster a SCSI device is, the more optimizing SCSI
budget allocation will improve IOPS.
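
A back-of-the-envelope sketch of that scaling argument, under a deliberately
simple assumption: each command costs a fixed amount of time end to end and
the optimization removes a constant number of nanoseconds from it. The
function name and the 100 ns figure used below are hypothetical; 100 ns is
just what a 1% gain at 100 K IOPS would correspond to in this model.

```c
/*
 * Hypothetical model, not kernel code: every command costs
 * 1e9 / base_iops nanoseconds, and the optimization removes saved_ns
 * nanoseconds from that cost. Returns the relative IOPS gain in percent.
 */
double iops_gain_percent(double base_iops, double saved_ns)
{
	double per_cmd_ns = 1e9 / base_iops;

	return (per_cmd_ns / (per_cmd_ns - saved_ns) - 1.0) * 100.0;
}
```

In this model iops_gain_percent(100e3, 100.0) is about 1%, while
iops_gain_percent(1e6, 100.0) is about 11%: the same saving per command
matters more as the per-command time shrinks.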

>> + * that have already been allocated but that have not yet been started.
>> + */
>> +int scsi_device_busy(const struct scsi_device *sdev)
>> +{
>> +    struct sdev_in_flight_data sifd = { .sdev = sdev };
>> +    struct blk_mq_tag_set *set = &sdev->host->tag_set;
>> +
>> +    if (sdev->budget_map.map)
>> +        return sbitmap_weight(&sdev->budget_map);
>> +    if (WARN_ON_ONCE(!set->shared_tags))
>> +        return 0;

> One wonders: what would happen if you would return '0' here if
> there is only one LUN?

I don't think that the one LUN case should be handled separately.
The single hardware queue case, however, could be treated in the same
way as the host-wide tag set case.
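
One possible reading of that suggestion, as a userspace sketch only. The
struct, its fields and the function name are hypothetical and are not
taken from the patch; the number of LUNs deliberately plays no role in the
decision.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a SCSI host; not a kernel structure. */
struct host_model {
	unsigned int nr_hw_queues;	/* number of hardware queues */
	bool host_wide_tags;		/* one tag set shared by all queues */
};

/*
 * With a single hardware queue there is only one tag space, so it can be
 * handled like the host-wide tag set case.
 */
bool treat_as_host_wide_tags(const struct host_model *h)
{
	return h->host_wide_tags || h->nr_hw_queues == 1;
}
```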

Thanks,

Bart.