On 9/10/25 11:40 PM, Hannes Reinecke wrote:
> That is actually a valid point.
> There are devices which set 'cmd_per_lun' to the same value
> as 'can_queue', rendering the budget map a bit pointless.
> But calling blk_mq_all_tag_iter() is more expensive than a simple
> sbitmap_weight(), so the improvement isn't _that_ big
> (as demonstrated by just 1% performance increase).

Hi Hannes,

In the test I ran, blk_mq_all_tag_iter() was not called at all from the
hot path. More generally, I think that blk_mq_all_tag_iter() should
never be called from the command processing path.

The performance improvement in my test was only 1% because the UFS
device in my test setup only supports about 100 K IOPS. The number of
IOPS supported by UFS devices is expected to increase significantly in
the near future. The faster a SCSI device is, the more optimizing SCSI
budget allocation will improve IOPS.

>> + * that have already been allocated but that have not yet been started.
>> + */
>> +int scsi_device_busy(const struct scsi_device *sdev)
>> +{
>> +	struct sdev_in_flight_data sifd = { .sdev = sdev };
>> +	struct blk_mq_tag_set *set = &sdev->host->tag_set;
>> +
>> +	if (sdev->budget_map.map)
>> +		return sbitmap_weight(&sdev->budget_map);
>> +	if (WARN_ON_ONCE(!set->shared_tags))
>> +		return 0;
>
> One wonders: what would happen if you would return '0' here if
> there is only one LUN?

I don't think that the one LUN case should be handled separately.
The single hardware queue case, however, could be treated in the same
way as the host-wide tag set case.

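To make the two counting strategies from this thread concrete, here is a
rough user-space sketch. It is not kernel code and not the patch itself:
all toy_* names, the fixed tag count and the use of the tag number as the
budget bit are assumptions made purely for illustration. It only models
the idea that a per-device bitmap can be counted cheaply, while the
fallback has to walk every host-wide tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_TAGS 64	/* hypothetical host-wide tag count */

struct toy_sdev;

/* Hypothetical stand-in for a host with one shared tag set. */
struct toy_host {
	/* owner[tag] is the device that owns an in-flight tag, or NULL. */
	const struct toy_sdev *owner[TOY_NR_TAGS];
};

/* Hypothetical stand-in for a SCSI device. */
struct toy_sdev {
	struct toy_host *host;
	int has_budget_map;	/* models a non-NULL sdev->budget_map.map */
	uint64_t budget_map;	/* one bit per allocated budget token */
};

/* Count the set bits; plays the role of sbitmap_weight() in this model. */
static unsigned int toy_weight(uint64_t map)
{
	unsigned int n = 0;

	for (; map; map &= map - 1)
		n++;
	return n;
}

/*
 * Mark @tag as in flight for @sdev. Simplification: the tag number doubles
 * as the budget token index when the device has a budget map.
 */
static void toy_start(struct toy_sdev *sdev, unsigned int tag)
{
	assert(tag < TOY_NR_TAGS);
	sdev->host->owner[tag] = sdev;
	if (sdev->has_budget_map)
		sdev->budget_map |= UINT64_C(1) << tag;
}

/* Return the number of commands in flight for @sdev. */
static unsigned int toy_device_busy(const struct toy_sdev *sdev)
{
	unsigned int tag, busy = 0;

	/* Cheap path: a per-device bitmap exists, so just count its bits. */
	if (sdev->has_budget_map)
		return toy_weight(sdev->budget_map);

	/* Slow path: visit every host-wide tag and filter by owner. */
	for (tag = 0; tag < TOY_NR_TAGS; tag++)
		if (sdev->host->owner[tag] == sdev)
			busy++;
	return busy;
}
```

In this model the slow path costs one pass over all host tags per call,
which is why it should stay out of the per-command path, while the cheap
path only touches per-device state.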
Thanks,
Bart.