On 2025/09/03 18:35, Yu Kuai wrote:
>On 2025/09/03 16:41, Xue He wrote:
>> On 2025/09/02 08:47, Yu Kuai wrote:
>>> On 2025/09/01 16:22, Xue He wrote:
>> ......
>>
>> The information of my nvme is as follows:
>> number of CPUs: 16
>> memory: 16G
>> nvme nvme0: 16/0/16 default/read/poll queues
>> cat /sys/class/nvme/nvme0/nvme0n1/queue/nr_requests
>> 1023
>>
>> More precisely, I think it is not that the tags are fully exhausted,
>> but rather that after scanning the bitmap for free bits, the remaining
>> contiguous bits are insufficient to meet the requirement (free bits
>> exist, but not enough of them). The specific function involved is
>> __sbitmap_queue_get_batch() in lib/sbitmap.c:
>>
>> 	get_mask = ((1UL << nr_tags) - 1) << nr;
>> 	if (nr_tags > 1) {
>> 		printk("before %ld\n", get_mask);
>> 	}
>> 	while (!atomic_long_try_cmpxchg(ptr, &val,
>> 					get_mask | val))
>> 		;
>> 	get_mask = (get_mask & ~val) >> nr;
>>
>> During the batch acquisition of contiguous free bits, an atomic
>> operation is performed, so the tag mask actually obtained can differ
>> from the one originally requested.
>
>Yes, so this function is likely to obtain fewer tags than nr_tags: the
>mask always starts from the first zero bit and spans nr_tags bits, and
>sbitmap_deferred_clear() is called unconditionally, so it's likely there
>are non-zero bits within this range.
>
>Just wondering, did you consider fixing this directly in
>__blk_mq_alloc_requests_batch()?
>
> - call sbitmap_deferred_clear() and retry on allocation failure, so
>that the whole word can be used even if previously allocated requests
>are done, especially for nvme with huge tag depths;
> - retry blk_mq_get_tags() until data->nr_tags is zero;
>

I haven't tried this yet, as I'm concerned that if it spins here, it
might introduce more latency. Anyway, I will try to implement this idea
and run some tests to observe the results.

Thanks.