Hi Joanne,

thanks for your review, and sorry for my late reply. I had sent out the
series the night before going on vacation.

On 7/25/25 02:43, Joanne Koong wrote:
> On Tue, Jul 22, 2025 at 2:58 PM Bernd Schubert <bschubert@xxxxxxx> wrote:
>>
>> Currently, FUSE io-uring requires all queues to be registered before
>> becoming ready, which can result in too much memory usage.
>>
>> This patch introduces a static queue mapping system that allows FUSE
>> io-uring to operate with a reduced number of registered queues by:
>>
>> 1. Adding a queue_mapping array to track which registered queue each
>>    CPU should use
>> 2. Replacing the is_ring_ready() check with immediate queue mapping
>>    once any queues are registered
>> 3. Implementing fuse_uring_map_queues() to create CPU-to-queue mappings
>>    that prefer NUMA-local queues when available
>> 4. Updating fuse_uring_get_queue() to use the static mapping instead
>>    of direct CPU-to-queue correspondence
>>
>> The mapping prioritizes NUMA locality by first attempting to map CPUs
>> to queues on the same NUMA node, falling back to any available
>> registered queue if no local queue exists.
>
> Do we need a static queue map, or does it suffice to just overload a
> queue on the local node if we're not able to find an "ideal" queue for
> the request? It seems to me like if we default to that behavior, then
> we get the advantages the static queue map is trying to provide (e.g.
> marking the ring as ready as soon as the first queue is registered and
> finding a last-resort queue for the request) without the overhead.

I have a branch for that, which uses the first available queue from the
registered-queue bitmask. In testing with our DDN file system it
resulted in too imbalanced queue usage, so I gave up on that approach.
Assuming the scheduler balances processes between cores, the static
mapping guarantees balanced queues.

Thanks,
Bernd