Hi Bernd,

Bernd Schubert <bernd@xxxxxxxxxxx> wrote on Sat, Aug 16, 2025 at 04:56:
>
> On August 15, 2025 9:45:34 AM GMT+02:00, Gang He <dchg2000@xxxxxxxxx> wrote:
> >Hi Bernd,
> >
> >Sorry for the interruption.
> >I tested your fuse-over-io_uring patch set with the libfuse null
> >example; fuse-over-io_uring mode has better performance than the
> >default mode, e.g. with the fio command below:
> >fio -direct=1 --filename=/mnt/singfile --rw=read -iodepth=1
> >--ioengine=libaio --bs=4k --size=4G --runtime=60 --numjobs=1
> >-name=test_fuse1
> >
> >But if I increase the fio iodepth option, fuse-over-io_uring mode has
> >worse performance than the default mode, e.g. with the fio command
> >below:
> >fio -direct=1 --filename=/mnt/singfile --rw=read -iodepth=4
> >--ioengine=libaio --bs=4k --size=4G --runtime=60 --numjobs=1
> >-name=test_fuse2
> >
> >The test results show that fuse-over-io_uring mode cannot handle this
> >case properly. Could you take a look at this issue? Or is this a
> >design issue?
> >
> >I went through the related source code, and I do not understand: does
> >each fuse_ring_queue thread have only one available ring entry? Would
> >this design cause the above issue?
> >The related code is as follows,
> >dev_uring.c
> >1099
> >1100         queue = ring->queues[qid];
> >1101         if (!queue) {
> >1102                 queue = fuse_uring_create_queue(ring, qid);
> >1103                 if (!queue)
> >1104                         return err;
> >1105         }
> >1106
> >1107         /*
> >1108          * The created queue above does not need to be destructed in
> >1109          * case of entry errors below, will be done at ring destruction time.
> >1110          */
> >1111
> >1112         ent = fuse_uring_create_ring_ent(cmd, queue);
> >1113         if (IS_ERR(ent))
> >1114                 return PTR_ERR(ent);
> >1115
> >1116         fuse_uring_do_register(ent, cmd, issue_flags);
> >1117
> >1118         return 0;
> >1119 }
> >
> >Thanks
> >Gang
>
> Hi Gang,
>
> we are just slowly traveling back with my family from Germany to
> France - sorry for delayed responses.
>
> Each queue can have up to N ring entries - I think I put in max 65535.
>
> The code you are looking at will just add new entries to per-queue
> lists.
>
> I don't know why higher fio io-depth results in lower performance. A
> possible reason is that /dev/fuse requests get distributed to multiple
> threads, while fuse-io-uring requests might all go to the same
> thread/ring. I had posted patches recently that add request balancing
> between queues.

iodepth > 1 means an asynchronous IO workload, but from the code in the
fuse_uring_commit_fetch() function, this function completes one IO
request and then fetches the next request. This logic blocks the
handling of further IO requests while the last request is still being
processed in this thread. Can each thread accept more IO requests before
the last request in the thread has finished processing? Maybe this is
the root cause for the fio (iodepth > 1) test case.

Thanks
Gang

> Cheers,
> Bernd