On Thu, May 01, 2025 at 06:31:03PM -0700, Caleb Sander Mateos wrote: > On Wed, Apr 30, 2025 at 8:34 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote: > > > > On Mon, Apr 28, 2025 at 05:43:12PM -0700, Caleb Sander Mateos wrote: > > > On Mon, Apr 28, 2025 at 2:44 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote: > > > > > > > > Extend io_buffer_register_bvec() and io_buffer_unregister_bvec() for > > > > supporting to register/unregister bvec buffer to specified io_uring, > > > > which FD is usually passed from userspace. > > > > > > > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > > > > --- > > > > include/linux/io_uring/cmd.h | 4 ++ > > > > io_uring/rsrc.c | 83 +++++++++++++++++++++++++++--------- > > > > 2 files changed, 67 insertions(+), 20 deletions(-) > > > > > > > > diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h > > > > index 78fa336a284b..7516fe5cd606 100644 > > > > --- a/include/linux/io_uring/cmd.h > > > > +++ b/include/linux/io_uring/cmd.h > > > > @@ -25,6 +25,10 @@ struct io_uring_cmd_data { > > > > > > > > struct io_buf_data { > > > > unsigned short index; > > > > + bool has_fd; > > > > + bool registered_fd; > > > > + > > > > + int ring_fd; > > > > struct request *rq; > > > > void (*release)(void *); > > > > }; > > > > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c > > > > index 5f8ab130a573..701dd33fecf7 100644 > > > > --- a/io_uring/rsrc.c > > > > +++ b/io_uring/rsrc.c > > > > @@ -969,21 +969,6 @@ static int __io_buffer_register_bvec(struct io_ring_ctx *ctx, > > > > return 0; > > > > } > > > > > > > > -int io_buffer_register_bvec(struct io_uring_cmd *cmd, > > > > - struct io_buf_data *buf, > > > > - unsigned int issue_flags) > > > > -{ > > > > - struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx; > > > > - int ret; > > > > - > > > > - io_ring_submit_lock(ctx, issue_flags); > > > > - ret = __io_buffer_register_bvec(ctx, buf); > > > > - io_ring_submit_unlock(ctx, issue_flags); > > > > - > > > > - return ret; > > > > -} > > > > -EXPORT_SYMBOL_GPL(io_buffer_register_bvec); > > > > - > > > > static int __io_buffer_unregister_bvec(struct io_ring_ctx *ctx, > > > > struct io_buf_data *buf) > > > > { > > > > @@ -1006,19 +991,77 @@ static int __io_buffer_unregister_bvec(struct io_ring_ctx *ctx, > > > > return 0; > > > > } > > > > > > > > -int io_buffer_unregister_bvec(struct io_uring_cmd *cmd, > > > > - struct io_buf_data *buf, > > > > - unsigned int issue_flags) > > > > +static inline int do_reg_unreg_bvec(struct io_ring_ctx *ctx, > > > > + struct io_buf_data *buf, > > > > + unsigned int issue_flags, > > > > + bool reg) > > > > { > > > > - struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx; > > > > int ret; > > > > > > > > io_ring_submit_lock(ctx, issue_flags); > > > > - ret = __io_buffer_unregister_bvec(ctx, buf); > > > > + if (reg) > > > > + ret = __io_buffer_register_bvec(ctx, buf); > > > > + else > > > > + ret = __io_buffer_unregister_bvec(ctx, buf); > > > > > > It feels like unifying __io_buffer_register_bvec() and > > > __io_buffer_unregister_bvec() would belong better in the prior patch > > > that changes their signatures. > > > > Can you share how to do above in previous patch? > > I was thinking you could define do_reg_unreg_bvec() in the previous > patch. It's a logical step once you've extracted out all the > differences between io_buffer_register_bvec() and > io_buffer_unregister_bvec() into the helpers > __io_buffer_register_bvec() and __io_buffer_unregister_bvec(). But > either way is fine. 'has_fd' and 'ring_fd' fields isn't added yet, the defined do_reg_unreg_bvec() could be quite simple, looks no big difference, I can do that... > > > > > > > > > > io_ring_submit_unlock(ctx, issue_flags); > > > > > > > > return ret; > > > > } > > > > + > > > > +static int io_buffer_reg_unreg_bvec(struct io_ring_ctx *ctx, > > > > + struct io_buf_data *buf, > > > > + unsigned int issue_flags, > > > > + bool reg) > > > > +{ > > > > + struct io_ring_ctx *remote_ctx = ctx; > > > > + struct file *file = NULL; > > > > + int ret; > > > > + > > > > + if (buf->has_fd) { > > > > + file = io_uring_register_get_file(buf->ring_fd, buf->registered_fd); > > > > + if (IS_ERR(file)) > > > > + return PTR_ERR(file); > > > > > > It would be good to avoid the overhead of this lookup and > > > reference-counting in the I/O path. Would it be possible to move this > > > lookup to when UBLK_IO_FETCH_REQ (and UBLK_IO_COMMIT_AND_FETCH_REQ, if > > > it specifies a different ring_fd) is submitted? I guess that might > > > require storing an extra io_ring_ctx pointer in struct ublk_io. > > > > Let's start from the flexible way & simple implementation. > > > > Any optimization & improvement can be done as follow-up. > > Sure, we can start with this as-is. But I suspect the extra > reference-counting here will significantly decrease the benefit of the > auto-register register feature. The reference-counting should only be needed for registering buffer to external ring, which may have been slow because of the cross-ring thing... Maybe we can start automatic buffer register for ubq_daemon context only, meantime allow to register buffer from external io_uring by adding per-io spin_lock, which may help the per-io task Uday is working on too. And the interface still allow to support automatic buffer register to external io_uring since `ublk_auto_buf_reg` includes 'flags' field, we can enable it in future when efficient implementation is figured out. What do you think of this approach? > > > > > Each command may register buffer to different io_uring context, > > it can't be done in UBLK_IO_FETCH_REQ stage, because new IO with same > > tag may register buffer to new io_uring context. > > Right, if UBLK_IO_COMMIT_AND_FETCH_REQ specifies a different io_uring > fd, then we'd have to look it up anew. But it seems likely that all > UBLK_IO_COMMIT_AND_FETCH_REQs for a single tag will specify the same > io_uring (certainly that's how our ublk server works). And in that > case, the I/O could just reuse the io_uring context that was looked up > for the prior UBLK_IO_(COMMIT_AND_)FETCH_REQ. It is a special case, and one follow-up feature flag can help to optimize this case only. Thanks, Ming