On Wed, Jun 11, 2025 at 08:47:15AM -0700, Caleb Sander Mateos wrote:
> On Mon, Jun 9, 2025 at 6:34 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > On Mon, Jun 09, 2025 at 10:49:09AM -0700, Caleb Sander Mateos wrote:
> > > On Mon, Jun 9, 2025 at 5:34 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Jun 06, 2025 at 03:40:09PM -0600, Caleb Sander Mateos wrote:
> > > > > Currently, UBLK_IO_REGISTER_IO_BUF and UBLK_IO_UNREGISTER_IO_BUF are
> > > > > only permitted on the ublk_io's daemon task. But this restriction is
> > > > > unnecessary. ublk_register_io_buf() calls __ublk_check_and_get_req() to
> > > > > look up the request from the tagset and atomically take a reference on
> > > > > the request without accessing the ublk_io. ublk_unregister_io_buf()
> > > > > doesn't use the q_id or tag at all.
> > > > >
> > > > > So allow these opcodes even on tasks other than io->task.
> > > > >
> > > > > Handle UBLK_IO_UNREGISTER_IO_BUF before obtaining the ubq and io since
> > > > > the buffer index being unregistered is not necessarily related to the
> > > > > specified q_id and tag.
> > > > >
> > > > > Add a feature flag UBLK_F_BUF_REG_OFF_DAEMON that userspace can use to
> > > > > determine whether the kernel supports off-daemon buffer registration.
> > > > >
> > > > > Suggested-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > > > > Signed-off-by: Caleb Sander Mateos <csander@xxxxxxxxxxxxxxx>
> > > > > ---
> > > > >  drivers/block/ublk_drv.c      | 37 +++++++++++++++++++++--------------
> > > > >  include/uapi/linux/ublk_cmd.h |  8 ++++++++
> > > > >  2 files changed, 30 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > index a8030818f74a..2084bbdd2cbb 100644
> > > > > --- a/drivers/block/ublk_drv.c
> > > > > +++ b/drivers/block/ublk_drv.c
> > > > > @@ -68,11 +68,12 @@
> > > > >         | UBLK_F_ZONED \
> > > > >         | UBLK_F_USER_RECOVERY_FAIL_IO \
> > > > >         | UBLK_F_UPDATE_SIZE \
> > > > >         | UBLK_F_AUTO_BUF_REG \
> > > > >         | UBLK_F_QUIESCE \
> > > > > -       | UBLK_F_PER_IO_DAEMON)
> > > > > +       | UBLK_F_PER_IO_DAEMON \
> > > > > +       | UBLK_F_BUF_REG_OFF_DAEMON)
> > > > >
> > > > >  #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
> > > > >                 | UBLK_F_USER_RECOVERY_REISSUE \
> > > > >                 | UBLK_F_USER_RECOVERY_FAIL_IO)
> > > > >
> > > > > @@ -2018,20 +2019,10 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> > > > >         }
> > > > >
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > -static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
> > > > > -                                 const struct ublk_queue *ubq,
> > > > > -                                 unsigned int index, unsigned int issue_flags)
> > > > > -{
> > > > > -       if (!ublk_support_zero_copy(ubq))
> > > > > -               return -EINVAL;
> > > > > -
> > > > > -       return io_buffer_unregister_bvec(cmd, index, issue_flags);
> > > > > -}
> > > > > -
> > > > >  static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq,
> > > > >                       struct ublk_io *io, __u64 buf_addr)
> > > > >  {
> > > > >         struct ublk_device *ub = ubq->dev;
> > > > >         int ret = 0;
> > > > > @@ -2184,10 +2175,18 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> > > > >
> > > > >         ret = ublk_check_cmd_op(cmd_op);
> > > > >         if (ret)
> > > > >                 goto out;
> > > > >
> > > > > +       /*
> > > > > +        * io_buffer_unregister_bvec() doesn't access the ubq or io,
> > > > > +        * so no need to validate the q_id, tag, or task
> > > > > +        */
> > > > > +       if (_IOC_NR(cmd_op) == UBLK_IO_UNREGISTER_IO_BUF)
> > > > > +               return io_buffer_unregister_bvec(cmd, ub_cmd->addr,
> > > > > +                                                issue_flags);
> > > > > +
> > > > >         ret = -EINVAL;
> > > > >         if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues)
> > > > >                 goto out;
> > > > >
> > > > >         ubq = ublk_get_queue(ub, ub_cmd->q_id);
> > > > > @@ -2204,12 +2203,21 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> > > > >
> > > > >                 ublk_prep_cancel(cmd, issue_flags, ubq, tag);
> > > > >                 return -EIOCBQUEUED;
> > > > >         }
> > > > >
> > > > > -       if (READ_ONCE(io->task) != current)
> > > > > +       if (READ_ONCE(io->task) != current) {
> > > > > +               /*
> > > > > +                * ublk_register_io_buf() accesses only the request, not io,
> > > > > +                * so can be handled on any task
> > > > > +                */
> > > > > +               if (_IOC_NR(cmd_op) == UBLK_IO_REGISTER_IO_BUF)
> > > > > +                       return ublk_register_io_buf(cmd, ubq, tag, ub_cmd->addr,
> > > > > +                                                   issue_flags);
> > > > > +
> > > > >                 goto out;
> > > > > +       }
> > > > >
> > > > >         /* there is pending io cmd, something must be wrong */
> > > > >         if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) {
> > > > >                 ret = -EBUSY;
> > > >
> > > > It also skips check on UBLK_IO_FLAG_OWNED_BY_SRV for both UBLK_IO_REGISTER_IO_BUF
> > > > and UBLK_IO_UNREGISTER_IO_BUF, :-(
> > >
> > > As we've discussed before[1], accessing io->flags on tasks other than
> > > the io's daemon would be a race condition. So I don't see how it's
> > > possible to keep this check for off-daemon
> > > UBLK_IO_(UN)REGISTER_IO_BUF. What value do you see in checking for
> > > UBLK_IO_FLAG_OWNED_BY_SRV? My understanding is that the
> > > refcount_inc_not_zero() already ensures the ublk I/O has been
> > > dispatched to the ublk server and either hasn't been completed or has
> > > other registered buffers still in use, which is pretty similar to
> > > UBLK_IO_FLAG_OWNED_BY_SRV.
> >
> > request can't be trusted any more for UBLK_F_BUF_REG_OFF_DAEMON because it
> > may be freed from elevator switch code or stopping dev code path, so we
> > can't rely on refcount_inc_not_zero(pdu(req)) only.
>
> I don't know much about how an elevator switch works, could you
> explain a bit more how the request can be freed? Is this not already a

If an elevator is attached, every request is allocated from the elevator,
and those requests are freed when the elevator is switched off. So a
request retrieved from ub->tag_set.tags[tag] may become stale, since the
elevator can be switched out at any time.

> concern for user copy, where ublk_ch_read_iter() and
> ublk_ch_write_iter() can be issued on any thread? Those code paths
> also seem to be relying on the refcount_inc_not_zero() (plus the
> blk_mq_request_started(req) and req->tag checks) in
> __ublk_check_and_get_req() to prevent use-after-free.

It looks like that risk exists there too. It could be solved by grabbing
the queue usage counter in __ublk_check_and_get_req(), or by moving the
reference counter into 'ublk_io'.

Thanks,
Ming
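For illustration, below is a rough, untested sketch of the first option
(grabbing the queue usage counter around the tag lookup). The function
name is made up for discussion; blk_mq_tag_to_rq(),
blk_mq_request_started() and ublk_get_req_ref() are the existing helpers
that __ublk_check_and_get_req() already uses:

static struct request *ublk_lookup_req_stable(struct ublk_device *ub,
		struct ublk_queue *ubq, unsigned int tag)
{
	struct request_queue *q = ub->ub_disk->queue;
	struct request *req;

	/*
	 * Hold q_usage_counter across the lookup; an elevator switch
	 * freezes the queue first, so it can't free scheduler-allocated
	 * requests while the counter is held.
	 */
	if (!percpu_ref_tryget(&q->q_usage_counter))
		return NULL;

	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
	if (!(req && blk_mq_request_started(req) && req->tag == tag &&
	      ublk_get_req_ref(ubq, req)))
		req = NULL;

	/*
	 * Caveat: once q_usage_counter is dropped, the pdu refcount keeps
	 * the request from completing but does not pin scheduler-allocated
	 * request memory across a later elevator switch, which is why
	 * moving the reference counter into 'ublk_io' may be the more
	 * complete fix.
	 */
	percpu_ref_put(&q->q_usage_counter);
	return req;
}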