On Mon, Jun 9, 2025 at 6:34 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote: > > On Mon, Jun 09, 2025 at 10:49:09AM -0700, Caleb Sander Mateos wrote: > > On Mon, Jun 9, 2025 at 5:34 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote: > > > > > > On Fri, Jun 06, 2025 at 03:40:09PM -0600, Caleb Sander Mateos wrote: > > > > Currently, UBLK_IO_REGISTER_IO_BUF and UBLK_IO_UNREGISTER_IO_BUF are > > > > only permitted on the ublk_io's daemon task. But this restriction is > > > > unnecessary. ublk_register_io_buf() calls __ublk_check_and_get_req() to > > > > look up the request from the tagset and atomically take a reference on > > > > the request without accessing the ublk_io. ublk_unregister_io_buf() > > > > doesn't use the q_id or tag at all. > > > > > > > > So allow these opcodes even on tasks other than io->task. > > > > > > > > Handle UBLK_IO_UNREGISTER_IO_BUF before obtaining the ubq and io since > > > > the buffer index being unregistered is not necessarily related to the > > > > specified q_id and tag. > > > > > > > > Add a feature flag UBLK_F_BUF_REG_OFF_DAEMON that userspace can use to > > > > determine whether the kernel supports off-daemon buffer registration. > > > > > > > > Suggested-by: Ming Lei <ming.lei@xxxxxxxxxx> > > > > Signed-off-by: Caleb Sander Mateos <csander@xxxxxxxxxxxxxxx> > > > > --- > > > > drivers/block/ublk_drv.c | 37 +++++++++++++++++++++-------------- > > > > include/uapi/linux/ublk_cmd.h | 8 ++++++++ > > > > 2 files changed, 30 insertions(+), 15 deletions(-) > > > > > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c > > > > index a8030818f74a..2084bbdd2cbb 100644 > > > > --- a/drivers/block/ublk_drv.c > > > > +++ b/drivers/block/ublk_drv.c > > > > @@ -68,11 +68,12 @@ > > > > | UBLK_F_ZONED \ > > > > | UBLK_F_USER_RECOVERY_FAIL_IO \ > > > > | UBLK_F_UPDATE_SIZE \ > > > > | UBLK_F_AUTO_BUF_REG \ > > > > | UBLK_F_QUIESCE \ > > > > - | UBLK_F_PER_IO_DAEMON) > > > > + | UBLK_F_PER_IO_DAEMON \ > > > > + | UBLK_F_BUF_REG_OFF_DAEMON) > > > > > > > > #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \ > > > > | UBLK_F_USER_RECOVERY_REISSUE \ > > > > | UBLK_F_USER_RECOVERY_FAIL_IO) > > > > > > > > @@ -2018,20 +2019,10 @@ static int ublk_register_io_buf(struct io_uring_cmd *cmd, > > > > } > > > > > > > > return 0; > > > > } > > > > > > > > -static int ublk_unregister_io_buf(struct io_uring_cmd *cmd, > > > > - const struct ublk_queue *ubq, > > > > - unsigned int index, unsigned int issue_flags) > > > > -{ > > > > - if (!ublk_support_zero_copy(ubq)) > > > > - return -EINVAL; > > > > - > > > > - return io_buffer_unregister_bvec(cmd, index, issue_flags); > > > > -} > > > > - > > > > static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq, > > > > struct ublk_io *io, __u64 buf_addr) > > > > { > > > > struct ublk_device *ub = ubq->dev; > > > > int ret = 0; > > > > @@ -2184,10 +2175,18 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, > > > > > > > > ret = ublk_check_cmd_op(cmd_op); > > > > if (ret) > > > > goto out; > > > > > > > > + /* > > > > + * io_buffer_unregister_bvec() doesn't access the ubq or io, > > > > + * so no need to validate the q_id, tag, or task > > > > + */ > > > > + if (_IOC_NR(cmd_op) == UBLK_IO_UNREGISTER_IO_BUF) > > > > + return io_buffer_unregister_bvec(cmd, ub_cmd->addr, > > > > + issue_flags); > > > > + > > > > ret = -EINVAL; > > > > if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues) > > > > goto out; > > > > > > > > ubq = ublk_get_queue(ub, ub_cmd->q_id); > > > > @@ -2204,12 +2203,21 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, > > > > > > > > ublk_prep_cancel(cmd, issue_flags, ubq, tag); > > > > return -EIOCBQUEUED; > > > > } > > > > > > > > - if (READ_ONCE(io->task) != current) > > > > + if (READ_ONCE(io->task) != current) { > > > > + /* > > > > + * ublk_register_io_buf() accesses only the request, not io, > > > > + * so can be handled on any task > > > > + */ > > > > + if (_IOC_NR(cmd_op) == UBLK_IO_REGISTER_IO_BUF) > > > > + return ublk_register_io_buf(cmd, ubq, tag, ub_cmd->addr, > > > > + issue_flags); > > > > + > > > > goto out; > > > > + } > > > > > > > > /* there is pending io cmd, something must be wrong */ > > > > if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) { > > > > ret = -EBUSY; > > > > > > It also skips check on UBLK_IO_FLAG_OWNED_BY_SRV for both UBLK_IO_REGISTER_IO_BUF > > > and UBLK_IO_UNREGISTER_IO_BUF, :-( > > > > As we've discussed before[1], accessing io->flags on tasks other than > > the io's daemon would be a race condition. So I don't see how it's > > possible to keep this check for off-daemon > > UBLK_IO_(UN)REGISTER_IO_BUF. What value do you see in checking for > > UBLK_IO_FLAG_OWNED_BY_SRV? My understanding is that the > > refcount_inc_not_zero() already ensures the ublk I/O has been > > dispatched to the ublk server and either hasn't been completed or has > > other registered buffers still in use, which is pretty similar to > > UBLK_IO_FLAG_OWNED_BY_SRV. > > request can't be trusted any more for UBLK_F_BUF_REG_OFF_DAEMON because it > may be freed from elevator switch code or stopping dev code path, so we > can't rely on refcount_inc_not_zero(pdu(req)) only. I don't know much about how an elevator switch works, could you explain a bit more how the request can be freed? Is this not already a concern for user copy, where ublk_ch_read_iter() and ublk_ch_write_iter() can be issued on any thread? Those code paths also seem to be relying on the refcount_inc_not_zero() (plus the blk_mq_request_started(req) and req->tag checks) in __ublk_check_and_get_req() to prevent use-after-free. > > However the existing per-io-task and checking UBLK_IO_FLAG_OWNED_BY_SRV can > guarantee that the request is valid. > > > For UBLK_IO_UNREGISTER_IO_BUF, I don't think checking io->flags & > > UBLK_IO_FLAG_OWNED_BY_SRV is sufficient to prevent misuse, since > > there's no requirement that the buffer index (addr) being unregistered > > matches the q_id, tag, or even ublk device specified in the command. > > It should be fine to skip the check for UBLK_IO_UNREGISTER_IO_BUF because it > doesn't touch io & request. > > However I think it is still correct to validate ZERO_COPY flag for > UBLK_IO_UNREGISTER_IO_BUF cause ZERO_COPY is only allowed for privileged > userspace. Okay, I will check for UBLK_F_SUPPORT_ZERO_COPY on the ublk_device instead to avoid the ublk_queue lookup. Best, Caleb