Re: [PATCH 5/8] ublk: document zero copy feature

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 25, 2025 at 7:30 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Tue, Mar 25, 2025 at 08:26:18AM -0700, Caleb Sander Mateos wrote:
> > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > >
> > > Add words to explain how zero copy feature works, and why it has to be
> > > trusted for handling IO read command.
> > >
> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > > ---
> > >  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
> > >  1 file changed, 20 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> > > index 1e0e7358e14a..33efff25b54d 100644
> > > --- a/Documentation/block/ublk.rst
> > > +++ b/Documentation/block/ublk.rst
> > > @@ -309,18 +309,30 @@ with specified IO tag in the command data:
> > >    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> > >    the server buffer (pages) read to the IO request pages.
> > >
> > > -Future development
> > > -==================
> > > -
> > >  Zero copy
> > >  ---------
> > >
> > > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> > > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> > > -can't be remapped any more in kernel with existing mm interfaces. This can
> > > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> > > -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> > > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> > > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
> >
> > nit: missing parentheses after io_buffer_unregister_bvec
>
> OK
>
> >
> > > +
> > > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> > > +`io_buffer_register_bvec()` for ublk server to register client request
> > > +buffer into io_uring buffer table, then ublk server can submit io_uring
> > > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> > > +calls `io_buffer_unregister_bvec` to unregister the buffer.
> >
> > Parentheses missing here too.
> > It might be worth adding a note that the registered buffer and any I/O
> > that uses it will hold a reference on the ublk request. For a ublk
> > server implementer, I think it's useful to know that the buffer needs
> > to be unregistered in order to complete the ublk request, and that the
> > zero-copy I/O won't corrupt any data even if it completes after the
> > buffer is unregistered.
>
> Good point, how about the following words?
>
> ```
> The ublk client IO request buffer is guaranteed to be live between calling
> `io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`.
>
> And any io_uring operation which supports this kind of kernel buffer will
> grab one reference of the buffer until the operation is completed.
> ```

That's definitely better. I think it could be a bit more explicit that
the buffer needs to be unregistered in order for the ublk request to
complete.

>
> >
> > > +
> > > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> > > +because it is ublk server's responsibility to make sure IO buffer filled
> > > +with data, and ublk server has to handle short read correctly by returning
> > > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> > > +will be exposed to client application.
> >
> > This isn't specific to zero-copy, right? ublk devices configured with
> > UBLK_F_USER_COPY also need to initialize the full request buffer. I
>
> Right, but it is important for zero copy.

Sure. Maybe change "zero copy" to "zero copy or user copy"?

>
> > would also drop the "handle short read" part; ublk servers don't
> > necessarily issue read operations in the backend, so "short read" may
> > not be applicable.
>
> Here 'read' in 'short read' means ublk front READ command, not actual read
> done in ublk server. Maybe we can make it more accurate:
>
> ```
> ..., and ublk server has to return correct result to ublk driver when handling
> READ command, and the result has to match with how many bytes filled to the IO
> buffer. Otherwise, uninitialized kernel IO buffer will be exposed to client
> application.
> ```

Thanks for clarifying. That looks great!

Best,
Caleb





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux