Re: [PATCH 5/8] ublk: document zero copy feature

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 25, 2025 at 08:26:18AM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > Add words to explain how zero copy feature works, and why it has to be
> > trusted for handling IO read command.
> >
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> >  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> > index 1e0e7358e14a..33efff25b54d 100644
> > --- a/Documentation/block/ublk.rst
> > +++ b/Documentation/block/ublk.rst
> > @@ -309,18 +309,30 @@ with specified IO tag in the command data:
> >    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> >    the server buffer (pages) read to the IO request pages.
> >
> > -Future development
> > -==================
> > -
> >  Zero copy
> >  ---------
> >
> > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> > -can't be remapped any more in kernel with existing mm interfaces. This can
> > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> > -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
> 
> nit: missing parentheses after io_buffer_unregister_bvec

OK

> 
> > +
> > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> > +`io_buffer_register_bvec()` for ublk server to register client request
> > +buffer into io_uring buffer table, then ublk server can submit io_uring
> > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> > +calls `io_buffer_unregister_bvec` to unregister the buffer.
> 
> Parentheses missing here too.
> It might be worth adding a note that the registered buffer and any I/O
> that uses it will hold a reference on the ublk request. For a ublk
> server implementer, I think it's useful to know that the buffer needs
> to be unregistered in order to complete the ublk request, and that the
> zero-copy I/O won't corrupt any data even if it completes after the
> buffer is unregistered.

Good point, how about the following words?

```
The ublk client IO request buffer is guaranteed to be live between calling
`io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`.

And any io_uring operation which supports this kind of kernel buffer will
grab one reference of the buffer until the operation is completed.
```

> 
> > +
> > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> > +because it is ublk server's responsibility to make sure IO buffer filled
> > +with data, and ublk server has to handle short read correctly by returning
> > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> > +will be exposed to client application.
> 
> This isn't specific to zero-copy, right? ublk devices configured with
> UBLK_F_USER_COPY also need to initialize the full request buffer. I

Right, but it is important for zero copy.

> would also drop the "handle short read" part; ublk servers don't
> necessarily issue read operations in the backend, so "short read" may
> not be applicable.

Here 'read' in 'short read' means ublk front READ command, not actual read
done in ublk server. Maybe we can make it more accurate:

```
..., and ublk server has to return correct result to ublk driver when handling
READ command, and the result has to match with how many bytes filled to the IO
buffer. Otherwise, uninitialized kernel IO buffer will be exposed to client
application.
```

Thanks,
Ming





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux