On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote: > > Add words to explain how zero copy feature works, and why it has to be > trusted for handling IO read command. > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > Documentation/block/ublk.rst | 28 ++++++++++++++++++++-------- > 1 file changed, 20 insertions(+), 8 deletions(-) > > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst > index 1e0e7358e14a..33efff25b54d 100644 > --- a/Documentation/block/ublk.rst > +++ b/Documentation/block/ublk.rst > @@ -309,18 +309,30 @@ with specified IO tag in the command data: > ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy > the server buffer (pages) read to the IO request pages. > > -Future development > -================== > - > Zero copy > --------- > > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace > -can't be remapped any more in kernel with existing mm interfaces. This can > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that > -big requests (IO size >= 256 KB) may benefit a lot from zero copy. > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`. nit: missing parentheses after io_buffer_unregister_bvec > + > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call > +`io_buffer_register_bvec()` for ublk server to register client request > +buffer into io_uring buffer table, then ublk server can submit io_uring > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF` > +calls `io_buffer_unregister_bvec` to unregister the buffer. Parentheses missing here too. It might be worth adding a note that the registered buffer and any I/O that uses it will hold a reference on the ublk request. For a ublk server implementer, I think it's useful to know that the buffer needs to be unregistered in order to complete the ublk request, and that the zero-copy I/O won't corrupt any data even if it completes after the buffer is unregistered. > + > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted, > +because it is ublk server's responsibility to make sure IO buffer filled > +with data, and ublk server has to handle short read correctly by returning > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer > +will be exposed to client application. This isn't specific to zero-copy, right? ublk devices configured with UBLK_F_USER_COPY also need to initialize the full request buffer. I would also drop the "handle short read" part; ublk servers don't necessarily issue read operations in the backend, so "short read" may not be applicable. Best, Caleb > + > +ublk server needs to align the parameter of `struct ublk_param_dma_align` > +with backend for zero copy to work correctly. > > +For reaching best IO performance, ublk server should align its segment > +parameter of `struct ublk_param_segment` with backend for avoiding > +unnecessary IO split. > > References > ========== > -- > 2.47.0 >