Re: [PATCH v1 12/16] iomap: add iomap_read_ops for read and readahead

On Wed, Sep 3, 2025 at 2:08 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Fri, Aug 29, 2025 at 04:56:23PM -0700, Joanne Koong wrote:
> > Add a "struct iomap_read_ops" that contains a read_folio_range()
> > callback that callers can provide as a custom handler for reading in a
> > folio range, if the caller does not wish to issue bio read requests
> > (which otherwise is the default behavior). read_folio_range() may read
> > the request asynchronously or synchronously. The caller is responsible
> > for calling iomap_start_folio_read()/iomap_finish_folio_read() when
> > reading the folio range.
> >
> > This makes it so that non-block based filesystems may use iomap for
> > reads.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > ---
> >  .../filesystems/iomap/operations.rst          | 19 +++++
> >  block/fops.c                                  |  4 +-
> >  fs/erofs/data.c                               |  4 +-
> >  fs/gfs2/aops.c                                |  4 +-
> >  fs/iomap/buffered-io.c                        | 79 +++++++++++++------
> >  fs/xfs/xfs_aops.c                             |  4 +-
> >  fs/zonefs/file.c                              |  4 +-
> >  include/linux/iomap.h                         | 21 ++++-
> >  8 files changed, 105 insertions(+), 34 deletions(-)
> >
> > diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> > index 067ed8e14ef3..215053f0779d 100644
> > --- a/Documentation/filesystems/iomap/operations.rst
> > +++ b/Documentation/filesystems/iomap/operations.rst
> > @@ -57,6 +57,25 @@ The following address space operations can be wrapped easily:
> >   * ``bmap``
> >   * ``swap_activate``
> >
> > +``struct iomap_read_ops``
> > +--------------------------
> > +
> > +.. code-block:: c
> > +
> > + struct iomap_read_ops {
> > +     int (*read_folio_range)(const struct iomap_iter *iter,
> > +                        struct folio *folio, loff_t pos, size_t len);
> > + };
> > +
> > +iomap calls these functions:
> > +
> > +  - ``read_folio_range``: Called to read in the range (read does not need to
> > +    be synchronous). The caller is responsible for calling
>
> Er... does this perform the read synchronously or asynchronously?
> Does the implementer need to know?  How does iomap figure out what
> happened?

It is up to the implementer whether they do the read synchronously or
asynchronously. Most filesystems, I think, will issue readahead
asynchronously, but for fuse, readahead needs to be synchronous unless
the server explicitly opts into async reads; otherwise the read
requests can be sent in non-sequential order, which some servers may
not be able to handle.

I don't think it matters from the iomap side whether the read is
synchronous or asynchronous, or even if the read has completed by the
time iomap_readahead() completes. I think it only needs to make sure
the reads get kicked off.
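
To make that contract concrete, here's a rough sketch of what a
synchronous implementation could look like (untested and purely
illustrative; myfs_read_into_folio() is a made-up stand-in for however
the filesystem actually fills the range):

static int myfs_read_folio_range(const struct iomap_iter *iter,
		struct folio *folio, loff_t pos, size_t len)
{
	size_t poff = offset_in_folio(folio, pos);
	int error;

	/* account for this range before issuing the read */
	iomap_start_folio_read(folio, len);

	/* blocks until the data is in the folio (or the read fails) */
	error = myfs_read_into_folio(iter->inode, folio, poff, len);

	/*
	 * Must be called even on error; on success it marks the range
	 * uptodate, and it ends the read once nothing is pending.
	 */
	iomap_finish_folio_read(folio, poff, len, error);
	return error;
}

iomap doesn't need to know whether myfs_read_into_folio() blocked or
not; the folio read accounting works out the same either way.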

>
> My guess is that iomap_finish_folio_read unlocks the folio, and anyone
> who cared is by this point already waiting on the folio lock?  So it's
> actually not important if the ->read_folio_range implementation runs
> async or not; the key is that the folio stays locked until we've
> completed the read IO?

This is my understanding too. The unlocking and
waking-any-waiting-threads stuff happens in folio_end_read().

>
> > +    iomap_start_folio_read() and iomap_finish_folio_read() when reading the
> > +    folio range. This should be done even if an error is encountered during
> > +    the read. If this function is not provided by the caller, then iomap
> > +    will default to issuing asynchronous bio read requests.
>
> What is this function supposed to return?  The usual 0 or negative
> errno?

Good point, I will add that info in (the usual 0 or negative errno).

>
> > +
> >  ``struct iomap_write_ops``
> >  --------------------------
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 5d153c6b16b6..06f2c857de64 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -335,8 +335,8 @@ void iomap_start_folio_read(struct folio *folio, size_t len)
> >  }
> >  EXPORT_SYMBOL_GPL(iomap_start_folio_read);
> >
> > -void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> > -             int error)
> > +static void __iomap_finish_folio_read(struct folio *folio, size_t off,
> > +             size_t len, int error, bool update_bitmap)
> >  {
> >       struct iomap_folio_state *ifs = folio->private;
> >       bool uptodate = !error;
> > @@ -346,7 +346,7 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> >               unsigned long flags;
> >
> >               spin_lock_irqsave(&ifs->state_lock, flags);
> > -             if (!error)
> > +             if (!error && update_bitmap)
> >                       uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
>
> When do we /not/ want to set uptodate after a successful read?  I guess
> iomap_read_folio_range_async goes through the bio machinery and sets
> uptodate via iomap_finish_folio_read()?  Does the ->read_folio_range
> function need to set the uptodate bits itself?  Possibly by calling
> iomap_finish_folio_read as well?
>

Maybe this is hacky and I'm not sure if there's a better way to do it,
but I essentially need a "bias" on read requests if they are async so
that we don't prematurely end the read on the folio if the first few
async requests complete before the next ones are issued. For example,
say there's a large folio and a readahead request for 16 pages in that
folio, and the readahead on those 16 pages is split into 4 async
requests. If the first request is sent off and completes before we
send off the second request, then the iomap_finish_folio_read() call
for the first request will drop ifs->read_bytes_pending to 0 and call
folio_end_read() prematurely.

For that reason, I added an "iomap_start_folio_read(folio, 1);" before
any async requests are sent and a "__iomap_finish_folio_read(folio, 0,
1, ret, false);" after all the requests have been sent. (This is the
same thing the iomap writeback logic does for its async requests.)
Those calls, though, should not update the uptodate bitmap; they are
used only to prevent a premature folio_end_read().

Yes, ->read_folio_range() is responsible for calling
iomap_finish_folio_read() (as well as iomap_start_folio_read()); this
call will update the uptodate bitmap.
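
For an async implementation, the shape would be roughly the following
(again just a sketch; myfs_submit_async_read() and
myfs_read_completion() are made up, and this assumes the completion
callback runs even for failed reads):

static int myfs_read_folio_range(const struct iomap_iter *iter,
		struct folio *folio, loff_t pos, size_t len)
{
	iomap_start_folio_read(folio, len);
	/* queue the read and return without waiting for it */
	return myfs_submit_async_read(iter->inode, folio, pos, len);
}

/* runs later, from whatever context completes the read */
static void myfs_read_completion(struct folio *folio, size_t poff,
		size_t len, int error)
{
	/*
	 * Updates the uptodate bitmap on success, and once
	 * read_bytes_pending drops to zero (i.e. after iomap has dropped
	 * its bias and all the other ranges have finished) this ends the
	 * read on the folio via folio_end_read().
	 */
	iomap_finish_folio_read(folio, poff, len, error);
}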


Thanks,
Joanne

> >               ifs->read_bytes_pending -= len;
> >               finished = !ifs->read_bytes_pending;
> > @@ -356,6 +356,12 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> >       if (finished)
> >               folio_end_read(folio, uptodate);
> >  }
> > +
> > @@ -471,7 +478,14 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
> >       }
> >
> >       /* zero post-eof blocks as the page may be mapped */
> > -     ifs_alloc(iter->inode, folio, iter->flags);
> > +     ifs = ifs_alloc(iter->inode, folio, iter->flags);
> > +
> > +     /*
> > +      * Add a bias to ifs->read_bytes_pending so that a read is ended only
> > +      * after all the ranges have been read in.
> > +      */
> > +     if (ifs)
> > +             iomap_start_folio_read(folio, 1);
> >
> >       length = min_t(loff_t, length,
> >                       folio_size(folio) - offset_in_folio(folio, pos));
> > @@ -479,35 +493,53 @@ static int iomap_readfolio_iter(struct iomap_iter *iter,
> >               iomap_adjust_read_range(iter->inode, folio, &pos,
> >                               length, &poff, &plen);
> >               count = pos - iter->pos + plen;
> > -             if (plen == 0)
> > -                     return iomap_iter_advance(iter, &count);
> > +             if (plen == 0) {
> > +                     ret = iomap_iter_advance(iter, &count);
> > +                     break;
> > +             }
> >
> >               if (iomap_block_needs_zeroing(iter, pos)) {
> >                       folio_zero_range(folio, poff, plen);
> >                       iomap_set_range_uptodate(folio, poff, plen);
> >               } else {
> > -                     iomap_read_folio_range_async(iter, ctx, pos, plen);
> > +                     ctx->folio_unlocked = true;
> > +                     if (read_ops && read_ops->read_folio_range) {
> > +                             ret = read_ops->read_folio_range(iter, folio, pos, plen);
> > +                             if (ret)
> > +                                     break;
> > +                     } else {
> > +                             iomap_read_folio_range_async(iter, ctx, pos, plen);
> > +                     }
> >               }
> >
> >               length -= count;
> >               ret = iomap_iter_advance(iter, &count);
> >               if (ret)
> > -                     return ret;
> > +                     break;
> >               pos = iter->pos;
> >       }
> > -     return 0;
> > +
> > +     if (ifs) {
> > +             __iomap_finish_folio_read(folio, 0, 1, ret, false);
> > +             ctx->folio_unlocked = true;
>
> Er.... so we subtract 1 from read_bytes_pending?  I thought the
> ->read_folio_range ioend was supposed to decrease that?
>
> --D
>




