Re: [PATCH v5 09/11] fuse: support large folios for readahead

Joanne Koong <joannelkoong@xxxxxxxxx> · Mon, 5 May 2025 15:05:51 -0700

On Mon, May 5, 2025 at 8:23 AM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
>
>
> On 5/5/25 16:40, Darrick J. Wong wrote:
> > On Sun, May 04, 2025 at 09:13:44PM +0200, Bernd Schubert wrote:
> >>
> >>
> >> On 4/26/25 02:08, Joanne Koong wrote:
> >>> Add support for folios larger than one page for readahead.
> >>>
> >>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> >>> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> >>> ---
> >>>  fs/fuse/file.c | 36 +++++++++++++++++++++++++++---------
> >>>  1 file changed, 27 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >>> index 1d38486fae50..9a31f2a516b9 100644
> >>> --- a/fs/fuse/file.c
> >>> +++ b/fs/fuse/file.c
> >>> @@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> >>>     fuse_io_free(ia);
> >>>  }
> >>>
> >>> -static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
> >>> +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> >>> +                           unsigned int count)
> >>>  {
> >>>     struct fuse_file *ff = file->private_data;
> >>>     struct fuse_mount *fm = ff->fm;
> >>>     struct fuse_args_pages *ap = &ia->ap;
> >>>     loff_t pos = folio_pos(ap->folios[0]);
> >>> -   /* Currently, all folios in FUSE are one page */
> >>> -   size_t count = ap->num_folios << PAGE_SHIFT;
> >>>     ssize_t res;
> >>>     int err;
> >>>
> >>> @@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>     struct inode *inode = rac->mapping->host;
> >>>     struct fuse_conn *fc = get_fuse_conn(inode);
> >>>     unsigned int max_pages, nr_pages;
> >>> +   struct folio *folio = NULL;
> >>>
> >>>     if (fuse_is_bad(inode))
> >>>             return;
> >>> @@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>     while (nr_pages) {
> >>>             struct fuse_io_args *ia;
> >>>             struct fuse_args_pages *ap;
> >>> -           struct folio *folio;
> >>>             unsigned cur_pages = min(max_pages, nr_pages);
> >>> +           unsigned int pages = 0;
> >>>
> >>>             if (fc->num_background >= fc->congestion_threshold &&
> >>>                 rac->ra->async_size >= readahead_count(rac))
> >>> @@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>
> >>>             ia = fuse_io_alloc(NULL, cur_pages);
> >>>             if (!ia)
> >>> -                   return;
> >>> +                   break;
> >>>             ap = &ia->ap;
> >>>
> >>> -           while (ap->num_folios < cur_pages) {
> >>> +           while (pages < cur_pages) {
> >>> +                   unsigned int folio_pages;
> >>> +
> >>>                     /*
> >>>                      * This returns a folio with a ref held on it.
> >>>                      * The ref needs to be held until the request is
> >>> @@ -963,13 +965,29 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>                      * fuse_try_move_page()) drops the ref after it's
> >>>                      * replaced in the page cache.
> >>>                      */
> >>> -                   folio = __readahead_folio(rac);
> >>> +                   if (!folio)
> >>> +                           folio = __readahead_folio(rac);
> >>> +
> >>> +                   folio_pages = folio_nr_pages(folio);
> >>> +                   if (folio_pages > cur_pages - pages)
> >>> +                           break;
> >>> +
> >>
> >> Hmm, so let's assume this would be a 2MB folio, but fc->max_pages is
> >> limited to 1MB - would we not do read-ahead anymore?
> >
> > It's hard for me to say without seeing the actual enablement patches,
> > but filesystems are supposed to call mapping_set_folio_order_range to
> > constrain the sizes of the folios that the pagecache requests.

Yes, exactly. For enabling fuse, I envision adding something like this
in fuse_init_file_inode():

max_pages = min(min(fc->max_write, fc->max_read) >> PAGE_SHIFT, fc->max_pages);
max_order = ilog2(max_pages);
mapping_set_folio_order_range(inode->i_mapping, 0, max_order);

>
> I think large folios do not get enabled yet in this series. Could we have
> a comment here that folio size is supposed to be restricted to
> fc->max_pages? And wouldn't that be a case for unlikely()?

Large folios are not enabled yet in this series. The cover letter
explains why:

"This does not yet switch fuse to using large folios. Using large folios in
fuse is dependent on adding granular dirty-page tracking. This will be done
in a separate patchset that will have fuse use iomap [1]. There also needs
to be a followup (also part of future work) for having dirty page balancing
not tank performance for unprivileged servers where bdi limits lead to subpar
throttling [1], before enabling large folios for fuse."

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a38pv3OgFZRfdTiDMXuPWuBgN8KY47XfOsYHj=N2wxAg@xxxxxxxxxxxxxx/#t

I'll add a comment about this in v6.

Thanks,
Joanne
>
>
> Thanks,
> Bernd