On Mon, May 5, 2025 at 8:23 AM Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
>
> On 5/5/25 16:40, Darrick J. Wong wrote:
> > On Sun, May 04, 2025 at 09:13:44PM +0200, Bernd Schubert wrote:
> >>
> >> On 4/26/25 02:08, Joanne Koong wrote:
> >>> Add support for folios larger than one page size for readahead.
> >>>
> >>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> >>> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> >>> ---
> >>>  fs/fuse/file.c | 36 +++++++++++++++++++++++++++---------
> >>>  1 file changed, 27 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >>> index 1d38486fae50..9a31f2a516b9 100644
> >>> --- a/fs/fuse/file.c
> >>> +++ b/fs/fuse/file.c
> >>> @@ -876,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> >>>          fuse_io_free(ia);
> >>>  }
> >>>
> >>> -static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
> >>> +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> >>> +                                unsigned int count)
> >>>  {
> >>>          struct fuse_file *ff = file->private_data;
> >>>          struct fuse_mount *fm = ff->fm;
> >>>          struct fuse_args_pages *ap = &ia->ap;
> >>>          loff_t pos = folio_pos(ap->folios[0]);
> >>> -        /* Currently, all folios in FUSE are one page */
> >>> -        size_t count = ap->num_folios << PAGE_SHIFT;
> >>>          ssize_t res;
> >>>          int err;
> >>>
> >>> @@ -918,6 +917,7 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>          struct inode *inode = rac->mapping->host;
> >>>          struct fuse_conn *fc = get_fuse_conn(inode);
> >>>          unsigned int max_pages, nr_pages;
> >>> +        struct folio *folio = NULL;
> >>>
> >>>          if (fuse_is_bad(inode))
> >>>                  return;
> >>> @@ -939,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>          while (nr_pages) {
> >>>                  struct fuse_io_args *ia;
> >>>                  struct fuse_args_pages *ap;
> >>> -                struct folio *folio;
> >>>                  unsigned cur_pages = min(max_pages, nr_pages);
> >>> +                unsigned int pages = 0;
> >>>
> >>>                  if (fc->num_background >= fc->congestion_threshold &&
> >>>                      rac->ra->async_size >= readahead_count(rac))
> >>> @@ -952,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>
> >>>                  ia = fuse_io_alloc(NULL, cur_pages);
> >>>                  if (!ia)
> >>> -                        return;
> >>> +                        break;
> >>>                  ap = &ia->ap;
> >>>
> >>> -                while (ap->num_folios < cur_pages) {
> >>> +                while (pages < cur_pages) {
> >>> +                        unsigned int folio_pages;
> >>> +
> >>>                          /*
> >>>                           * This returns a folio with a ref held on it.
> >>>                           * The ref needs to be held until the request is
> >>> @@ -963,13 +965,29 @@ static void fuse_readahead(struct readahead_control *rac)
> >>>                           * fuse_try_move_page()) drops the ref after it's
> >>>                           * replaced in the page cache.
> >>>                           */
> >>> -                        folio = __readahead_folio(rac);
> >>> +                        if (!folio)
> >>> +                                folio = __readahead_folio(rac);
> >>> +
> >>> +                        folio_pages = folio_nr_pages(folio);
> >>> +                        if (folio_pages > cur_pages - pages)
> >>> +                                break;
> >>> +
> >>
> >> Hmm, so let's assume this would be a 2MB folio, but fc->max_pages is
> >> limited to 1MB - do we not do read-ahead anymore?
> >
> > It's hard for me to say without seeing the actual enablement patches,
> > but filesystems are supposed to call mapping_set_folio_order_range to
> > constrain the sizes of the folios that the pagecache requests.

Yes, exactly. For enabling fuse, I envision adding something like this
in fuse_init_file_inode():

    max_pages = min(min(fc->max_write, fc->max_read) >> PAGE_SHIFT,
                    fc->max_pages);
    max_order = ilog2(max_pages);
    mapping_set_folio_order_range(inode->i_mapping, 0, max_order);

>
> I think large folios do not get enabled yet in this series. Could we have
> a comment here that folio size is supposed to be restricted to
> fc->max_pages? And wouldn't that be a case for unlikely()?

Large folios are not enabled yet in this series. The cover letter
explains a bit why:

"This does not yet switch fuse to using large folios. Using large
folios in fuse is dependent on adding granular dirty-page tracking.
This will be done in a separate patchset that will have fuse use
iomap [1]. There also needs to be a followup (also part of future
work) for having dirty page balancing not tank performance for
unprivileged servers where bdi limits lead to subpar throttling [1],
before enabling large folios for fuse."

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a38pv3OgFZRfdTiDMXuPWuBgN8KY47XfOsYHj=N2wxAg@xxxxxxxxxxxxxx/#t

I'll add a comment about this in v6.

Thanks,
Joanne

>
>
> Thanks,
> Bernd
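
To make the arithmetic behind the proposed max_order computation concrete,
here is a minimal userspace sketch. It is not kernel code and not part of
the series: sketch_ilog2() is a stand-in for the kernel's ilog2(),
sketch_max_folio_order() and SKETCH_PAGE_SHIFT (4 KiB pages) are
hypothetical names used only for illustration, and clamping a zero page
count to one page is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SHIFT for this. */
#define SKETCH_PAGE_SHIFT 12

/* Userspace stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static unsigned int sketch_ilog2(uint32_t v)
{
	unsigned int order = 0;

	assert(v > 0);
	while (v >>= 1)
		order++;
	return order;
}

/*
 * Hypothetical helper mirroring the computation proposed in the thread:
 * the largest folio order whose page count does not exceed what a single
 * request can carry.
 */
static unsigned int sketch_max_folio_order(uint32_t max_write,
					   uint32_t max_read,
					   uint32_t max_pages_limit)
{
	uint32_t bytes = max_write < max_read ? max_write : max_read;
	uint32_t pages = bytes >> SKETCH_PAGE_SHIFT;

	if (pages > max_pages_limit)
		pages = max_pages_limit;
	/* Assumption of this sketch: never go below a single page. */
	if (pages == 0)
		pages = 1;

	return sketch_ilog2(pages);
}
```

With max_write and max_read both at 2 MiB and a 256-page limit, the helper
returns order 8, so the page cache would be asked for folios of at most
256 pages (1 MiB). Under that cap, the 2 MiB folio from the question above
would not be handed to readahead in the first place. For a limit that is
not a power of two, such as 300 pages, the order rounds down to 8.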