Re: [PATCH 1/1] fuse: Use filemap_invalidate_pages()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 3, 2025 at 12:25 PM Matthew Wilcox (Oracle)
<willy@xxxxxxxxxxxxx> wrote:
>
> FUSE relies on invalidate_inode_pages2() / invalidate_inode_pages2_range()
> doing writeback by calling fuse_launder_folio().  While this works, it
> is inefficient as each page is written back and waited for individually.
> Far better to call filemap_invalidate_pages() which will do a bulk write
> first, then remove the page cache.

This logic makes sense to me but what about the case where writeback
errors out? With invalidate_inode_pages2() /
invalidate_inode_pages2_range(), the pages still get unmapped, but
with filemap_invalidate_pages(), this will return an error without
unmapping. AFAICT, I think we would still want the unmapping to be
done if there's an error in writeback.

I'm a bit confused about the difference between "each page is written
back and waited for individually" vs bulk write. AFAICT, with the bulk
write, which calls ->writepages(), in fuse this will still write each
page/folio individually through write_cache_pages() and AFAICT, each
page/folio is also waited for individually in
__filemap_fdatawait_range(). With doing the bulk write, I wonder if
this sometimes could be more inefficient than doing the writeback only
after it's been unmapped, eg if there's writes after the
filemap_invalidate_pages() writeback, we'd end up doing 2 writebacks.

>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> ---
>  fs/fuse/dax.c   | 16 ++++------------
>  fs/fuse/dir.c   | 12 +++++++-----
>  fs/fuse/file.c  | 16 +++++-----------
>  fs/fuse/inode.c | 17 +++++------------
>  4 files changed, 21 insertions(+), 40 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 2d817d7cab26..0151343d8393 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -718,7 +718,8 @@ static int fuse_create_open(struct mnt_idmap *idmap, struct inode *dir,
>                 if (fm->fc->atomic_o_trunc && trunc)
>                         truncate_pagecache(inode, 0);
>                 else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
> -                       invalidate_inode_pages2(inode->i_mapping);
> +                       filemap_invalidate_pages(inode->i_mapping, 0,
> +                                       OFFSET_MAX, false);

nit (here and below): in fuse, the line break spacing for args is
aligned to be right underneath the first arg

>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 2b04a142b493..eaa659c08132 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1566,7 +1567,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>                 return -ENOMEM;
>
>         if (fopen_direct_io && fc->direct_io_allow_mmap) {
> -               res = filemap_write_and_wait_range(mapping, pos, pos + count - 1);
> +               res = filemap_invalidate_pages(mapping, pos, (pos + count - 1),

nit: don't need the parentheses around pos + count - 1

> +                               false);
>                 if (res) {
>                         fuse_io_free(ia);
>                         return res;
> @@ -1580,14 +1582,6 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>                         inode_unlock(inode);
>         }
>
> -       if (fopen_direct_io && write) {
> -               res = invalidate_inode_pages2_range(mapping, idx_from, idx_to);
> -               if (res) {
> -                       fuse_io_free(ia);
> -                       return res;
> -               }
> -       }

I'm having trouble understanding why we can get rid of this logic
here. Is this because we now do filemap_invalidate_pages() for the "if
(fopen_direct_io && fc->direct_io_allow_mmap)" part above? AFAIS if
fc->direct_io_allow_mmap is false and we're handling a write, we still
need to call invalidate_inode_pages2_range()/filemap_invalidate_pages()
here, no?

> -
>         io->should_dirty = !write && user_backed_iter(iter);
>         while (count) {
>                 ssize_t nres;
> @@ -2358,7 +2352,7 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
>                 if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_allow_mmap)
>                         return -ENODEV;
>
> -               invalidate_inode_pages2(file->f_mapping);
> +               filemap_invalidate_pages(file->f_mapping, 0, OFFSET_MAX, false);
>
>                 if (!(vma->vm_flags & VM_MAYSHARE)) {
>                         /* MAP_PRIVATE */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index ecb869e895ab..905b192fa12e 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -559,8 +560,6 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid,
>  {
>         struct fuse_inode *fi;
>         struct inode *inode;
> -       pgoff_t pg_start;
> -       pgoff_t pg_end;
>
>         inode = fuse_ilookup(fc, nodeid, NULL);
>         if (!inode)
> @@ -573,15 +572,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid,
>
>         fuse_invalidate_attr(inode);
>         forget_all_cached_acls(inode);
> -       if (offset >= 0) {
> -               pg_start = offset >> PAGE_SHIFT;
> -               if (len <= 0)
> -                       pg_end = -1;

Should we preserve this len <= 0 logic? The len value comes from the
server. If they pass in -1, that means they want the entire file
invalidated.


Thanks,
Joanne

> -               else
> -                       pg_end = (offset + len - 1) >> PAGE_SHIFT;
> -               invalidate_inode_pages2_range(inode->i_mapping,
> -                                             pg_start, pg_end);
> -       }
> +       if (offset >= 0)
> +               filemap_invalidate_pages(inode->i_mapping, offset,
> +                               offset + len - 1, false);
>         iput(inode);
>         return 0;
>  }
> --
> 2.47.2
>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux