On 4/10/25 11:07 PM, Joanne Koong wrote: > On Wed, Apr 9, 2025 at 7:12 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote: >> >> >> >> On 4/10/25 7:47 AM, Joanne Koong wrote: >>> On Tue, Apr 8, 2025 at 7:43 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote: >>>> >>>> Hi Joanne, >>>> >>>> On 4/5/25 2:14 AM, Joanne Koong wrote: >>>>> In the current FUSE writeback design (see commit 3be5a52b30aa >>>>> ("fuse: support writable mmap")), a temp page is allocated for every >>>>> dirty page to be written back, the contents of the dirty page are copied over >>>>> to the temp page, and the temp page gets handed to the server to write back. >>>>> >>>>> This is done so that writeback may be immediately cleared on the dirty page, >>>>> and this in turn is done in order to mitigate the following deadlock scenario >>>>> that may arise if reclaim waits on writeback on the dirty page to complete: >>>>> * single-threaded FUSE server is in the middle of handling a request >>>>> that needs a memory allocation >>>>> * memory allocation triggers direct reclaim >>>>> * direct reclaim waits on a folio under writeback >>>>> * the FUSE server can't write back the folio since it's stuck in >>>>> direct reclaim >>>>> >>>>> With a recent change that added AS_WRITEBACK_INDETERMINATE and mitigates >>>>> the situations described above, FUSE writeback does not need to use >>>>> temp pages if it sets AS_WRITEBACK_INDETERMINATE on its inode mappings. >>>>> >>>>> This commit sets AS_WRITEBACK_INDETERMINATE on the inode mappings >>>>> and removes the temporary pages + extra copying and the internal rb >>>>> tree. >>>>> >>>>> fio benchmarks -- >>>>> (using averages observed from 10 runs, throwing away outliers) >>>>> >>>>> Setup: >>>>> sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount >>>>> ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount >>>>> >>>>> fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G >>>>> --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount >>>>> >>>>> bs = 1k 4k 1M >>>>> Before 351 MiB/s 1818 MiB/s 1851 MiB/s >>>>> After 341 MiB/s 2246 MiB/s 2685 MiB/s >>>>> % diff -3% 23% 45% >>>>> >>>>> Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> >>>>> Reviewed-by: Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> >>>>> Acked-by: Miklos Szeredi <mszeredi@xxxxxxxxxx> >>>> >>> >>> Hi Jingbo, >>> >>> Thanks for sharing your analysis for this. >>> >>>> Overall this patch LGTM. >>>> >>>> Apart from that, IMO the fi->writectr and fi->queued_writes mechanism is >>>> also unneeded then, at least the DIRECT IO routine (i.e. >>> >>> I took a look at fi->writectr and fi->queued_writes and my >>> understanding is that we do still need this. For example, for >>> truncates (I'm looking at fuse_do_setattr()), I think we still need to >>> prevent concurrent writeback or else the setattr request and the >>> writeback request could race which would result in a mismatch between >>> the file's reported size and the actual data written to disk. >> >> I haven't looked into the truncate routine yet. I will see it later. >> >>> >>>> fuse_direct_io()) doesn't need fuse_sync_writes() anymore. That is >>>> because after removing the temp page, the DIRECT IO routine has already >>>> been waiting for all inflight WRITE requests, see >>>> >>>> # DIRECT read >>>> generic_file_read_iter >>>> kiocb_write_and_wait >>>> filemap_write_and_wait_range >>> >>> Where do you see generic_file_read_iter() getting called for direct io reads? >> >> # DIRECT read >> fuse_file_read_iter >> fuse_cache_read_iter >> generic_file_read_iter >> kiocb_write_and_wait >> filemap_write_and_wait_range >> a_ops->direct_IO(),i.e. fuse_direct_IO() >> > > Oh I see, I thought files opened with O_DIRECT automatically call the > .direct_IO handler for reads/writes but you're right, it first goes > through .read_iter / .write_iter handlers, and the .direct_IO handler > only gets invoked through generic_file_read_iter() / > generic_file_direct_write() in mm/filemap.c > > There's two paths for direct io in FUSE: > a) fuse server sets fi->direct_io = true when a file is opened, which > will set the FOPEN_DIRECT_IO bit in ff->open_flags on the kernel side > b) fuse server doesn't set fi->direct_io = true, but the client opens > the file with O_DIRECT > > We only go through the stack trace you listed above for the b) case. > For the a) case, we'll hit > > if (ff->open_flags & FOPEN_DIRECT_IO) > return fuse_direct_read_iter(iocb, to); > > and > > if (ff->open_flags & FOPEN_DIRECT_IO) > return fuse_direct_write_iter(iocb, from); > > which will invoke fuse_direct_IO() / fuse_direct_io() without going > through the kiocb_write_and_wait() -> filemap_write_and_wait_range() / > kiocb_invalidate_pages() -> filemap_write_and_wait_range() you listed > above. > > So for the a) case I think we'd still need the fuse_sync_writes() in > case there's still pending writeback. > > Do you agree with this analysis or am I missing something here? Yeah, that's true. But instead of calling fuse_sync_writes(), we can call filemap_wait_range() or something similar here. >> filp_close() >> filp_flush() >> filp->f_op->flush() >> fuse_flush() >> write_inode_now >> writeback_single_inode(WB_SYNC_ALL) >> do_writepages >> # flush dirty page >> filemap_fdatawait >> # wait for WRITE completion > > Nice. I missed that write_inode_now() will invoke filemap_fdatawait(). > This seems pretty straightforward. I'll remove the fuse_sync_writes() > call in fuse_flush() when I send out v8. > > The direct io one above is less straight-forward. I won't add that to > v8 but that can be done in a separate future patch when we figure that > out. Thanks for keep working on this. Appreciated. -- Thanks, Jingbo