On Wed, Aug 27, 2025 at 5:08 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Fri, Aug 15, 2025 at 11:38 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > On Thu, Aug 14, 2025 at 9:38 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 31, 2025 at 05:21:31PM -0700, Joanne Koong wrote:
> > > > Add granular dirty and writeback accounting for large folios. These
> > > > stats are used by the mm layer for dirty balancing and throttling.
> > > > Having granular dirty and writeback accounting helps prevent
> > > > over-aggressive balancing and throttling.
> > > >
> > > > There are 4 places in iomap this commit affects:
> > > > a) filemap dirtying, which now calls filemap_dirty_folio_pages()
> > > > b) writeback_iter with setting the wbc->no_stats_accounting bit and
> > > >    calling clear_dirty_for_io_stats()
> > > > c) starting writeback, which now calls __folio_start_writeback()
> > > > d) ending writeback, which now calls folio_end_writeback_pages()
> > > >
> > > > This relies on using the ifs->state dirty bitmap to track dirty pages in
> > > > the folio. As such, this can only be utilized on filesystems where the
> > > > block size >= PAGE_SIZE.
> > >
> > > Apologies for my slow responses this month. :)
> >
> > No worries at all, thanks for looking at this.
> >
> > > I wonder, does this cause an observable change in the writeback
> > > accounting and throttling behavior for non-fuse filesystems like XFS
> > > that use large folios? I *think* this does actually reduce throttling
> > > for XFS, but it might not be so noticeable because the limits are much
> > > more generous outside of fuse?
> >
> > I haven't run any benchmarks on non-fuse filesystems yet but that's
> > what I would expect too. Will run some benchmarks to see!
>
> I ran some benchmarks on xfs for the contrived test case I used for
> fuse (eg writing 2 GB in 128 MB chunks and then doing 50k 50-byte
> random writes) and I don't see any noticeable performance difference.
>
> I re-tested it on fuse but this time with strictlimiting disabled and
> didn't notice any difference on that either, probably because with
> strictlimiting off we don't run into the upper limit in that test so
> there's no extra throttling that needs to be mitigated.
>
> It's unclear to me how often (if at all?) real workloads run up
> against their dirty/writeback limits.
>

I benchmarked it again today, this time manually setting
/proc/sys/vm/dirty_bytes to 20% of 16 GiB and
/proc/sys/vm/dirty_background_bytes to 10% of 16 GiB, and testing a
more intense workload (the original test scenario but on 10+ threads).
I now see results on xfs: around 3 seconds (with some variability,
ranging from 0.3 seconds to 5 seconds) for the writes prior to this
patchset vs. a pretty consistent 0.14 seconds with this patchset. I ran
the test scenario a few times, but it'd be great if someone else could
also run it to verify it shows up on their system too.
I set up xfs by following the instructions in the xfstests readme:

  # xfs_io -f -c "falloc 0 10g" test.img
  # xfs_io -f -c "falloc 0 10g" scratch.img
  # mkfs.xfs test.img
  # losetup /dev/loop0 ./test.img
  # losetup /dev/loop1 ./scratch.img
  # mkdir -p /mnt/test && mount /dev/loop0 /mnt/test

and then ran:

  sudo sysctl -w vm.dirty_bytes=$((3276 * 1024 * 1024))             # roughly 20% of 16 GiB
  sudo sysctl -w vm.dirty_background_bytes=$((1638 * 1024 * 1024))  # roughly 10% of 16 GiB

and then ran this test program (ai-generated):
https://pastebin.com/CbcwTXjq

I'll send out an updated v2 of this series.

Thanks,
Joanne
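For reference, below is a rough, single-threaded sketch of the workload
described above (write 2 GiB sequentially in 128 MiB chunks, then issue
50k 50-byte writes at random offsets). It is only an approximation of
the test scenario, not the actual program at the pastebin link, and the
default output path and program name are placeholders.

/*
 * Rough sketch of the workload described above: write 2 GiB sequentially
 * in 128 MiB chunks, then do 50,000 50-byte writes at random offsets.
 * This is an approximation of the described scenario, not the actual
 * benchmarked program (which ran the scenario on 10+ threads).
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE   (2ULL << 30)    /* 2 GiB */
#define CHUNK_SIZE  (128ULL << 20)  /* 128 MiB */
#define NR_RANDOM   50000
#define RANDOM_LEN  50

int main(int argc, char **argv)
{
        /* Placeholder path; point this at a file on the mounted test fs. */
        const char *path = argc > 1 ? argv[1] : "/mnt/test/testfile";
        unsigned long long off;
        char small[RANDOM_LEN];
        char *chunk;
        int fd, i;

        fd = open(path, O_CREAT | O_RDWR, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        chunk = malloc(CHUNK_SIZE);
        if (!chunk) {
                perror("malloc");
                return 1;
        }
        memset(chunk, 'a', CHUNK_SIZE);
        memset(small, 'b', RANDOM_LEN);

        /* Phase 1: 2 GiB written sequentially in 128 MiB chunks. */
        for (off = 0; off < FILE_SIZE; off += CHUNK_SIZE) {
                if (pwrite(fd, chunk, CHUNK_SIZE, off) != (ssize_t)CHUNK_SIZE) {
                        perror("pwrite");
                        return 1;
                }
        }

        /* Phase 2: 50k 50-byte writes at random offsets within the file. */
        for (i = 0; i < NR_RANDOM; i++) {
                off = (((unsigned long long)rand() << 16) | rand()) %
                      (FILE_SIZE - RANDOM_LEN);
                if (pwrite(fd, small, RANDOM_LEN, off) != RANDOM_LEN) {
                        perror("pwrite");
                        return 1;
                }
        }

        free(chunk);
        close(fd);
        return 0;
}

Build it with something like "gcc -O2 -o dirty_test dirty_test.c" and
time a run against a file on the mounted loop device; the binary name
and target path here are arbitrary.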