On Thu, Aug 14, 2025 at 9:38 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Thu, Jul 31, 2025 at 05:21:31PM -0700, Joanne Koong wrote:
> > Add granular dirty and writeback accounting for large folios. These
> > stats are used by the mm layer for dirty balancing and throttling.
> > Having granular dirty and writeback accounting helps prevent
> > over-aggressive balancing and throttling.
> >
> > There are 4 places in iomap this commit affects:
> > a) filemap dirtying, which now calls filemap_dirty_folio_pages()
> > b) writeback_iter, which now sets the wbc->no_stats_accounting bit
> >    and calls clear_dirty_for_io_stats()
> > c) starting writeback, which now calls __folio_start_writeback()
> > d) ending writeback, which now calls folio_end_writeback_pages()
> >
> > This relies on using the ifs->state dirty bitmap to track dirty pages
> > in the folio. As such, this can only be utilized on filesystems where
> > the block size >= PAGE_SIZE.
>
> Apologies for my slow responses this month. :)

No worries at all, thanks for looking at this.

> I wonder, does this cause an observable change in the writeback
> accounting and throttling behavior for non-fuse filesystems like XFS
> that use large folios?  I *think* this does actually reduce throttling
> for XFS, but it might not be so noticeable because the limits are much
> more generous outside of fuse?

I haven't run any benchmarks on non-fuse filesystems yet, but that's
what I would expect too. I'll run some benchmarks to see!

> > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > ---
> >  fs/iomap/buffered-io.c | 136 ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 128 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index bcc6e0e5334e..626c3c8399cc 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -20,6 +20,8 @@ struct iomap_folio_state {
> >  	spinlock_t		state_lock;
> >  	unsigned int		read_bytes_pending;
> >  	atomic_t		write_bytes_pending;
> > +	/* number of pages currently being written back */
> > +	unsigned		nr_pages_writeback;
> >
> >  	/*
> >  	 * Each block has two bits in this bitmap:
> > @@ -81,6 +83,25 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
> >  	return test_bit(block + blks_per_folio, ifs->state);
> >  }
> >
> > +static unsigned ifs_count_dirty_pages(struct folio *folio)
> > +{
> > +	struct iomap_folio_state *ifs = folio->private;
> > +	struct inode *inode = folio->mapping->host;
> > +	unsigned block_size = 1 << inode->i_blkbits;
> > +	unsigned start_blk = 0;
> > +	unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
> > +			       i_blocks_per_folio(inode, folio));
> > +	unsigned nblks = 0;
> > +
> > +	while (start_blk < end_blk) {
> > +		if (ifs_block_is_dirty(folio, ifs, start_blk))
> > +			nblks++;
> > +		start_blk++;
> > +	}
>
> Hmm, isn't this bitmap_weight(ifs->state, blks_per_folio) ?
>
> Ohh wait no, the dirty bitmap doesn't start on a byte boundary because
> the format of the bitmap is [uptodate bits][dirty bits].
>
> Maybe those two should be reversed, because I bet the dirty state gets
> changed a lot more over the lifetime of a folio than the uptodate bits.

I think the find_next_bit() helper (which Christoph also pointed out)
could be used here instead. At least, that's how I see a lot of driver
code iterating over bitmaps.
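Something like this, as a rough untested sketch. It keeps the same
end_blk computation as the patch and assumes the dirty bits still sit
at offset blks_per_folio in ifs->state, after the uptodate bits; the
scaling of the block count to pages is left out, as in the trimmed
hunk above:

static unsigned ifs_count_dirty_pages(struct folio *folio)
{
	struct iomap_folio_state *ifs = folio->private;
	struct inode *inode = folio->mapping->host;
	unsigned blks_per_folio = i_blocks_per_folio(inode, folio);
	unsigned end_blk = min((unsigned)(i_size_read(inode) >> inode->i_blkbits),
			       blks_per_folio);
	/* dirty bits are stored after the uptodate bits in ifs->state */
	unsigned long first = blks_per_folio;
	unsigned long last = blks_per_folio + end_blk;
	unsigned long bit;
	unsigned nblks = 0;

	/* walk only the set bits instead of testing every block */
	for (bit = find_next_bit(ifs->state, last, first);
	     bit < last;
	     bit = find_next_bit(ifs->state, last, bit + 1))
		nblks++;

	return nblks;
}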