[RFC PATCH v1 00/10] mm/iomap: add granular dirty and writeback accounting

This patchset is a stab at adding granular dirty and writeback stats
accounting for large folios.

The dirty page balancing logic uses these stats to determine things like
whether the ratelimit has been exceeded, the frequency with which pages need
to be written back, if dirtying should be throttled, etc. Currently for large
folios, if any byte in the folio is dirtied or written back, all the bytes in
the folio are accounted as such.
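
To make the over-accounting concrete, here is a minimal userspace sketch (not
kernel code). The 4 KiB page size and the helper names whole_folio_pages() and
touched_pages() are assumptions made up for illustration; neither helper is
part of this patchset.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Pages accounted today: every page in the folio, whatever the write size. */
static unsigned long whole_folio_pages(size_t folio_bytes)
{
	return folio_bytes / SKETCH_PAGE_SIZE;
}

/* Pages a granular scheme would account: only those the byte range touches. */
static unsigned long touched_pages(size_t offset, size_t len)
{
	assert(len > 0);

	size_t first = offset / SKETCH_PAGE_SIZE;
	size_t last = (offset + len - 1) / SKETCH_PAGE_SIZE;

	return last - first + 1;
}
```

Under these assumptions, a 50-byte write at offset 100 into a 2 MiB folio is
accounted as 512 dirty pages today, while touched_pages(100, 50) gives 1.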

In particular, there are four places where dirty and writeback stats get
incremented and decremented as pages get dirtied and written back:
a) folio dirtying (filemap_dirty_folio() -> ... -> folio_account_dirtied())
   - increments NR_FILE_DIRTY, NR_ZONE_WRITE_PENDING, WB_RECLAIMABLE,
     current->nr_dirtied

b) writing back a mapping (writeback_iter() -> ... ->
   folio_clear_dirty_for_io())
   - decrements NR_FILE_DIRTY, NR_ZONE_WRITE_PENDING, WB_RECLAIMABLE

c) starting writeback on a folio (folio_start_writeback())
   - increments WB_WRITEBACK, NR_WRITEBACK, NR_ZONE_WRITE_PENDING

d) ending writeback on a folio (folio_end_writeback())
   - decrements WB_WRITEBACK, NR_WRITEBACK, NR_ZONE_WRITE_PENDING

Patches 1 to 9 add support for the four cases above to take in the number of
pages to be accounted, instead of accounting for the entire folio.
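
As a rough userspace model of the four accounting points (a) to (d) once they
take a page count, consider the sketch below. The struct and function names
are hypothetical and only mirror the counters listed above; the real kernel
code updates per-node, per-zone and per-wb counters, not a single struct.

```c
#include <assert.h>

/* Userspace stand-in for the counters named in (a) to (d). */
struct sketch_stats {
	long nr_file_dirty;		/* NR_FILE_DIRTY */
	long nr_zone_write_pending;	/* NR_ZONE_WRITE_PENDING */
	long wb_reclaimable;		/* WB_RECLAIMABLE */
	long nr_writeback;		/* NR_WRITEBACK */
	long wb_writeback;		/* WB_WRITEBACK */
	long nr_dirtied;		/* current->nr_dirtied */
};

/* a) folio dirtying */
static void sketch_account_dirtied(struct sketch_stats *s, long nr_pages)
{
	assert(nr_pages > 0);
	s->nr_file_dirty += nr_pages;
	s->nr_zone_write_pending += nr_pages;
	s->wb_reclaimable += nr_pages;
	s->nr_dirtied += nr_pages;
}

/* b) clearing dirty state when writing back a mapping */
static void sketch_clear_dirty_for_io(struct sketch_stats *s, long nr_pages)
{
	assert(nr_pages > 0);
	s->nr_file_dirty -= nr_pages;
	s->nr_zone_write_pending -= nr_pages;
	s->wb_reclaimable -= nr_pages;
}

/* c) starting writeback */
static void sketch_start_writeback(struct sketch_stats *s, long nr_pages)
{
	assert(nr_pages > 0);
	s->wb_writeback += nr_pages;
	s->nr_writeback += nr_pages;
	s->nr_zone_write_pending += nr_pages;
}

/* d) ending writeback */
static void sketch_end_writeback(struct sketch_stats *s, long nr_pages)
{
	assert(nr_pages > 0);
	s->wb_writeback -= nr_pages;
	s->nr_writeback -= nr_pages;
	s->nr_zone_write_pending -= nr_pages;
}
```

In this model, running (a) through (d) with the same page count returns every
counter except nr_dirtied to its starting value, which is the invariant the
per-page variants need to preserve.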

Patch 10 adds the iomap changes that use these new APIs. This relies on the
iomap folio state bitmap to track which pages are dirty (so that we avoid
any double-counting). As such we can only do granular accounting if the
block size >= PAGE_SIZE.
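
The double-counting concern can be sketched in userspace as follows. This is
an illustrative model only: it uses a plain bool array with one entry per
page, whereas the real iomap folio state is a bitmap, and the names
sketch_folio_state and sketch_mark_dirty() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PAGES 512	/* assumed folio size in pages */

/* One dirty flag per page; stands in for the per-folio dirty bitmap. */
struct sketch_folio_state {
	bool dirty[SKETCH_MAX_PAGES];
};

/*
 * Mark pages [first, first + nr) dirty and return how many of them were
 * clean beforehand. Only that count should be added to the dirty stats;
 * pages that were already dirty have been accounted once already.
 */
static unsigned long sketch_mark_dirty(struct sketch_folio_state *s,
				       size_t first, size_t nr)
{
	unsigned long newly_dirtied = 0;

	assert(first + nr <= SKETCH_MAX_PAGES);

	for (size_t i = first; i < first + nr; i++) {
		if (!s->dirty[i]) {
			s->dirty[i] = true;
			newly_dirtied++;
		}
	}
	return newly_dirtied;
}
```

For example, dirtying pages 0-3 and then pages 2-5 of a clean folio yields 4
and then 2 newly dirtied pages, so six pages are accounted in total instead of
eight.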

This patchset was run through xfstests using the fuse passthrough_hp server
(with an out-of-tree kernel patch enabling fuse large folios).

This is on top of commit d5212d81 ("Merge patch series "fuse: use iomap..."")
in Christian's vfs iomap tree, and on top of the patchset that removes
BDI_CAP_WRITEBACK_ACCT [1].

Benchmarks using a contrived test program that writes 2 GB in 128 MB chunks to
a fuse mount (with an out-of-tree kernel patch that enables fuse large folios)
and then does 50k 50-byte random writes showed roughly a 10% performance
improvement (0.625 seconds -> 0.547 seconds for the random writes).


Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20250707234606.2300149-1-joannelkoong@xxxxxxxxx/


Joanne Koong (10):
  mm: pass number of pages to __folio_start_writeback()
  mm: pass number of pages to __folio_end_writeback()
  mm: add folio_end_writeback_pages() helper
  mm: pass number of pages dirtied to __folio_mark_dirty()
  mm: add filemap_dirty_folio_pages() helper
  mm: add __folio_clear_dirty_for_io() helper
  mm: add no_stats_accounting bitfield to wbc
  mm: refactor clearing dirty stats into helper function
  mm: add clear_dirty_for_io_stats() helper
  iomap: add granular dirty and writeback accounting

 fs/buffer.c                |   6 +-
 fs/ext4/page-io.c          |   2 +-
 fs/iomap/buffered-io.c     | 136 ++++++++++++++++++++++++++++++++++---
 include/linux/page-flags.h |   6 +-
 include/linux/pagemap.h    |   4 +-
 include/linux/writeback.h  |   6 ++
 mm/filemap.c               |  25 ++++---
 mm/internal.h              |   2 +-
 mm/page-writeback.c        | 127 ++++++++++++++++++++++------------
 9 files changed, 246 insertions(+), 68 deletions(-)

-- 
2.47.3




