Based on feedback received during the LSFMM 2025 discussions, we have
implemented a version that parallelizes writeback using N writeback
threads (N=4 in this version). We create four delayed work instances
(dwork) per bdi_writeback, and writeback work is scheduled across
these workers in round-robin fashion.

We continue to use a single set of b_* inode lists and a unified dirty
throttling mechanism, achieving parallelism through multiple workers
processing the same list concurrently. This avoids any inode list
sharding or dirty throttling duplication at this stage, keeping the
design simple.

In our internal evaluation on a PMEM device, we observed a 120%
improvement in IOPS, demonstrating clear benefits from enabling
parallel submission even with the current global structures.

We look forward to feedback and ideas for further improvements.

Kundan Kumar (1):
  writeback: enable parallel writeback using multiple work items

 fs/fs-writeback.c                | 49 ++++++++++++++++++++++++++------
 include/linux/backing-dev-defs.h | 10 ++++++-
 mm/backing-dev.c                 | 16 ++++++++---
 3 files changed, 62 insertions(+), 13 deletions(-)

--
2.25.1