Re: [RFC 0/1] writeback: add sysfs to config the number of writeback contexts

On 8/29/2025 4:59 PM, Kundan Kumar wrote:
> On 8/25/2025 5:59 PM, wangyufei wrote:
>> Hi everyone,
>>
>> We've been interested in this patch about parallelizing writeback [1]
>> and have been following its discussion and development. Our testing in
>> several application scenarios on mobile devices has shown significant
>> performance improvements.

> Hi,
>
> Thanks for sharing this work.
>
> Could you clarify a few details about your test setup?
>
> - Which filesystem did you run these experiments on?
> - What were the specifics of the workload (number of threads, block size,
>     I/O size)?
> - If you are using fio, can you please share the fio command?
> - How much RAM was available on the test system?
> - Can you share the performance improvement numbers you observed?
>
> That would help in understanding the impact of parallel writeback.

Hi Kundan,

We tested this patch mostly on mobile devices. The test platform setup is as follows:

- Filesystem: F2FS

- System config:
    Number of CPUs = 8
    System RAM = 11G

- Workload and fio: we used the same fio command as mentioned in your patch.
    fio command line:
    fio --directory=/mnt --name=test --bs=4k --iodepth=1024 --rw=randwrite \
        --ioengine=io_uring --time_based=1 --runtime=60 --numjobs=8 --size=450M \
        --direct=0 --eta-interval=1 --eta-newline=1 --group_reporting

- Performance gains:
    Base F2FS               :  973 MiB/s
    Parallel Writeback F2FS : 1237 MiB/s (+27%)

> I made similar modifications to dynamically configure the number of
> writeback threads in this experimental patch. Refer to patches 14 and 15:
> https://lore.kernel.org/all/20250807045706.2848-1-kundan.kumar@xxxxxxxxxxx/
> The key difference is that this change also enables a reduction in the
> number of writeback threads.

Thanks for sharing the patch. I have a few questions:
- The current approach freezes the filesystem and reallocates all writeback_ctx structures. Could this introduce latency? In some cases, I think the existing bdi_writeback_ctx structures could be reused instead.
- Are there other use cases for dynamic thread tuning besides initialization and testing?
- What methods are used to test the stability of this function?
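
To make the reuse idea in the first question more concrete, here is a minimal userspace sketch in C. It is not kernel code and not taken from either patch: `struct wb_ctx`, `struct wb_ctx_set` and `wb_ctx_set_resize()` are hypothetical names used only for illustration, and freezing, locking, and flushing dirty data before a context is dropped are all left out. It only shows that a resize can keep the first min(old, new) contexts and allocate or free just the difference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-context writeback structure. */
struct wb_ctx {
	int id;
	unsigned long nr_dirty;	/* placeholder state that should survive a resize */
};

/* Hypothetical container holding the current set of contexts. */
struct wb_ctx_set {
	struct wb_ctx **ctx;
	size_t nr;
};

/*
 * Resize the set to new_nr contexts, reusing the existing ones.
 * Contexts with index < min(old, new) are kept untouched; only the
 * difference is allocated or freed.
 * Returns 0 on success, -1 on failure (the set is left unchanged).
 */
static int wb_ctx_set_resize(struct wb_ctx_set *set, size_t new_nr)
{
	size_t old_nr, keep, i;
	struct wb_ctx **arr;

	assert(set != NULL);
	if (new_nr == 0)
		return -1;	/* assumption: at least one context must remain */

	old_nr = set->nr;
	keep = old_nr < new_nr ? old_nr : new_nr;

	arr = calloc(new_nr, sizeof(*arr));
	if (!arr)
		return -1;

	/* Reuse: carry over the pointers to the surviving contexts. */
	for (i = 0; i < keep; i++)
		arr[i] = set->ctx[i];

	/* Grow: allocate only the additional contexts. */
	for (i = keep; i < new_nr; i++) {
		arr[i] = calloc(1, sizeof(**arr));
		if (!arr[i])
			goto fail;
		arr[i]->id = (int)i;
	}

	/*
	 * Shrink: free only the contexts beyond the new size. A real
	 * implementation would have to write back or migrate their
	 * dirty data first.
	 */
	for (i = keep; i < old_nr; i++)
		free(set->ctx[i]);

	free(set->ctx);
	set->ctx = arr;
	set->nr = new_nr;
	return 0;

fail:
	/* Undo only the new allocations; the old set is still intact. */
	while (i-- > keep)
		free(arr[i]);
	free(arr);
	return -1;
}
```

In this sketch, growing from 4 to 8 contexts and then shrinking to 2 leaves context 0 at the same address with its state intact, so only the contexts being added or removed would need any special handling.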

Finally, are there any open problems or optimization directions for parallel filesystem writeback that are worth discussing?


Thanks,

yufei




