On Tue, Sep 09, 2025 at 04:44:04PM +0200, Jan Kara wrote:
> With the lazytime mount option enabled we can be switching many dirty inodes
> on cgroup exit to the parent cgroup. The numbers observed in practice
> when a systemd slice of a large cron job exits can easily reach hundreds
> of thousands or millions. The logic in inode_do_switch_wbs() which sorts
> the inode into the appropriate place in the b_dirty list of the target wb
> however has linear complexity in the number of dirty inodes, so the overall
> time complexity of switching all the inodes is quadratic, leading to
> workers being pegged for hours consuming 100% of the CPU while switching
> inodes to the parent wb.
>
> Simple reproducer of the issue:
> FILES=10000
> # Filesystem mounted with lazytime mount option
> MNT=/mnt/
> echo "Creating files and switching timestamps"
> for (( j = 0; j < 50; j ++ )); do
> mkdir $MNT/dir$j
> for (( i = 0; i < $FILES; i++ )); do
> echo "foo" >$MNT/dir$j/file$i
> done
> touch -a -t 202501010000 $MNT/dir$j/file*
> done
> wait
> echo "Syncing and flushing"
> sync
> echo 3 >/proc/sys/vm/drop_caches
>
> echo "Reading all files from a cgroup"
> mkdir /sys/fs/cgroup/unified/mycg1 || exit
> echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
> for (( j = 0; j < 50; j ++ )); do
> cat /mnt/dir$j/file* >/dev/null &
> done
> wait
> echo "Switching wbs"
> # Now rmdir the cgroup after the script exits
>
> We need to maintain b_dirty list ordering to keep writeback happy, so
> instead of sorting the inode into the appropriate place just append it at
> the end of the list and clobber dirtied_time_when. This may result in inode
> writeback starting later after a cgroup switch; however, cgroup switches are
> rare so it shouldn't matter much. Since the cgroup had write access to
> the inode, there are no practical concerns about possible DoS issues.
>
> Signed-off-by: Jan Kara <jack@xxxxxxx>

Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Thanks.

--
tejun
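For readers less familiar with the writeback lists, the following is a small
user-space sketch (plain C, not kernel code; toy_inode, toy_list and the
timings are purely illustrative, not the actual patch) of the complexity
argument above: keeping the target list sorted costs a list walk per switched
inode, so a bulk cgroup-exit switch is quadratic, while appending at the tail
with the timestamp clobbered to "now" preserves ordering in O(1) per inode.

/*
 * Toy model of the behaviour described above, not kernel code.
 * sorted_insert() mimics the old per-inode sorted placement in the
 * target wb's b_dirty list; tail_append() mimics the fix.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct toy_inode {
	unsigned long dirtied_when;   /* stand-in for the inode timestamp */
	struct toy_inode *next;
};

struct toy_list {
	struct toy_inode *head;
	struct toy_inode *tail;
};

/* Old scheme: walk the list to keep it sorted by dirtied_when (oldest first). */
static void sorted_insert(struct toy_list *list, struct toy_inode *inode)
{
	struct toy_inode **pos = &list->head;

	while (*pos && (*pos)->dirtied_when <= inode->dirtied_when)
		pos = &(*pos)->next;
	inode->next = *pos;
	*pos = inode;
	if (!inode->next)
		list->tail = inode;
}

/* New scheme: clobber the timestamp to "now" and append in O(1). */
static void tail_append(struct toy_list *list, struct toy_inode *inode,
			unsigned long now)
{
	inode->dirtied_when = now;   /* ordering by dirtied_when still holds */
	inode->next = NULL;
	if (list->tail)
		list->tail->next = inode;
	else
		list->head = inode;
	list->tail = inode;
}

static double bench(int n, int sorted)
{
	struct toy_list list = { NULL, NULL };
	struct toy_inode *inodes = calloc(n, sizeof(*inodes));
	clock_t start = clock();
	int i;

	for (i = 0; i < n; i++) {
		/* random old timestamps, as left behind by lazytime updates */
		inodes[i].dirtied_when = rand();
		if (sorted)
			sorted_insert(&list, &inodes[i]);
		else
			tail_append(&list, &inodes[i], (unsigned long)n + i);
	}
	free(inodes);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
	/* already visibly slow for the sorted variant; scale up to see it blow up */
	int n = 50000;

	printf("switching %d inodes, sorted insert: %.2fs\n", n, bench(n, 1));
	printf("switching %d inodes, tail append:   %.2fs\n", n, bench(n, 0));
	return 0;
}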