Re: temporary hung tasks on XFS since updating to 6.6.92

> On 16. Jun 2025, at 14:15, Carlos Maiolino <cem@xxxxxxxxxx> wrote:
> 
> On Mon, Jun 16, 2025 at 12:09:21PM +0200, Christian Theune wrote:
> 
>> 
>> # xfs_info /tmp/
>> meta-data=/dev/vdb1              isize=512    agcount=8, agsize=229376 blks
>>         =                       sectsz=512   attr=2, projid32bit=1
>>         =                       crc=1        finobt=1, sparse=1, rmapbt=0
>>         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
>>         =                       exchange=0
>> data     =                       bsize=4096   blocks=1833979, imaxpct=25
>>         =                       sunit=1024   swidth=1024 blks
>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
>> log      =internal log           bsize=4096   blocks=2560, version=2
>>         =                       sectsz=512   sunit=8 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> This is worrisome. Your journal size is 10MiB, which can easily keep stalling IO
> while waiting for log space to be freed; depending on the nature of the machine
> this can be triggered easily. I'm curious though how you made this FS, because
> 2560 blocks is below the minimum log size that xfsprogs has allowed since
> (/me goes looking into the git log) 2022, xfsprogs 5.15.
> 
> FWIW, one of the reasons the minimum journal log size was increased is the
> latency/stalls that happen while waiting for free log space, which is exactly
> the symptom you've been seeing.
> 
> I'd suggest you check the xfsprogs commit below if you want more details,
> but if this is one of the filesystems where you see the stalls, this might very
> well be the cause:

Interesting catch! I’ll double-check this against our fleet and the affected machines and dive into the traffic patterns of the specific underlying devices.
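(As a quick sanity check, the 10 MiB figure follows directly from the xfs_info output above: the internal log line reports bsize=4096 and blocks=2560, and their product is the journal size. A minimal sketch:)

```python
# Sanity check of the journal size from the xfs_info output:
# the "log =internal log" line reports bsize=4096 and blocks=2560.
log_bsize = 4096             # bytes per log block
log_blocks = 2560            # number of log blocks
log_mib = log_bsize * log_blocks // (1024 * 1024)
print(f"journal size: {log_mib} MiB")  # journal size: 10 MiB
```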

This filesystem is used for /tmp and is created fresh after a “cold boot” from our hypervisor. It could be that a number of VMs have only seen warm reboots for a couple of years, so their /tmp filesystems were created by a much older xfsprogs, even though they get kernel upgrades (via warm reboots) quite regularly. We’re in the process of changing the /tmp filesystem creation to happen fresh during initrd so that the VM-internal xfsprogs more closely matches the guest kernel.

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
