Re: flakey assert failures in xfs/538 in for-next

Chandan Babu R <chandanbabu@xxxxxxxxxx> · Fri, 18 Jul 2025 17:49:29 +0530

On Wed, Jul 16, 2025 at 06:02:34 PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 16, 2025 at 08:38:12AM -0700, Darrick J. Wong wrote:
>> I've seen this happen maybe once or twice. I think the problem is that
>> the symlink xfs_bmapi_write fails to allocate enough blocks to store the
>> symlink target, doesn't notice, and then the actual target write runs
>> out of blocks before it runs out of pathlen and kaboom.
>> 
>> Probably the right answer is to ENOSPC if we can't allocate blocks, but
>> I guess we did reserve free space so perhaps we just keep bmapi'ing
>> until we get all the space we need?
>> 
>> The weird part is that XFS_SYMLINK_MAPS should be large enough to fit
>> all the target we need, so ... I don't know if bmapi_write is returning
>> fewer than 3 nmaps because it hit ENOSPC or what?
>> 
>> (and because I can't reproduce it reliably, I have not investigated
>> further :()

I think you are right. Most likely we successfully allocated fewer than
XFS_SYMLINK_MAPS (i.e. 3) extents, and the next allocation attempt found only
free extents whose lengths were larger than 1 FSB.

The test fills 90% of the filesystem and then punches holes at every other
block used by each of the "filler" files. So, besides those 1 FSB holes, the
filesystem can have some free extents larger than 1 FSB. These larger free
extents allowed the block reservation to succeed.
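A toy Python model of that free-space layout may make the point concrete. It is a sketch under stated assumptions, not XFS code: free space is a hypothetical list of (start, length) extents, the reservation is modeled as a check on the total free block count only, and the helper names punched_free_space, reserve and count_exact are made up for illustration.

```python
# Toy model only (hypothetical names, not XFS code).
# Free space is a list of (start_block, length_in_blocks) extents.

def punched_free_space(filler_extents, other_free):
    """Free extents left after punching every other block of each filler extent."""
    free = []
    for start, length in filler_extents:
        # Punch offsets 1, 3, 5, ... so each hole is a 1 FSB free extent.
        for off in range(1, length, 2):
            free.append((start + off, 1))
    # Free space the filler files never used (e.g. the remaining ~10%).
    free.extend(other_free)
    return free

def reserve(free, nblocks):
    """Modeled reservation: only the total number of free blocks matters."""
    return sum(length for _, length in free) >= nblocks

def count_exact(free, nblocks):
    """Number of free extents that are exactly nblocks long."""
    return sum(1 for _, length in free if length == nblocks)
```

For example, punched_free_space([(0, 8)], [(100, 16)]) yields four 1 FSB holes plus one 16-block extent. Once the four holes are used up, reserve() still succeeds for small requests because of the 16-block extent, while count_exact(free, 1) drops to 0.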

During the test run, we could have consumed all of the 1 FSB free extents,
so a later allocation attempt can fail, since we were trying to allocate only
a 1 FSB extent.
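The failure mode, and the ENOSPC check Darrick suggested, can be sketched as a small toy Python model. Everything here is hypothetical and simplified: alloc_exact_minlen stands in for an allocation under the minlen error injection that accepts only a free extent of exactly minlen blocks, and map_symlink_blocks caps the number of mappings at 3 to mirror XFS_SYMLINK_MAPS. Neither is the real xfs_bmapi_write() path.

```python
import errno

SYMLINK_MAPS = 3  # mirrors XFS_SYMLINK_MAPS (3) from the thread

def alloc_exact_minlen(free, minlen=1):
    """Hypothetical allocator: take only a free extent of exactly minlen blocks."""
    for i, (start, length) in enumerate(free):
        if length == minlen:
            del free[i]
            return (start, length)
    return None  # only larger free extents remain, so the allocation fails

def map_symlink_blocks(free, fsblocks):
    """Map up to SYMLINK_MAPS extents; fail with -ENOSPC on a short mapping."""
    maps = []
    mapped = 0
    while mapped < fsblocks and len(maps) < SYMLINK_MAPS:
        ext = alloc_exact_minlen(free)
        if ext is None:
            break
        maps.append(ext)
        mapped += ext[1]
    if mapped < fsblocks:
        # Short mapping: report ENOSPC instead of letting the target copy
        # run past the end of the mapped blocks. Unwinding the extents
        # already allocated is not modeled here.
        return -errno.ENOSPC, maps
    return 0, maps
```

With free space [(1, 1), (3, 1), (100, 16)] and a 3-block target, the model maps the two 1 FSB extents, then finds only the 16-block extent and returns -ENOSPC even though the reservation would have seen 18 free blocks.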

>
> I guess the recent cleanups are not to blame then, or they just slightly
> changed the timing so that I had a streak of hitting it frequently.
>
> xfs/538 is the alloc minlen test that injects getting back the minlen
> or failing allocations if minlen > 1.  I guess that interacts badly
> somehow with the rather uncommon multi-map allocations.  The only
> other one is xfs_da_grow_inode_int, and that only for directories
> with a larger directory block size, and as a fallback when the contig
> allocation fails.  It might be worth crafting a test doing a lot
> of symlinking while doing that error injection to trigger it more
> reliably.

I have modified xfs/538 to perform only write* and symlink
operations. Unfortunately, the test hasn't failed yet after 27
iterations. I will let it run over the weekend.

-- 
Chandan