On Wed, Jul 30, 2025 at 07:18:48AM -0700, Christoph Hellwig wrote:
> On Tue, Jul 29, 2025 at 01:08:46PM -0700, Darrick J. Wong wrote:
> > The pwrite failure comes from the aio-dio-eof-race.c program because the
> > filesystem ran out of space.  There are no speculative posteof
> > preallocations on a zoned filesystem, so let's skip this test on those
> > setups.
>
> Did it run out of space because it is overwriting and we need a new
> allocation (I've not actually seen this fail in my zoned testing,
> that's why I'm asking)?  If so it really should be using the new
> _require_inplace_writes Filipe just sent to the list.

I took a deeper look into what's going on here, and I think the
intermittent ENOSPC failures are caused by this sequence:

1. First we write to every byte in the 256M zoned rt device so that 0x55
   gets written to the disk.
2. Then we delete the huge file we created.
3. The zoned garbage collector doesn't run.
4. aio-dio-eof-race starts up and initiates an AIO direct write at pos 0.
5. xfs_file_dio_write_zoned calls xfs_zoned_write_space_reserve.
6. xfs_zoned_space_reserve tries to decrement 64k from XC_FREE_RTEXTENTS
   but gets ENOSPC.
7. We didn't pass XFS_ZR_GREEDY, so we error out.

If I make the test sleep until I see zonegc do some work before starting
aio-dio-eof-race, the problem goes away.  I'm not sure what the proper
solution is, but maybe it's to wake up the gc thread and wait for it?
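To make the "wake up gc and wait for it" idea concrete, here's a rough userspace model.  None of these names are XFS code; struct zone_model, gc_kick_and_wait, dec_free and reserve_with_gc are all hypothetical.  It assumes, as the analysis above suggests, that space released by the unlink only shows up in the free counter (the XC_FREE_RTEXTENTS analogue) after zonegc has reset the zones holding it.  The gc pass is modeled as a synchronous call so the example stays deterministic:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model, not XFS code.  'free' plays the role of
 * XC_FREE_RTEXTENTS; 'reclaimable' is space released by the unlink that
 * only becomes free once GC has reset the zones holding it.
 */
struct zone_model {
	uint64_t	free;		/* blocks reservable right now */
	uint64_t	reclaimable;	/* blocks waiting on GC */
	uint64_t	zone_size;	/* blocks one GC pass can return */
	unsigned int	gc_passes;	/* number of GC passes run so far */
};

/*
 * Stand-in for "wake the GC thread and wait for it to finish a pass".
 * Modeled synchronously so the example is deterministic.
 */
static void gc_kick_and_wait(struct zone_model *m)
{
	uint64_t n = m->reclaimable < m->zone_size ?
			m->reclaimable : m->zone_size;

	m->reclaimable -= n;
	m->free += n;
	m->gc_passes++;
}

/* Non-greedy reservation: take all 'count' blocks or fail with -ENOSPC. */
static int dec_free(struct zone_model *m, uint64_t count)
{
	if (m->free < count)
		return -ENOSPC;
	m->free -= count;
	return 0;
}

/*
 * Retry after each GC pass while GC can still return space, at most
 * 'max_retries' times.  Once nothing is reclaimable, ENOSPC is real.
 */
static int reserve_with_gc(struct zone_model *m, uint64_t count,
		int max_retries)
{
	int error;

	assert(m->zone_size > 0);
	for (int retries = 0; ; retries++) {
		error = dec_free(m, count);
		if (error != -ENOSPC || m->reclaimable == 0 ||
		    retries >= max_retries)
			return error;
		gc_kick_and_wait(m);
	}
}
```

Starting from free = 0, reclaimable = 64, zone_size = 16 (roughly the state after steps 1-3), reserve_with_gc(&m, 16, 0) fails with -ENOSPC like the current non-greedy path, while reserve_with_gc(&m, 16, 5) succeeds after one gc pass.  Unlike a fixed delay, the loop waits for gc to actually return space and gives up as soon as nothing is reclaimable, so a genuinely full device still gets ENOSPC promptly.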
diff --git a/fs/xfs/xfs_zone_space_resv.c b/fs/xfs/xfs_zone_space_resv.c
index 1313c55b8cbe51..dfd0384f8e3931 100644
--- a/fs/xfs/xfs_zone_space_resv.c
+++ b/fs/xfs/xfs_zone_space_resv.c
@@ -223,15 +223,25 @@ xfs_zoned_space_reserve(
 	unsigned int			flags,
 	struct xfs_zone_alloc_ctx	*ac)
 {
+	int				tries = 5;
 	int				error;
 
 	ASSERT(ac->reserved_blocks == 0);
 	ASSERT(ac->open_zone == NULL);
 
+again:
 	error = xfs_dec_freecounter(mp, XC_FREE_RTEXTENTS, count_fsb,
 			flags & XFS_ZR_RESERVED);
 	if (error == -ENOSPC && (flags & XFS_ZR_GREEDY) && count_fsb > 1)
 		error = xfs_zoned_reserve_extents_greedy(mp, &count_fsb, flags);
+	if (error == -ENOSPC && !(flags & XFS_ZR_GREEDY) && --tries) {
+		struct xfs_zone_info	*zi = mp->m_zone_info;
+
+		xfs_err(mp, "OI ZONEGC %d", tries);
+		wake_up_process(zi->zi_gc_thread);
+		udelay(100);
+		goto again;
+	}
 	if (error)
 		return error;

This fugly patch makes the test failures go away.  On my system we rarely
go below "OI ZONEGC 2" across 100 runs.

> If now we need to figure out what this depends on instead of adding
> random xfs-specific hacks to common code.

<nod>  I saw the "this tests speculative posteof preallocations" and
thought that didn't sound like an interesting test on a zoned fs. ;)

--D