Re: ext4 v6.15-rc2 baseline

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Mon, 21 Apr 2025 09:47:59 -0700

On Mon, Apr 21, 2025 at 11:29:52AM -0500, Theodore Ts'o wrote:
> On Mon, Apr 21, 2025 at 08:54:33AM -0700, Darrick J. Wong wrote:
> > 
> > I might be wading in deeper than I know, but it seems to me that
> > after a crash recovery it's not great to see 64k files with no blocks
> > allocated to them at all.
> 
> Well, what ext4 in no dioread_nolock mode will do is to allocate
> blocks marked as uninitialized, and then write the data blocks, and
> then change them to be marked as initialized.  So it's not that there
> are no blocks allocated at all; but that there are blocks allocated
> but attempts to read from the file will return all zeros.
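A toy C model of the three-step sequence described above. It assumes a single extent and treats each step as persisting atomically. All names are hypothetical and none of this is ext4 code. It shows that a crash after the allocation step leaves an extent that is mapped but reads back as zeros. That is a different on-disk state from a file with no extents at all.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical states for a single extent; not ext4's on-disk format. */
enum extent_state { EXT_HOLE, EXT_UNWRITTEN, EXT_WRITTEN };

struct toy_file {
    enum extent_state state;  /* mapping state of the one extent */
    unsigned char data[16];   /* contents of the allocated blocks */
};

/*
 * Replay the steps, stopping after `steps_done` of them to simulate a crash:
 *   1. allocate blocks marked unwritten (uninitialized)
 *   2. write the data blocks
 *   3. mark the extent written (initialized)
 */
struct toy_file crash_after(int steps_done, unsigned char fill)
{
    struct toy_file f;
    f.state = EXT_HOLE;
    memset(f.data, 0xAA, sizeof(f.data)); /* stale contents of free blocks */

    if (steps_done >= 1)
        f.state = EXT_UNWRITTEN;
    if (steps_done >= 2)
        memset(f.data, fill, sizeof(f.data));
    if (steps_done >= 3)
        f.state = EXT_WRITTEN;
    return f;
}

/* Number of mapped extents a FIEMAP-style query would report. */
int mapped_extents(const struct toy_file *f)
{
    return f->state == EXT_HOLE ? 0 : 1;
}

/* Holes and unwritten extents both read back as zeros (out-of-range
 * offsets are also treated as zero to keep the model simple). */
unsigned char read_byte(const struct toy_file *f, size_t off)
{
    if (f->state != EXT_WRITTEN || off >= sizeof(f->data))
        return 0;
    return f->data[off];
}
```

In this model, only a crash before step 1 yields zero mapped extents. That is why the observation below, a 64k file with no extents at all, does not obviously fit the description.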

But that's not what I see -- on my system, I get files with i_size ==
65536, but no mappings at all:

--- /run/fstests/bin/tests/generic/044.out      2025-04-17 14:52:53.521658441 -0700
+++ /var/tmp/fstests/generic/044.out.bad        2025-04-21 08:46:15.328757541 -0700
@@ -1 +1,95 @@
 QA output created by 044
+corrupt file /opt/906 - non-zero size but no extents
+corrupt file /opt/907 - non-zero size but no extents

# mount /opt/
# ls /opt/906
-rw------- 1 root root 65536 Apr 21 08:45 /opt/906
# filefrag -v !$
filefrag -v /opt/906
Filesystem type is: ef53
File size of /opt/906 is 65536 (16 blocks of 4096 bytes)
/opt/906: 0 extents found

...unless ext4 is removing those unwritten blocks during recovery?
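A hedged C sketch of the condition being reported, using the Linux FIEMAP ioctl, which filefrag uses when available. With fm_extent_count set to zero, the kernel fills in only the fm_mapped_extents count. So a non-zero i_size with a zero count is the "non-zero size but no extents" case. The helper names and result codes are hypothetical; this is not how fstests implements the check.

```c
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum size_check { CHECK_ERROR = -1, CHECK_OK = 0, CHECK_SIZE_NO_EXTENTS = 1 };

/* Pure decision: a non-empty file with zero mapped extents is suspect. */
enum size_check classify(long long size, unsigned int mapped_extents)
{
    return (size > 0 && mapped_extents == 0) ? CHECK_SIZE_NO_EXTENTS : CHECK_OK;
}

/* Ask FIEMAP how many extents map [0, EOF) without fetching them. */
enum size_check check_file(const char *path)
{
    struct stat st;
    struct fiemap fm;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return CHECK_ERROR;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return CHECK_ERROR;
    }

    memset(&fm, 0, sizeof(fm));  /* fm_flags = 0: no FIEMAP_FLAG_SYNC */
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;
    fm.fm_extent_count = 0;      /* count only; fm_extents[] is not filled */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
        close(fd);               /* e.g. filesystem without FIEMAP support */
        return CHECK_ERROR;
    }
    close(fd);
    return classify((long long)st.st_size, fm.fm_mapped_extents);
}
```

On a file like /opt/906 above, one would expect check_file() to return CHECK_SIZE_NO_EXTENTS, matching filefrag's "0 extents found". An unwritten extent would still be counted, because FIEMAP reports it with the FIEMAP_EXTENT_UNWRITTEN flag.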

> This is non-ideal, but my main concern is a performance issue, not a
> correctness one.  We're modifying the metadata blocks twice, and while
> most of the time the two modifications happen within a single
> transaction (so the user won't actually see the zero blocks after the
> crash _most_ of the time), the extra journal handles mean extra CPU
> and extra jbd2 spinlocks getting taken and released.
> 
> So it's on my todo list to fix, in my copious spare time.....
> 
> > (I don't care about the others whining about _exclude_fs-- if
> > you make the design decision that the current ext4 behavior is
> > good enough, then the test cannot ever be satisfied so let's
> > capture that in the test itself, not in everyone's scattered
> > exclusion lists.)
> 
> Fair enough, I can try, and see if we get people attempting to NACK
> the changes this time around.  Support beating back the whiners would
> be appreciated.

Ok, I'll chime in whenever I see patches. :)

> I can also see if Luis's LBS changes might make it easier to deal with the
> bigalloc test bugs.  It will mean exposing the concept of cluster
> allocation size (as distinct from block size) to the core xfstests
> infrastructure, and again, we can see if people try to NACK the
> changes.  This will require a bit more work, however, as this is a big
> difference between XFS's LBS feature and ext4's bigalloc feature.

That shouldn't be a problem; _xfs_get_file_block_size has returned the
allocation unit size for XFS files for quite some time, despite being
badly named.
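This minimal C sketch shows why the harness needs the allocation unit separately from the block size. It uses a hypothetical helper that rounds a byte range out to whole allocation units. The unit would be the block size without bigalloc and the cluster size with it. The 64k cluster below is an assumed example; the real value is chosen at mkfs time.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes that must be allocated to cover
 * [off, off + len) when space is handed out in units of `alloc_unit`
 * bytes (block size without bigalloc, cluster size with it).
 */
uint64_t alloc_bytes_for_range(uint64_t off, uint64_t len, uint64_t alloc_unit)
{
    if (len == 0 || alloc_unit == 0)
        return 0;
    uint64_t start = off - (off % alloc_unit);  /* round start down */
    uint64_t end = off + len;
    uint64_t rem = end % alloc_unit;
    if (rem != 0)
        end += alloc_unit - rem;                /* round end up */
    return end - start;
}
```

A 4k write at offset 0 consumes 4096 bytes with a 4096-byte unit but 65536 bytes with an assumed 64k cluster. A test that hardcodes the block size would misjudge this.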

--D

> 
> 						- Ted