On 9/8/2025 2:33 PM, Yongjian Sun wrote:
> From: Yongjian Sun <sunyongjian1@xxxxxxxxxx>
>
> After running a stress test combined with fault injection,
> we performed fsck -a followed by fsck -fn on the filesystem
> image. During the second pass, fsck -fn reported:
>
> Inode 131512, end of extent exceeds allowed value
> 	(logical block 405, physical block 1180540, len 2)
>
> This inode was not in the orphan list. Analysis revealed the
> following call chain that leads to the inconsistency:
>
> ext4_da_write_end()
>  //does not update i_disksize
> ext4_punch_hole()
>  //truncate folio, keep size
> ext4_page_mkwrite()
>  ext4_block_page_mkwrite()
>   ext4_block_write_begin()
>    ext4_get_block()
>     //insert written extent without update i_disksize
> journal commit
> echo 1 > /sys/block/xxx/device/delete
>
> da-write path updates i_size but does not update i_disksize. Then
> ext4_punch_hole truncates the da-folio yet still leaves i_disksize
> unchanged (in the ext4_update_disksize_before_punch function, the
> condition offset + len < size is met). Then ext4_page_mkwrite sees
> ext4_nonda_switch return 1 and takes the nodioread_nolock path, the
> folio about to be written has just been punched out, and its offset
> sits beyond the current i_disksize. This may result in a written
> extent being inserted, but again does not update i_disksize. If the
> journal gets committed and then the block device is yanked, we might
> run into this. It should be noted that replacing ext4_punch_hole with
> ext4_zero_range in the call sequence may also trigger this issue, as
> neither will update i_disksize under these circumstances.
>
> To fix this, we can modify ext4_update_disksize_before_punch to always
> increase i_disksize to offset + len.
>
> Signed-off-by: Yongjian Sun <sunyongjian1@xxxxxxxxxx>
> ---
> Changes in v2:
> - The modification of i_disksize should be moved into ext4_update_disksize_before_punch,
>   rather than being done in ext4_page_mkwrite.
> - Link to v1: https://lore.kernel.org/all/20250731140528.1554917-1-sunyongjian@xxxxxxxxxxxxxxx/
> ---
>  fs/ext4/inode.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5b7a15db4953..2b1ed729a0f0 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4298,7 +4298,7 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>  	loff_t size = i_size_read(inode);
>
>  	WARN_ON(!inode_is_locked(inode));
> -	if (offset > size || offset + len < size)
> +	if (offset > size)
>  		return 0;
>
>  	if (EXT4_I(inode)->i_disksize >= size)

Hi, Yongjian!

I think this check also needs to be updated; otherwise, the condition
will be too lenient. If the end position of the punched hole is
<= i_disksize, we should also avoid updating i_disksize (this is the
more general case).

Besides, I'd suggest updating the comment of
ext4_update_disksize_before_punch() as well.

Regards,
Yi.

> @@ -4307,7 +4307,7 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>  	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
>  	if (IS_ERR(handle))
>  		return PTR_ERR(handle);
> -	ext4_update_i_disksize(inode, size);
> +	ext4_update_i_disksize(inode, min_t(loff_t, size, offset + len));
>  	ret = ext4_mark_inode_dirty(handle, inode);
>  	ext4_journal_stop(handle);
>
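
To make the suggested check concrete, here is a minimal user-space sketch of
the decision logic, not the actual kernel change. It assumes one reading of
the comment above: the early return should compare i_disksize against
min(i_size, offset + len) instead of against i_size alone. The function name
disksize_after_punch() and the plain int64_t parameters are hypothetical
stand-ins for the inode fields; locking and journaling are left out.

```c
#include <stdint.h>

/*
 * Hypothetical model of ext4_update_disksize_before_punch():
 * returns the value i_disksize would hold after the call.
 * Assumption: the update is skipped when i_disksize already
 * covers min(i_size, offset + len).
 */
static int64_t disksize_after_punch(int64_t i_size, int64_t i_disksize,
				    int64_t offset, int64_t len)
{
	int64_t end = offset + len;
	/* Never extend the on-disk size past the in-memory size. */
	int64_t target = end < i_size ? end : i_size;

	/* Range starts beyond EOF: nothing to do. */
	if (offset > i_size)
		return i_disksize;

	/* On-disk size already covers the end of the range: skip. */
	if (i_disksize >= target)
		return i_disksize;

	/* Otherwise raise i_disksize up to the end of the range. */
	return target;
}
```

With i_size = 16384, i_disksize = 4096, offset = 8192 and len = 4096, the
sketch returns 12288, whereas the pre-patch condition
(offset + len < size) would have left i_disksize at 4096.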