On 2025-09-11 10:54, Yongjian Sun wrote:
> From: Yongjian Sun <sunyongjian1@xxxxxxxxxx>
>
> After running a stress test combined with fault injection,
> we performed fsck -a followed by fsck -fn on the filesystem
> image. During the second pass, fsck -fn reported:
>
> Inode 131512, end of extent exceeds allowed value
> 	(logical block 405, physical block 1180540, len 2)
>
> This inode was not in the orphan list. Analysis revealed the
> following call chain that leads to the inconsistency:
>
> ext4_da_write_end()
> //does not update i_disksize
> ext4_punch_hole()
> //truncate folio, keep size
> ext4_page_mkwrite()
>  ext4_block_page_mkwrite()
>   ext4_block_write_begin()
>    ext4_get_block()
>    //insert written extent without update i_disksize
> journal commit
> echo 1 > /sys/block/xxx/device/delete
>
> da-write path updates i_size but does not update i_disksize. Then
> ext4_punch_hole truncates the da-folio yet still leaves i_disksize
> unchanged(in the ext4_update_disksize_before_punch function, the
> condition offset + len < size is met). Then ext4_page_mkwrite sees
> ext4_nonda_switch return 1 and takes the nodioread_nolock path, the
> folio about to be written has just been punched out, and it’s offset
> sits beyond the current i_disksize. This may result in a written
> extent being inserted, but again does not update i_disksize. If the
> journal gets committed and then the block device is yanked, we might
> run into this. It should be noted that replacing ext4_punch_hole with
> ext4_zero_range in the call sequence may also trigger this issue, as
> neither will update i_disksize under these circumstances.
>
> To fix this, we can modify ext4_update_disksize_before_punch to
> increase i_disksize to min(offset + len) when both i_size and
                         ^^^
                         min(i_size, offset + len)
> (offset + len) are greater than i_disksize.
>
> Signed-off-by: Yongjian Sun <sunyongjian1@xxxxxxxxxx>

Otherwise looks good.
Feel free to add:

Reviewed-by: Baokun Li <libaokun1@xxxxxxxxxx>

> ---
> Changes in v4:
> - Make the comments simpler and clearer.
> - Link to v3: https://lore.kernel.org/all/20250910042516.3947590-1-sunyongjian@xxxxxxxxxxxxxxx/
> Changes in v3:
> - Add a condition to avoid increasing i_disksize and include some comments.
> - Link to v2: https://lore.kernel.org/all/20250908063355.3149491-1-sunyongjian@xxxxxxxxxxxxxxx/
> Changes in v2:
> - The modification of i_disksize should be moved into ext4_update_disksize_before_punch,
>   rather than being done in ext4_page_mkwrite.
> - Link to v1: https://lore.kernel.org/all/20250731140528.1554917-1-sunyongjian@xxxxxxxxxxxxxxx/
> ---
>  fs/ext4/inode.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5b7a15db4953..f82f7fb84e17 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4287,7 +4287,11 @@ int ext4_can_truncate(struct inode *inode)
>   * We have to make sure i_disksize gets properly updated before we truncate
>   * page cache due to hole punching or zero range. Otherwise i_disksize update
>   * can get lost as it may have been postponed to submission of writeback but
> - * that will never happen after we truncate page cache.
> + * that will never happen if we remove the folio containing i_size from the
> + * page cache. Also if we punch hole within i_size but above i_disksize,
> + * following ext4_page_mkwrite() may mistakenly allocate written blocks over
> + * the hole and thus introduce allocated blocks beyond i_disksize which is
> + * not allowed (e2fsck would complain in case of crash).
>   */
>  int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>  				      loff_t len)
> @@ -4298,9 +4302,11 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>  	loff_t size = i_size_read(inode);
>
>  	WARN_ON(!inode_is_locked(inode));
> -	if (offset > size || offset + len < size)
> +	if (offset > size)
>  		return 0;
>
> +	if (offset + len < size)
> +		size = offset + len;
>  	if (EXT4_I(inode)->i_disksize >= size)
>  		return 0;
>
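To make the size arithmetic of this change concrete, here is a hypothetical userspace model of the early-return logic before and after the patch. It is a sketch, not kernel code: the function names are made up, all sizes are plain byte counts, and the rest of the real ext4_update_disksize_before_punch() (working on the inode and journalling the new i_disksize) is not modelled. written_blocks_beyond_disksize() is a simplified form of the invariant stated in the new comment (no written blocks beyond i_disksize); it is an assumption for illustration and not e2fsck's actual check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the pre-patch decision. Returns the i_disksize
 * value that would be left behind after the check.
 */
static int64_t disksize_before_punch_old(int64_t i_size, int64_t i_disksize,
					 int64_t offset, int64_t len)
{
	int64_t size = i_size;

	/* Pre-patch: a punch ending below i_size left i_disksize alone. */
	if (offset > size || offset + len < size)
		return i_disksize;
	if (i_disksize >= size)
		return i_disksize;
	return size;
}

/* Hypothetical model of the post-patch decision. */
static int64_t disksize_before_punch_new(int64_t i_size, int64_t i_disksize,
					 int64_t offset, int64_t len)
{
	int64_t size = i_size;

	/* Punch starts beyond i_size: i_disksize is not touched. */
	if (offset > size)
		return i_disksize;
	/* Clamp the target to min(i_size, offset + len). */
	if (offset + len < size)
		size = offset + len;
	/* Only ever grow i_disksize, never shrink it. */
	if (i_disksize >= size)
		return i_disksize;
	return size;
}

/*
 * Simplified invariant (assumption): returns true if the written block
 * range [lblk, lblk + nblocks) ends past the last block that i_disksize
 * covers, i.e. the state the new comment says is not allowed.
 */
static bool written_blocks_beyond_disksize(int64_t i_disksize,
					   int64_t blocksize,
					   int64_t lblk, int64_t nblocks)
{
	assert(blocksize > 0);
	/* Blocks needed to cover i_disksize, rounded up. */
	int64_t size_blocks = (i_disksize + blocksize - 1) / blocksize;

	return lblk + nblocks > size_blocks;
}
```

With illustrative numbers (not taken from the reported image): 4096-byte blocks, i_size = 32768, i_disksize = 8192 after a delalloc write, and a punch of [16384, 24576). The old check returns early and leaves i_disksize at 8192, so a later write fault that allocates blocks 4 and 5 lands beyond i_disksize. The new check raises i_disksize to min(i_size, offset + len) = 24576, which covers those blocks.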