On Fri, May 09, 2025 at 11:12:46PM +0530, Ritesh Harjani wrote:
> "Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx> writes:
>
> > This is v3 of multi-fsblock atomic write support using bigalloc. This has
> > started looking into much better shape now. The major chunk of the design
> > changes has been kept in Patch-4 & 5.
> >
> > This series can now be carefully reviewed, as all the error handling related
> > code paths should be properly taken care of.
> >
>
> We spotted that multi-fsblock changes might need to force a journal
> commit if there were mixed mappings in the underlying region e.g. say WUWUWUW...
>
> The issue arises when, during block allocation, the unwritten ranges are
> first zeroed out, followed by the unwritten-to-written extent
> conversion. This conversion is part of a journaled metadata transaction
> that has not yet been committed, as the transaction is still running.
> If an iomap write then modifies the data on those multi-fsblocks and a
> sudden power loss occurs before the transaction commits, the
> unwritten-to-written conversion will not be replayed during journal
> recovery. As a result, we end up with new data written over mapped
> blocks, while the alternate unwritten blocks will read zeroes. This
> could cause a torn write behavior for atomic writes.
>
> So we were thinking we might need something like this. Hopefully this
> should still be ok, as mixed mapping case mostly is a non-performance
> critical path. Thoughts?

I agree the journal has to be written out before the atomic write is
sent to the device.

--D

>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2642e1ef128f..59b59d609976 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3517,7 +3517,8 @@ static int ext4_map_blocks_atomic_write_slow(handle_t *handle,
>   * underlying short holes/unwritten extents within the requested range.
>   */
>  static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
> -				struct ext4_map_blocks *map, int m_flags)
> +				struct ext4_map_blocks *map, int m_flags,
> +				bool *force_commit)
>  {
>  	ext4_lblk_t m_lblk = map->m_lblk;
>  	unsigned int m_len = map->m_len;
> @@ -3537,6 +3538,11 @@ static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
>  	map->m_len = m_len;
>  	map->m_flags = 0;
>
> +	/*
> +	 * slow path means we have mixed mapping, that means we will need
> +	 * to force txn commit.
> +	 */
> +	*force_commit = true;
>  	return ext4_map_blocks_atomic_write_slow(handle, inode, map);
>  out:
>  	return ret;
> @@ -3548,6 +3554,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>  	handle_t *handle;
>  	u8 blkbits = inode->i_blkbits;
>  	int ret, dio_credits, m_flags = 0, retries = 0;
> +	bool force_commit = false;
>
>  	/*
>  	 * Trim the mapping request to the maximum value that we can map at
> @@ -3610,7 +3617,8 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>  		m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
>
>  	if (flags & IOMAP_ATOMIC)
> -		ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags);
> +		ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags,
> +						   &force_commit);
>  	else
>  		ret = ext4_map_blocks(handle, inode, map, m_flags);
>
> @@ -3626,6 +3634,9 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
>  	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>  		goto retry;
>
> +	if (ret > 0 && force_commit)
> +		ext4_force_commit(inode->i_sb);
> +
>  	return ret;
>  }
>
>
> -ritesh
>
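
For anyone following the thread, the ordering problem described above can
be sketched as a toy userspace model. This is not ext4 code: every toy_*
name, the one-byte "blocks", and the two arrays standing in for committed
vs. running journal state are assumptions made purely for illustration.
It only shows why committing the unwritten-to-written conversion before
the data write avoids the mixed new-data/zeroes result after a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLKS 4	/* toy region laid out as W U W U */

enum blk_state { WRITTEN, UNWRITTEN };

/* Hypothetical model, not an ext4 structure. */
struct toy_fs {
	enum blk_state committed[NBLKS]; /* extent state as of the last commit */
	enum blk_state running[NBLKS];   /* extent state in the running txn */
	char data[NBLKS];                /* one byte stands in for one fsblock */
};

static void toy_init(struct toy_fs *fs)
{
	for (int i = 0; i < NBLKS; i++) {
		fs->committed[i] = (i % 2) ? UNWRITTEN : WRITTEN;
		fs->running[i] = fs->committed[i];
		fs->data[i] = 'o';	/* old contents */
	}
}

/* Slow path: zero the unwritten blocks, convert them in the running txn. */
static void toy_alloc_mixed(struct toy_fs *fs)
{
	for (int i = 0; i < NBLKS; i++) {
		if (fs->running[i] == UNWRITTEN) {
			fs->data[i] = 0;
			fs->running[i] = WRITTEN;
		}
	}
}

/* Stand-in for forcing a journal commit. */
static void toy_commit(struct toy_fs *fs)
{
	memcpy(fs->committed, fs->running, sizeof(fs->committed));
}

/* The data write goes straight to the blocks and is not journaled. */
static void toy_atomic_write(struct toy_fs *fs, char c)
{
	memset(fs->data, c, sizeof(fs->data));
}

/* Power loss: the running txn is lost, only committed state survives. */
static void toy_crash(struct toy_fs *fs)
{
	memcpy(fs->running, fs->committed, sizeof(fs->running));
}

/* Unwritten blocks read back as zeroes regardless of what is on disk. */
static char toy_read(const struct toy_fs *fs, int i)
{
	assert(i >= 0 && i < NBLKS);
	return fs->running[i] == UNWRITTEN ? 0 : fs->data[i];
}

/* True if a post-crash read sees new data in some blocks but not others. */
bool toy_write_is_torn(bool force_commit)
{
	struct toy_fs fs;
	bool saw_new = false, saw_other = false;

	toy_init(&fs);
	toy_alloc_mixed(&fs);
	if (force_commit)
		toy_commit(&fs);
	toy_atomic_write(&fs, 'n');
	toy_crash(&fs);

	for (int i = 0; i < NBLKS; i++) {
		if (toy_read(&fs, i) == 'n')
			saw_new = true;
		else
			saw_other = true;
	}
	return saw_new && saw_other;
}
```

In this model toy_write_is_torn(false) reports a torn result, because the
conversion was never committed and the odd blocks read back as zeroes,
while toy_write_is_torn(true) does not, since the conversion is durable
before the data write is issued.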