Hi Zhang Yi, Greetings! I used Syzkaller and found that there is WARNING in xfs_bmapi_convert_one_delalloc in linux v6.15-rc2. After bisection and the first bad commit is: " 5ce5674187c3 xfs: convert delayed extents to unwritten when zeroing post eof blocks " All detailed into can be found at: https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc Syzkaller repro code: https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/repro.c Syzkaller repro syscall steps: https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/repro.prog Syzkaller report: https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/repro.report Kconfig(make olddefconfig): https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/kconfig_origin Bisect info: https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/bisect_info.log bzImage: https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250413_025108_xfs_bmapi_convert_one_delalloc/bzImage_0af2f6be1b4281385b618cb86ad946eded089ac8 Issue dmesg: https://github.com/laifryiee/syzkaller_logs/blob/main/250413_025108_xfs_bmapi_convert_one_delalloc/0af2f6be1b4281385b618cb86ad946eded089ac8_dmesg.log " [ 21.631771] ------------[ cut here ]------------ [ 21.632173] WARNING: CPU: 1 PID: 760 at fs/xfs/libxfs/xfs_bmap.c:4401 xfs_bmapi_convert_one_delalloc+0x520/0xca0 [ 21.633001] Modules linked in: [ 21.633292] CPU: 1 UID: 0 PID: 760 Comm: repro Tainted: G W 6.15.0-rc2-v6.15-rc2 #1 PREEMPT(voluntary) [ 21.634151] Tainted: [W]=WARN [ 21.634401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 21.635476] RIP: 0010:xfs_bmapi_convert_one_delalloc+0x520/0xca0 [ 21.635970] Code: fc ff ff e8 32 76 eb fe 44 89 fe bf 02 00 00 00 41 bc f5 ff ff ff e8 cf 70 eb fe 41 83 ff 02 0f 84 03 ff ff ff e8 10 76 eb fe <0f> 0b e9 f7 fe ff ff e8 04 76 eb fe 48 c7 c0 c0 ca ff 89 48 ba 00 [ 21.637426] RSP: 0018:ff1100001cc37228 EFLAGS: 00010293 [ 21.637855] RAX: 0000000000000000 RBX: ff1100001cc373e8 RCX: ffffffff829c44b1 [ 21.638430] RDX: ff110000108b2bc0 RSI: ffffffff829c44c0 RDI: 0000000000000005 [ 21.639001] RBP: ff1100001cc37410 R08: ff1100001cc37318 R09: fffffbfff0fdc374 [ 21.639614] R10: 0000000000000000 R11: ff110000108b3a18 R12: 00000000fffffff5 [ 21.640180] R13: ff1100001d471680 R14: ff1100000ec74000 R15: 0000000000000000 [ 21.640751] FS: 00007f4c1b99a640(0000) GS:ff110000e3a84000(0000) knlGS:0000000000000000 [ 21.641394] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 21.641862] CR2: 00007f4c14cfd000 CR3: 000000000d4f4006 CR4: 0000000000771ef0 [ 21.642439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 21.643007] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 21.643620] PKRU: 55555554 [ 21.643848] Call Trace: [ 21.644060] <TASK> [ 21.644256] ? __pfx_xfs_bmapi_convert_one_delalloc+0x10/0x10 [ 21.644738] ? __up_read+0x21e/0x810 [ 21.645063] ? unmap_mapping_range+0x118/0x2c0 [ 21.645436] ? lock_release+0x14f/0x2c0 [ 21.645761] ? __pfx_unmap_mapping_range+0x10/0x10 [ 21.646164] xfs_bmapi_convert_delalloc+0xb0/0x100 [ 21.646568] xfs_buffered_write_iomap_begin+0x153d/0x1890 [ 21.647022] ? __pfx_xfs_buffered_write_iomap_begin+0x10/0x10 [ 21.647535] ? filemap_range_has_writeback+0x318/0x460 [ 21.647977] ? filemap_range_has_writeback+0xf1/0x460 [ 21.648395] ? __pfx_xfs_buffered_write_iomap_begin+0x10/0x10 [ 21.648870] iomap_iter+0x58b/0xd30 [ 21.649178] iomap_zero_range+0x547/0x740 [ 21.649526] ? __pfx_iomap_zero_range+0x10/0x10 [ 21.649926] ? __this_cpu_preempt_check+0x21/0x30 [ 21.650317] ? lock_is_held_type+0xef/0x150 [ 21.650673] xfs_zero_range+0xaf/0x100 [ 21.650990] xfs_file_write_checks+0x639/0xa00 [ 21.651385] xfs_file_buffered_write+0x183/0x9a0 [ 21.651806] ? __pfx_xfs_file_buffered_write+0x10/0x10 [ 21.652225] ? __lock_acquire+0x410/0x2260 [ 21.652580] ? __lruvec_stat_mod_folio+0x1aa/0x3e0 [ 21.652985] xfs_file_write_iter+0x52a/0xc00 [ 21.653350] do_iter_readv_writev+0x6b1/0x9c0 [ 21.653723] ? __pfx_do_iter_readv_writev+0x10/0x10 [ 21.654127] ? __this_cpu_preempt_check+0x21/0x30 [ 21.654526] ? lock_is_held_type+0xef/0x150 [ 21.654879] vfs_writev+0x311/0xd70 [ 21.655204] ? __lock_acquire+0x410/0x2260 [ 21.655574] ? __pfx_vfs_writev+0x10/0x10 [ 21.655922] ? __fget_files+0x1fa/0x3b0 [ 21.656251] ? __this_cpu_preempt_check+0x21/0x30 [ 21.656643] ? lock_release+0x14f/0x2c0 [ 21.656973] ? __fget_files+0x204/0x3b0 [ 21.657306] do_pwritev+0x1c9/0x280 [ 21.657612] ? do_pwritev+0x1c9/0x280 [ 21.657927] ? __pfx_do_pwritev+0x10/0x10 [ 21.658269] ? __audit_syscall_entry+0x39c/0x500 [ 21.658661] __x64_sys_pwritev+0xa3/0x100 [ 21.659000] ? syscall_trace_enter+0x14d/0x280 [ 21.659387] x64_sys_call+0x1bbc/0x2150 [ 21.659741] do_syscall_64+0x6d/0x150 [ 21.660055] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 21.660473] RIP: 0033:0x7f4c1b63ee5d [ 21.660774] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48 [ 21.662222] RSP: 002b:00007f4c1b999df8 EFLAGS: 00000202 ORIG_RAX: 0000000000000128 [ 21.662830] RAX: ffffffffffffffda RBX: 00007f4c1b99a640 RCX: 00007f4c1b63ee5d [ 21.663432] RDX: 0000000000000001 RSI: 0000000020000300 RDI: 0000000000000004 [ 21.664001] RBP: 00007f4c1b999e20 R08: 0000000000000000 R09: 0000000000000000 [ 21.664569] R10: 0000000002fffffd R11: 0000000000000202 R12: 00007f4c1b99a640 [ 21.665137] R13: 000000000000006e R14: 00007f4c1b69f560 R15: 0000000000000000 [ 21.665726] </TASK> " Hope this cound be insightful to you. Regards, Yi Lai --- If you don't need the following environment to reproduce the problem or if you already have one reproduced environment, please ignore the following information. How to reproduce: git clone https://gitlab.com/xupengfe/repro_vm_env.git cd repro_vm_env tar -xvf repro_vm_env.tar.gz cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0 // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel // You could change the bzImage_xxx as you want // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version You could use below command to log in, there is no password for root. ssh -p 10023 root@localhost After login vm(virtual machine) successfully, you could transfer reproduced binary to the vm by below way, and reproduce the problem in vm: gcc -pthread -o repro repro.c scp -P 10023 repro root@localhost:/root/ Get the bzImage for target kernel: Please use target kconfig and copy it to kernel_src/.config make olddefconfig make -jx bzImage //x should equal or less than cpu num your pc has Fill the bzImage file into above start3.sh to load the target kernel in vm. Tips: If you already have qemu-system-x86_64, please ignore below info. If you want to install qemu v7.1.0 version: git clone https://github.com/qemu/qemu.git cd qemu git checkout -f v7.1.0 mkdir build cd build yum install -y ninja-build.x86_64 yum -y install libslirp-devel.x86_64 ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp make make install On Thu, Apr 25, 2024 at 09:13:30PM +0800, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@xxxxxxxxxx> > > Current clone operation could be non-atomic if the destination of a file > is beyond EOF, user could get a file with corrupted (zeroed) data on > crash. > > The problem is about preallocations. If you write some data into a file: > > [A...B) > > and XFS decides to preallocate some post-eof blocks, then it can create > a delayed allocation reservation: > > [A.........D) > > The writeback path tries to convert delayed extents to real ones by > allocating blocks. If there aren't enough contiguous free space, we can > end up with two extents, the first real and the second still delalloc: > > [A....C)[C.D) > > After that, both the in-memory and the on-disk file sizes are still B. > If we clone into the range [E...F) from another file: > > [A....C)[C.D) [E...F) > > then xfs_reflink_zero_posteof() calls iomap_zero_range() to zero out the > range [B, E) beyond EOF and flush it. Since [C, D) is still a delalloc > extent, its pagecache will be zeroed and both the in-memory and on-disk > size will be updated to D after flushing but before cloning. This is > wrong, because the user can see the size change and read the zeroes > while the clone operation is ongoing. > > We need to keep the in-memory and on-disk size before the clone > operation starts, so instead of writing zeroes through the page cache > for delayed ranges beyond EOF, we convert these ranges to unwritten and > invalidate any cached data over that range beyond EOF. > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> > Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> > --- > Changes since v4: > > Move the delalloc converting hunk before searching the COW fork. Because > if the file has been reflinked and copied on write, > xfs_bmap_extsize_align() aligned the range of COW delalloc extent, after > the writeback, there might be some unwritten extents left over in the > COW fork that overlaps the delalloc extent we found in data fork. > > data fork ...wwww|dddddddddd... > cow fork |uuuuuuuuuu... > ^ > i_size > > In my v4, we search the COW fork before checking the delalloc extent, > goto found_cow tag and return unconverted delalloc srcmap in the above > case, so the delayed extent in the data fork will have no chance to > convert to unwritten, it will lead to delalloc extent residue and break > generic/522 after merging patch 6. > > fs/xfs/xfs_iomap.c | 29 +++++++++++++++++++++++++++++ > 1 file changed, 29 insertions(+) > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 236ee78aa75b..2857ef1b0272 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -1022,6 +1022,24 @@ xfs_buffered_write_iomap_begin( > goto out_unlock; > } > > + /* > + * For zeroing, trim a delalloc extent that extends beyond the EOF > + * block. If it starts beyond the EOF block, convert it to an > + * unwritten extent. > + */ > + if ((flags & IOMAP_ZERO) && imap.br_startoff <= offset_fsb && > + isnullstartblock(imap.br_startblock)) { > + xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip)); > + > + if (offset_fsb >= eof_fsb) > + goto convert_delay; > + if (end_fsb > eof_fsb) { > + end_fsb = eof_fsb; > + xfs_trim_extent(&imap, offset_fsb, > + end_fsb - offset_fsb); > + } > + } > + > /* > * Search the COW fork extent list even if we did not find a data fork > * extent. This serves two purposes: first this implements the > @@ -1167,6 +1185,17 @@ xfs_buffered_write_iomap_begin( > xfs_iunlock(ip, lockmode); > return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq); > > +convert_delay: > + xfs_iunlock(ip, lockmode); > + truncate_pagecache(inode, offset); > + error = xfs_bmapi_convert_delalloc(ip, XFS_DATA_FORK, offset, > + iomap, NULL); > + if (error) > + return error; > + > + trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &imap); > + return 0; > + > found_cow: > seq = xfs_iomap_inode_sequence(ip, 0); > if (imap.br_startoff <= offset_fsb) { > -- > 2.39.2 >