On 5/26/25 9:31 AM, Vlastimil Babka wrote: > On 5/26/25 17:06, Jens Axboe wrote: >> On 5/26/25 7:05 AM, Jens Axboe wrote: >>> On 5/25/25 1:12 PM, Vlastimil Babka wrote: >>> >>> Thanks for taking a look at this! I tried to reproduce this this morning >>> and failed miserably. I then injected a delay for the above case, and it >>> does indeed then trigger for me. So far, so good. >>> >>> I agree with your analysis, we should only be doing the dropbehind for a >>> non-zero return from __folio_end_writeback(), and that includes the >>> test_and_clear to avoid dropping the drop-behind state. But we also need >>> to check/clear this state pre __folio_end_writeback(), which then puts >>> us in a spot where it needs to potentially be re-set. Which fails pretty >>> racy... >>> >>> I'll ponder this a bit. Good thing fsx got RWF_DONTCACHE support, or I >>> suspect this would've taken a while to run into. >> >> Took a closer look... I may be smoking something good here, but I don't >> see what the __folio_end_writeback()() return value has to do with this >> at all. Regardless of what it returns, it should've cleared >> PG_writeback, and in fact the only thing it returns is whether or not we >> had anyone waiting on it. Which should have _zero_ bearing on whether or >> not we can clear/invalidate the range. > > Yeah it's very much possible that I was wrong, folio_xor_flags_has_waiters() > looked a bit impenetrable to me, and it seemed like an simple explanation to > the splats. But as you had to add delays, this indeed smells as a race. Here's my delay trace fwiw, which is a bit different: BUG: Bad page state in process fsx pfn:4866b page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x25 pfn:0x4866b flags: 0x3ffe0000000000a(uptodate|writeback|node=0|zone=0|lastcpupid=0x1fff) raw: 03ffe0000000000a dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000025 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set Modules linked in: CPU: 6 UID: 0 PID: 1853 Comm: fsx Not tainted 6.15.0-rc7-00144-gb1427432d3b6-dirty #1053 NONE Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x1c/0x30 (C) dump_stack_lvl+0x58/0x78 dump_stack+0x18/0x20 bad_page+0x1a4/0x228 free_unref_folios+0xc2c/0x1920 folios_put_refs+0x354/0x5f0 __folio_batch_release+0x98/0xd0 writeback_iter+0x8f8/0xd00 iomap_writepages+0x16e4/0x2090 xfs_vm_writepages+0x200/0x2c0 do_writepages+0x148/0x7c0 filemap_fdatawrite_wbc+0xe0/0x138 __filemap_fdatawrite_range+0xb0/0x100 filemap_write_and_wait_range+0x68/0x100 __generic_remap_file_range_prep+0x418/0x1090 generic_remap_file_range_prep+0x18/0x80 xfs_reflink_remap_prep+0x160/0x7d8 xfs_file_remap_range+0x164/0xa90 vfs_dedupe_file_range_one+0x398/0x4a0 vfs_dedupe_file_range+0x410/0x648 do_vfs_ioctl+0x13c4/0x1fc0 __arm64_sys_ioctl+0xd8/0x188 invoke_syscall.constprop.0+0x60/0x2a0 el0_svc_common.constprop.0+0x148/0x240 do_el0_svc+0x40/0x60 el0_svc+0x34/0x70 el0t_64_sync_handler+0x104/0x138 el0t_64_sync+0x170/0x178 Disabling lock debugging due to kernel taint page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x25 pfn:0x4866b flags: 0x3ffe0000000000a(uptodate|writeback|node=0|zone=0|lastcpupid=0x1fff) raw: 03ffe0000000000a dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000025 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u)) ------------[ cut here ]------------ kernel BUG at ./include/linux/mm.h:1543! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP Modules linked in: CPU: 6 UID: 0 PID: 0 Comm: swapper/6 Tainted: G B 6.15.0-rc7-00144-gb1427432d3b6-dirty #1053 NONE Tainted: [B]=BAD_PAGE Hardware name: linux,dummy-virt (DT) pstate: 614000c5 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : folio_end_writeback+0x470/0x560 lr : folio_end_writeback+0x470/0x560 sp : ffff8000859978f0 x29: ffff8000859978f0 x28: dfff800000000000 x27: fffffdffc0219ac0 x26: 0000000000000000 x25: ffff000005ed8138 x24: 0000000000000000 x23: 1fffffbff804335e x22: 0000000000000004 x21: 0000000000000001 x20: fffffdffc0219af4 x19: fffffdffc0219ac0 x18: 000000000000000f x17: 635f6665725f6f69 x16: 6c6f662029746e69 x15: 0720072007200720 x14: 0720072007200720 x13: 0720072007200720 x12: ffff60001b67150b x11: 1fffe0001b67150a x10: ffff60001b67150a x9 : dfff800000000000 x8 : 00009fffe498eaf6 x7 : ffff0000db38a853 x6 : 0000000000000001 x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffff0000c1f98000 x0 : 000000000000005c Call trace: folio_end_writeback+0x470/0x560 (P) iomap_finish_ioend_buffered+0x38c/0x9e0 iomap_writepage_end_bio+0x80/0xc0 bio_endio+0x4dc/0x678 blk_mq_end_request_batch+0x2b4/0x10c0 nvme_pci_complete_batch+0x338/0x518 nvme_irq+0xd8/0xf0 __handle_irq_event_percpu+0xdc/0x528 handle_irq_event+0x174/0x3d8 handle_fasteoi_irq+0x2cc/0xba0 handle_irq_desc+0xb8/0x120 generic_handle_domain_irq+0x20/0x30 gic_handle_irq+0x50/0x140 call_on_irq_stack+0x24/0x50 do_interrupt_handler+0xe0/0x148 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x14/0x20 el1h_64_irq+0x6c/0x70 do_idle+0x244/0x4c8 (P) cpu_startup_entry+0x64/0x80 secondary_start_kernel+0x1e4/0x240 __secondary_switched+0x74/0x78 Code: 91190021 91218021 aa1303e0 94039279 (d4210000) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0000,000000e0,0109a650,834e7607 Memory Limit: none ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]--- -- Jens Axboe