On Tuesday 06/10 at 10:07 +0800, Yu Kuai wrote:
> So, this is blk-mq IO accounting, a different problem than nvme mpath.
>
> What kind of test you're running, can you reporduce ths problem? I don't
> have a clue yet after a quick code review.
>
> Thanks,
> Kuai

Hi all,

I've also been hitting this warning; I can reproduce it pretty
consistently within a few hours of running large Yocto builds. If I can
help test any patches, let me know.

A close approximation to what I'm doing is to clone Poky and build
core-image-weston:

https://github.com/yoctoproject/poky

Using higher-than-reasonable concurrency seems to help: I'm setting
BB_NUMBER_THREADS and PARALLEL_MAKE to 2x-4x the number of CPUs. I'm
trying to narrow it down to a simpler reproducer, but haven't had any
luck yet.
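For reference, the rough sequence is just the standard Poky workflow with
those two variables bumped in local.conf; treat the "64" below as an
example somewhere in that 2x-4x range, not an exact value:

    $ git clone https://github.com/yoctoproject/poky
    $ cd poky
    $ source oe-init-build-env          # creates and enters build/
    $ echo 'BB_NUMBER_THREADS = "64"' >> conf/local.conf   # ~2x-4x nproc
    $ echo 'PARALLEL_MAKE = "-j 64"'   >> conf/local.conf
    $ bitbake core-image-weston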
I see this on three machines. One is btrfs/luks/nvme, the other two are
btrfs/luks/mdraid1/nvme*2. All three have a very large swapfile on the
rootfs. This is from the machine without mdraid:

------------[ cut here ]------------
WARNING: CPU: 6 PID: 1768274 at block/genhd.c:144 bdev_count_inflight_rw+0x8a/0xc0
CPU: 6 UID: 1000 PID: 1768274 Comm: cc1plus Not tainted 6.16.0-rc2-gcc-slubdebug-lockdep-00071-g74b4cc9b8780 #1 PREEMPT
Hardware name: Gigabyte Technology Co., Ltd. A620I AX/A620I AX, BIOS F3 07/10/2023
RIP: 0010:bdev_count_inflight_rw+0x8a/0xc0
Code: 00 01 d7 89 3e 49 8b 50 20 4a 03 14 d5 c0 4b 76 82 48 8b 92 90 00 00 00 01 d1 48 63 d0 89 4e 04 48 83 fa 1f 76 92 85 ff 79 a7 <0f> 0b c7 06 00 00 00 00 85 c9 79 9f 0f 0b c7 46 04 00 00 00 00 48
RSP: 0000:ffffc9002b027ab8 EFLAGS: 00010282
RAX: 0000000000000020 RBX: ffff88810dec0000 RCX: 000000000000000a
RDX: 0000000000000020 RSI: ffffc9002b027ac8 RDI: 00000000fffffffe
RBP: ffff88810dec0000 R08: ffff888100660b40 R09: ffffffffffffffff
R10: 000000000000001f R11: ffff888f3a30e9a8 R12: ffff8881098855d0
R13: ffffc9002b027b90 R14: 0000000000000001 R15: ffffc9002b027e18
FS:  00007fb394b48400(0000) GS:ffff888ccc9b9000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb3884a81c8 CR3: 00000013db708000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 bdev_count_inflight+0x16/0x30
 update_io_ticks+0xb7/0xd0
 blk_account_io_start+0xe8/0x200
 blk_mq_submit_bio+0x34c/0x910
 __submit_bio+0x95/0x5a0
 ? submit_bio_noacct_nocheck+0x169/0x400
 submit_bio_noacct_nocheck+0x169/0x400
 swapin_readahead+0x18a/0x550
 ? __filemap_get_folio+0x26/0x400
 ? get_swap_device+0xe8/0x210
 ? lock_release+0xc3/0x2a0
 do_swap_page+0x1fa/0x1850
 ? __lock_acquire+0x46d/0x25c0
 ? wake_up_state+0x10/0x10
 __handle_mm_fault+0x5e5/0x880
 handle_mm_fault+0x70/0x2e0
 exc_page_fault+0x374/0x8a0
 asm_exc_page_fault+0x22/0x30
RIP: 0033:0x915570
Code: ff 01 0f 86 c4 05 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 89 fb 0f 1f 40 00 48 89 df e8 98 c8 0b 00 84 c0 0f 85 90 05 00 00 <0f> b7 03 48 c1 e0 06 80 b8 99 24 d1 02 00 48 8d 90 80 24 d1 02 0f
RSP: 002b:00007ffc9327dfd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00007fb3884a81c8 RCX: 0000000000000008
RDX: 0000000000000006 RSI: 0000000005dba008 RDI: 0000000000000000
RBP: 00007fb3884a81c8 R08: 000000000000000c R09: 00000007fb3884a8
R10: 0000000000000007 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000002 R14: 00007ffc9329cb90 R15: 00007fb36e5d2700
 </TASK>
irq event stamp: 36649373
hardirqs last enabled at (36649387): [<ffffffff813cea2d>] __up_console_sem+0x4d/0x50
hardirqs last disabled at (36649398): [<ffffffff813cea12>] __up_console_sem+0x32/0x50
softirqs last enabled at (36648786): [<ffffffff8136017f>] __irq_exit_rcu+0x8f/0xb0
softirqs last disabled at (36648617): [<ffffffff8136017f>] __irq_exit_rcu+0x8f/0xb0
---[ end trace 0000000000000000 ]---

I dumped all the similar WARNs I've seen here (blk-warn-%d.txt):

https://github.com/jcalvinowens/lkml-debug-616/tree/master

I don't have any evidence it's related, but I'm also hitting a rare OOPS
in futex with this same Yocto build workload. Sebastian has done some
analysis here:

https://lore.kernel.org/lkml/20250618160333.PdGB89yt@xxxxxxxxxxxxx/

I get this warning most of the time I get the oops, but not all of the
time. Curious if anyone else is seeing the oops?

Thanks,
Calvin