On Tuesday 06/10 at 10:07 +0800, Yu Kuai wrote:
> So, this is blk-mq IO accounting, a different problem than nvme mpath.
>
> What kind of test you're running, can you reporduce ths problem? I don't
> have a clue yet after a quick code review.
>
> Thanks,
> Kuai

Hi all,

I've also been hitting this warning; I can reproduce it pretty
consistently within a few hours of running large Yocto builds. If I can
help test any patches, let me know.

A close approximation to what I'm doing is to clone Poky and build
core-image-weston:

https://github.com/yoctoproject/poky

Using higher-than-reasonable concurrency seems to help: I'm setting
BB_NUMBER_THREADS and PARALLEL_MAKE to 2x-4x the number of CPUs. I'm
trying to narrow it down to a simpler reproducer, but haven't had any
luck yet.
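For reference, the rough sequence is just the standard Poky workflow with
those two variables bumped in local.conf; treat the "64" below as an
example somewhere in that 2x-4x range, not an exact value:

    $ git clone https://github.com/yoctoproject/poky
    $ cd poky
    $ source oe-init-build-env          # creates and enters build/
    $ echo 'BB_NUMBER_THREADS = "64"' >> conf/local.conf   # ~2x-4x nproc
    $ echo 'PARALLEL_MAKE = "-j 64"'   >> conf/local.conf
    $ bitbake core-image-weston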
I see this on three machines. One is btrfs/luks/nvme, the other two are
btrfs/luks/mdraid1/nvme*2. All three have a very large swapfile on the
rootfs. This is from the machine without mdraid:

------------[ cut here ]------------
WARNING: CPU: 6 PID: 1768274 at block/genhd.c:144 bdev_count_inflight_rw+0x8a/0xc0
CPU: 6 UID: 1000 PID: 1768274 Comm: cc1plus Not tainted 6.16.0-rc2-gcc-slubdebug-lockdep-00071-g74b4cc9b8780 #1 PREEMPT
Hardware name: Gigabyte Technology Co., Ltd. A620I AX/A620I AX, BIOS F3 07/10/2023
RIP: 0010:bdev_count_inflight_rw+0x8a/0xc0
Code: 00 01 d7 89 3e 49 8b 50 20 4a 03 14 d5 c0 4b 76 82 48 8b 92 90 00 00 00 01 d1 48 63 d0 89 4e 04 48 83 fa 1f 76 92 85 ff 79 a7 <0f> 0b c7 06 00 00 00 00 85 c9 79 9f 0f 0b c7 46 04 00 00 00 00 48
RSP: 0000:ffffc9002b027ab8 EFLAGS: 00010282
RAX: 0000000000000020 RBX: ffff88810dec0000 RCX: 000000000000000a
RDX: 0000000000000020 RSI: ffffc9002b027ac8 RDI: 00000000fffffffe
RBP: ffff88810dec0000 R08: ffff888100660b40 R09: ffffffffffffffff
R10: 000000000000001f R11: ffff888f3a30e9a8 R12: ffff8881098855d0
R13: ffffc9002b027b90 R14: 0000000000000001 R15: ffffc9002b027e18
FS:  00007fb394b48400(0000) GS:ffff888ccc9b9000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb3884a81c8 CR3: 00000013db708000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 bdev_count_inflight+0x16/0x30
 update_io_ticks+0xb7/0xd0
 blk_account_io_start+0xe8/0x200
 blk_mq_submit_bio+0x34c/0x910
 __submit_bio+0x95/0x5a0
 ? submit_bio_noacct_nocheck+0x169/0x400
 submit_bio_noacct_nocheck+0x169/0x400
 swapin_readahead+0x18a/0x550
 ? __filemap_get_folio+0x26/0x400
 ? get_swap_device+0xe8/0x210
 ? lock_release+0xc3/0x2a0
 do_swap_page+0x1fa/0x1850
 ? __lock_acquire+0x46d/0x25c0
 ? wake_up_state+0x10/0x10
 __handle_mm_fault+0x5e5/0x880
 handle_mm_fault+0x70/0x2e0
 exc_page_fault+0x374/0x8a0
 asm_exc_page_fault+0x22/0x30
RIP: 0033:0x915570
Code: ff 01 0f 86 c4 05 00 00 41 56 41 55 41 54 55 48 89 fd 53 48 89 fb 0f 1f 40 00 48 89 df e8 98 c8 0b 00 84 c0 0f 85 90 05 00 00 <0f> b7 03 48 c1 e0 06 80 b8 99 24 d1 02 00 48 8d 90 80 24 d1 02 0f
RSP: 002b:00007ffc9327dfd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00007fb3884a81c8 RCX: 0000000000000008
RDX: 0000000000000006 RSI: 0000000005dba008 RDI: 0000000000000000
RBP: 00007fb3884a81c8 R08: 000000000000000c R09: 00000007fb3884a8
R10: 0000000000000007 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000002 R14: 00007ffc9329cb90 R15: 00007fb36e5d2700
 </TASK>
irq event stamp: 36649373
hardirqs last enabled at (36649387): [<ffffffff813cea2d>] __up_console_sem+0x4d/0x50
hardirqs last disabled at (36649398): [<ffffffff813cea12>] __up_console_sem+0x32/0x50
softirqs last enabled at (36648786): [<ffffffff8136017f>] __irq_exit_rcu+0x8f/0xb0
softirqs last disabled at (36648617): [<ffffffff8136017f>] __irq_exit_rcu+0x8f/0xb0
---[ end trace 0000000000000000 ]---

I dumped all the similar WARNs I've seen here (blk-warn-%d.txt):

https://github.com/jcalvinowens/lkml-debug-616/tree/master

I don't have any evidence it's related, but I'm also hitting a rare OOPS
in futex with this same Yocto build workload. Sebastian has done some
analysis here:

https://lore.kernel.org/lkml/20250618160333.PdGB89yt@xxxxxxxxxxxxx/

I get this warning most of the time I get the oops, but not all of the
time. Curious if anyone else is seeing the oops?

Thanks,
Calvin