hi, Nilay, On Tue, Mar 18, 2025 at 07:13:20PM +0530, Nilay Shroff wrote: > > > On 3/17/25 7:10 PM, kernel test robot wrote: > > > > > > Hello, > > > > kernel test robot noticed "INFO:task_blocked_for_more_than#seconds" on: > > > > commit: f35c9ef2ba17842de59176b29df32999803bd9fa ("[PATCHv6 6/7] block: protect wbt_lat_usec using q->elevator_lock") > > url: https://github.com/intel-lab-lkp/linux/commits/Nilay-Shroff/block-acquire-q-limits_lock-while-reading-sysfs-attributes/20250304-182738 > > base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next > > patch link: https://lore.kernel.org/all/20250304102551.2533767-7-nilay@xxxxxxxxxxxxx/ > > patch subject: [PATCHv6 6/7] block: protect wbt_lat_usec using q->elevator_lock > > > > in testcase: fio-basic > > version: fio-x86_64-3.38-1_20250308 > > with following parameters: > > > > runtime: 300s > > disk: 1HDD > > fs: btrfs > > nr_task: 100% > > test_size: 128G > > rw: randwrite > > bs: 4M > > ioengine: posixaio > > cpufreq_governor: performance > > > > > > > > config: x86_64-rhel-9.4 > > compiler: gcc-12 > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory > > > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > the same patch/commit), kindly add following tags > > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx> > > | Closes: https://lore.kernel.org/oe-lkp/202503171650.cc082b66-lkp@xxxxxxxxx > > > > > > [ 991.017071][ T472] INFO: task umount:12320 blocked for more than 491 seconds. > > [ 991.024314][ T472] Tainted: G W 6.14.0-rc5-00192-gf35c9ef2ba17 #1 > > [ 991.032414][ T472] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 991.040948][ T472] task:umount state:D stack:0 pid:12320 tgid:12320 ppid:12317 task_flags:0x400100 flags:0x00004002 > > [ 991.052695][ T472] Call Trace: > > [ 991.055849][ T472] <TASK> > > [ 991.058658][ T472] __schedule (kernel/sched/core.c:5378 kernel/sched/core.c:6765) > > [ 991.062856][ T472] schedule (arch/x86/include/asm/bitops.h:206 arch/x86/include/asm/bitops.h:238 include/linux/thread_info.h:192 include/linux/thread_info.h:208 include/linux/sched.h:2149 kernel/sched/core.c:6844 kernel/sched/core.c:6857) > > [ 991.066706][ T472] wb_wait_for_completion (fs/fs-writeback.c:216 fs/fs-writeback.c:213) > > [ 991.071773][ T472] ? __pfx_autoremove_wake_function (kernel/sched/wait.c:383) > > [ 991.077702][ T472] __writeback_inodes_sb_nr (fs/fs-writeback.c:2736) > > [ 991.082936][ T472] sync_filesystem (fs/sync.c:55 fs/sync.c:30) > > [ 991.087390][ T472] generic_shutdown_super (fs/super.c:622) > > [ 991.092538][ T472] kill_anon_super (fs/super.c:434 fs/super.c:1238) > > [ 991.096991][ T472] btrfs_kill_super (fs/btrfs/super.c:2101) btrfs > > [ 991.102280][ T472] deactivate_locked_super (fs/super.c:473) > > [ 991.107678][ T472] cleanup_mnt (fs/namespace.c:281 fs/namespace.c:1414) > > [ 991.112082][ T472] task_work_run (kernel/task_work.c:227 (discriminator 1)) > > [ 991.116534][ T472] syscall_exit_to_user_mode (include/linux/resume_user_mode.h:50 kernel/entry/common.c:114 include/linux/entry-common.h:329 kernel/entry/common.c:207 kernel/entry/common.c:218) > > [ 991.122197][ T472] do_syscall_64 (arch/x86/entry/common.c:102) > > [ 991.126731][ T472] ? do_syscall_64 (arch/x86/entry/common.c:102) > > [ 991.131430][ T472] ? __count_memcg_events (mm/memcontrol.c:583 mm/memcontrol.c:857) > > [ 991.136738][ T472] ? handle_mm_fault (arch/x86/include/asm/irqflags.h:154 include/linux/memcontrol.h:970 include/linux/memcontrol.h:993 include/linux/memcontrol.h:1000 mm/memory.c:6077 mm/memory.c:6238) > > [ 991.141606][ T472] ? do_user_addr_fault (include/linux/mm.h:743 arch/x86/mm/fault.c:1339) > > [ 991.146823][ T472] ? clear_bhb_loop (arch/x86/entry/entry_64.S:1538) > > [ 991.151517][ T472] ? clear_bhb_loop (arch/x86/entry/entry_64.S:1538) > > [ 991.156203][ T472] ? clear_bhb_loop (arch/x86/entry/entry_64.S:1538) > > [ 991.160881][ T472] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) > > [ 991.166777][ T472] RIP: 0033:0x7f415ea2aa77 > > [ 991.171197][ T472] RSP: 002b:00007ffe0db2fd98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 > > [ 991.179611][ T472] RAX: 0000000000000000 RBX: 000055cc64b55048 RCX: 00007f415ea2aa77 > > [ 991.187586][ T472] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055cc64b55160 > > [ 991.195555][ T472] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000073 > > [ 991.203514][ T472] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f415eb65264 > > [ 991.211476][ T472] R13: 000055cc64b55160 R14: 0000000000000000 R15: 000055cc64b54f30 > > [ 991.219431][ T472] </TASK> > > [ 1008.358661][T12320] BTRFS info (device sda1): last unmount of filesystem 8b972718-96ad-4a66-b549-8be29321e91a > > > > > > The kernel config and materials to reproduce are available at: > > https://download.01.org/0day-ci/archive/20250317/202503171650.cc082b66-lkp@xxxxxxxxx > > > I attempted to reproduce the above issue multiple times using the provided > reproducer but was unable to do so. However, during further investigation, > I discovered a lockdep warning related to a circular buffer. The patch in > question introduces q->elevator_lock to protect writes to the sysfs attribute > wbt_lat_usec and the cgroup attribute io.cost.qos. However, write to these > attributes also acquire q->rq_qos_mutex, which may lead to a potential lock > ordering issue reported by lockdep. Unfortunately, blktest doesn't have any > testcase testing writes to these attributes. I think we should have one and > so will submit a blktest. > > The lockdep warning reports an incorrect locking order between q->elevator_lock > and q->rq_qos_mutex, which might cause the observed symptom reported. Notably, > I saw that the LKP test case did not have lockdep enabled, which may have > allowed this issue to manifest much earlier rather than being detected later > while unmounting the file system. > > Anyways, we have to fix the circular locking dependency between q->elevator_lock > and q->rq_qos_mutex. I will prepare a patch to address this and submit it upstream, > tagging you in the commit. > > On another, if you're able to recreate this issue then whenever this issue manifests > would you please help run the below command and collect dmesg output: > # echo w > /proc/sysrq-trigger sorry for late. above need some changes in our auto flow. instead of collect this information, we noticed you've already had a fix patch-set in https://lore.kernel.org/all/20250319105518.468941-1-nilay@xxxxxxxxxxxxx/ we applied them upon previous patch-set (https://lore.kernel.org/all/20250304102551.2533767-1-nilay@xxxxxxxxxxxxx/) and confirmed the issue gone. thanks! Tested-by: kernel test robot <oliver.sang@xxxxxxxxx> > > Thanks, > --Nilay >