I am writing to report a kernel crash that occurred after terminating (kill -9) an SPDK application using ublk. Below are the details of the incident, including steps to reproduce the issue and the call stack.

Incident Description:
After terminating an SPDK application, the system occasionally experiences a kernel crash. This issue is not consistent but happens once every few tries under the following conditions. We are using kernel 6.14.0-061400-generic.

Steps to Reproduce:
1. Install SPDK:
   git clone https://github.com/spdk/spdk ; cd spdk
   ./configure --disable-coverage --disable-debug --disable-tests --enable-unit-tests --without-crypto --without-fio --with-vhost --with-rdma --without-nvme-cuse --without-fuse --without-vfio-user --without-vtune --without-iscsi-initiator --without-shared --with-ublk --with-uring --with-raid5f
   make
   make install
2. Create SPDK bdev (here we used PCI 0000.8b.00.0 as the nvme target, and named the bdev as guy_bdev):
   ./spdk/scripts/setup.sh reset
   ./spdk/scripts/setup.sh
   /usr/local/bin/spdk_tgt --mem-size 2048 -m 0xff
   ./spdk/scripts/rpc.py bdev_nvme_attach_controller -b guy_bdev -t PCIe -a 0000.8b.00.0
3. Expose it via ublk:
   modprobe ublk_drv
   ./spdk/scripts/rpc.py ublk_create_target
   ./spdk/scripts/rpc.py ublk_start_disk -q 8 -d 128 guy_bdevn1 0
4. Run IO to the /dev/ublkb0 that was created
5. Kill the spdk_tgt process (kill -9)

Call Stack:
Below is the call stack captured during one of the crashes:

[54346.157495] [ T288311] BUG: kernel NULL pointer dereference, address: 0000000000000000
[54346.157625] [ T288311] #PF: supervisor write access in kernel mode
[54346.157708] [ T288311] #PF: error_code(0x0002) - not-present page
[54346.157790] [ T288311] PGD 0 P4D 0
[54346.157911] [ T288311] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
[54346.158010] [ T288311] CPU: 0 UID: 0 PID: 288311 Comm: reactor_0 Kdump: loaded Tainted: G OE 6.14.0-061400-generic #202503241442
[54346.158264] [ T288311] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[54346.158374] [ T288311] Hardware name: Supermicro SYS-2028BT-HNR+/X10DRT-B+, BIOS 2.0 01/10/2017
[54346.158490] [ T288311] RIP: 0010:percpu_ref_get_many+0x35/0x50
[54346.158616] [ T288311] Code: 89 fb e8 ae 0a 97 ff 48 8b 03 a8 03 75 18 65 4c 01 20 e8 ae 52 97 ff 5b 41 5c 5d 31 c0 31 f6 31 ff c3 cc cc cc cc 48 8b 43 08 <f0> 4c 01 20 e8 92 52 97 ff 5b 41 5c 5d 31 c0 31 f6 31 ff c3 cc cc
[54346.158914] [ T288311] RSP: 0000:ffffab2cf1b53cb8 EFLAGS: 00010206
[54346.159076] [ T288311] RAX: 0000000000000000 RBX: ffff93bddb2f8818 RCX: 0000000000000000
[54346.159257] [ T288311] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff93bddb2f8818
[54346.159415] [ T288311] RBP: ffffab2cf1b53cc8 R08: 0000000000000000 R09: 0000000000000000
[54346.159577] [ T288311] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[54346.159740] [ T288311] R13: 0000000000000001 R14: 0000000000000000 R15: ffff93c5679e2a00
[54346.159945] [ T288311] FS: 0000000000000000(0000) GS:ffff93c51f600000(0000) knlGS:0000000000000000
[54346.160142] [ T288311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54346.160366] [ T288311] CR2: 0000000000000000 CR3: 0000000a63640001 CR4: 00000000003726f0
[54346.160546] [ T288311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[54346.160727] [ T288311] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[54346.160934] [ T288311] Call Trace:
[54346.161134] [ T288311] <TASK>
[54346.161351] [ T288311] ? show_trace_log_lvl+0x1be/0x310
[54346.161542] [ T288311] ? show_trace_log_lvl+0x1be/0x310
[54346.161730] [ T288311] ? __io_fallback_tw+0x63/0xe0
[54346.161963] [ T288311] ? show_regs.part.0+0x22/0x30
[54346.162197] [ T288311] ? __die_body.cold+0x8/0x10
[54346.162421] [ T288311] ? __die+0x2a/0x40
[54346.162614] [ T288311] ? page_fault_oops+0x16e/0x180
[54346.162814] [ T288311] ? do_user_addr_fault+0x4c9/0x7e0
[54346.163062] [ T288311] ? __mod_memcg_state+0xbf/0x200
[54346.163305] [ T288311] ? exc_page_fault+0x85/0x1e0
[54346.163508] [ T288311] ? asm_exc_page_fault+0x27/0x30
[54346.163716] [ T288311] ? percpu_ref_get_many+0x35/0x50
[54346.163963] [ T288311] ? percpu_ref_get_many+0x12/0x50
[54346.164240] [ T288311] __io_fallback_tw+0x63/0xe0
[54346.164447] [ T288311] tctx_task_work_run+0xd7/0xf0
[54346.164683] [ T288311] tctx_task_work+0x38/0x70
[54346.164928] [ T288311] task_work_run+0x60/0xa0
[54346.165203] [ T288311] do_exit+0x26e/0x4c0
[54346.165443] [ T288311] do_group_exit+0x34/0x90
[54346.165659] [ T288311] get_signal+0x7e5/0x870
[54346.165936] [ T288311] ? arch_exit_to_user_mode_prepare.isra.0+0xc8/0xd0
[54346.166206] [ T288311] arch_do_signal_or_restart+0x39/0x110
[54346.166462] [ T288311] irqentry_exit_to_user_mode+0x13b/0x1d0
[54346.166682] [ T288311] irqentry_exit+0x43/0x50
[54346.166946] [ T288311] sysvec_reschedule_ipi+0x65/0x110
[54346.167218] [ T288311] asm_sysvec_reschedule_ipi+0x1b/0x20
[54346.167449] [ T288311] RIP: 0033:0x7f813a899158
[54346.167663] [ T288311] Code: Unable to access opcode bytes at 0x7f813a89912e.
[54346.167938] [ T288311] RSP: 002b:00007ffe3fc833b8 EFLAGS: 00000246
[54346.168209] [ T288311] RAX: 0000577bbe839820 RBX: 000020000ba72000 RCX: 00007f813a83bf24
[54346.168464] [ T288311] RDX: 0000577bbe83c620 RSI: 0000000000000001 RDI: 000020002770e540
[54346.168668] [ T288311] RBP: 000020000bb35de0 R08: 0000577bbe75e990 R09: 0000000000000008
[54346.168890] [ T288311] R10: 0000000000000030 R11: 0000000000000000 R12: 000020000bb35d80
[54346.169116] [ T288311] R13: 00002010409f9030 R14: 000020002757e980 R15: 00007f813ab44460
[54346.169339] [ T288311] </TASK>
[54346.169520] [ T288311] Modules linked in: ublk_drv vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd mst_pciconf(OE) qrtr rpcrdma binfmt_misc sunrpc rdma_ucm ib_iser libiscsi scsi_transport_iscsi nls_iso8859_1 ib_umad rdma_cm ib_ipoib iw_cm intel_rapl_msr ib_cm intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ee1004 kvm ipmi_ssif rapl intel_cstate i2c_i801 ast mei_me i2c_smbus i2c_mux lpc_ich mei ioatdma acpi_power_meter ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad joydev input_leds mac_hid cfg80211 sch_fq_codel dm_multipath msr nvme_fabrics nvme_keyring efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear mlx5_ib ib_uverbs macsec ib_core mlx5_core hid_generic polyval_clmulni mlxfw polyval_generic ghash_clmulni_intel usbhid psample sha256_ssse3 hid sha1_ssse3 nvme tls igb pci_hyperv_intf nvme_core
[54346.169599] [ T288311] ahci i2c_algo_bit libahci nvme_auth dca wmi aesni_intel crypto_simd cryptd [last unloaded: ublk_drv]
[54346.172088] [ T288311] CR2: 0000000000000000

I would appreciate any assistance or guidance you can provide to help resolve this issue. Please let me know if you need any additional information or if there are specific tests you would like me to perform.

Regards, Guy Eisenberg
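
Because the crash shows up only once every few tries, steps 2 to 5 of the report could be wrapped in a retry loop. The following is a minimal sketch, not the reporter's actual procedure. The DRY_RUN and ATTEMPTS variables, the helper function names, the sleep durations and the read-only dd workload are all assumptions made for illustration (the report does not say how IO was generated). It assumes setup.sh has already been run once, and that spdk_tgt comes up cleanly again after being killed.

```shell
#!/usr/bin/env bash
# Sketch only: repeats steps 2-5 of the report until the crash shows up.
# DRY_RUN, ATTEMPTS, the sleeps and the dd workload are assumptions.
SPDK_DIR=${SPDK_DIR:-./spdk}
SPDK_TGT=${SPDK_TGT:-/usr/local/bin/spdk_tgt}
RPC="$SPDK_DIR/scripts/rpc.py"
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the commands, 0 = execute them
ATTEMPTS=${ATTEMPTS:-3}

# Print a command, and execute it unless this is a dry run.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Same as run(), but in the background; the PID is left in BG_PID.
run_bg() {
    echo "+ $* &"
    BG_PID=
    if [ "$DRY_RUN" != 1 ]; then
        "$@" &
        BG_PID=$!
    fi
}

one_attempt() {
    run_bg "$SPDK_TGT" --mem-size 2048 -m 0xff
    tgt_pid=$BG_PID
    run sleep 5    # crude wait for the target to accept RPCs (assumption)
    run "$RPC" bdev_nvme_attach_controller -b guy_bdev -t PCIe -a 0000.8b.00.0
    run "$RPC" ublk_create_target
    run "$RPC" ublk_start_disk -q 8 -d 128 guy_bdevn1 0
    # Step 4: read-only direct IO, so the disk contents are left untouched.
    run_bg dd if=/dev/ublkb0 of=/dev/null bs=4k iflag=direct
    run sleep 2
    # Step 5: SIGKILL the target while IO is still in flight.
    run kill -9 "${tgt_pid:-<spdk_tgt pid>}"
    # Reap the killed target and the dd reader before the next attempt.
    [ "$DRY_RUN" = 1 ] || wait
}

main() {
    run modprobe ublk_drv
    for i in $(seq 1 "$ATTEMPTS"); do
        echo "# attempt $i"
        one_attempt
    done
}

main
```

By default the script only prints the commands it would run. Running it as root with DRY_RUN=0 executes them. If the machine goes down during an attempt, the attempt number printed last indicates how many tries it took.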
Attachment:
dmesg.202504151009