Hi,

> Commit 378677eb8f44621ecc9ce659f7af61e5baa94d81 ("wifi:
> mac80211: Purge vif txq in ieee80211_do_stop()") seems to
> have made rsi_usb/rsi_91x cause a kernel panic when
> removing the USB while the interface is up.

So it's been a while ... is this still happening?

> == USB disconnected ==
> [ 81.093884] [ T11] usb 1-2: USB disconnect, device number 4
> [ 81.145395] [ T11] BUG: unable to handle page fault for address: 000000009dff2338
> [ 81.145637] [ T11] #PF: supervisor read access in kernel mode
> [ 81.145868] [ T11] #PF: error_code(0x0000) - not-present page
> [ 81.146096] [ T11] PGD 0 P4D 0
> [ 81.146323] [ T11] Oops: Oops: 0000 [#1] SMP NOPTI
> [ 81.146548] [ T11] CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Kdump: loaded Not tainted 6.15.0 #1 PREEMPT(voluntary) c74d5f1746d8801a78fe4695a51ca9b00b89ab7e
> [ 81.146790] [ T11] Hardware name: Dell Inc. Latitude E7250/0TPHC4, BIOS A19 01/23/2018
> [ 81.147026] [ T11] Workqueue: usb_hub_wq hub_event
> [ 81.147267] [ T11] RIP: 0010:fq_flow_reset.constprop.0+0x12/0x140 [mac80211]
> [ 81.147608] [ T11] Code: 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48 89 f5 53 <48> 8b 5e 18 4c 8b 3e 48 85 db 74 6a 4c 8d 6e 18 49 89 fc 49 39 dd
> [ 81.148175] [ T11] RSP: 0018:ffffcb54c009b918 EFLAGS: 00010202
> [ 81.148466] [ T11] RAX: ffff89629dff2328 RBX: ffff89629dff2328 RCX: ffff89628cb49210
> [ 81.148757] [ T11] RDX: 000000009dff2328 RSI: 000000009dff2320 RDI: ffff89628cb489c0

This is just ... weird. RDX and RSI are pretty much holding truncated
pointers of RAX and RBX respectively? How does that happen?
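To spell out the arithmetic behind that observation, here is a small userspace sketch (not kernel code) that cross-checks the register values from the oops. The helper names low32() and oops_values_consistent() are made up for illustration; only the constants come from the dump.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Register values and fault address copied from the oops above. */
#define OOPS_RAX   UINT64_C(0xffff89629dff2328)
#define OOPS_RBX   UINT64_C(0xffff89629dff2328)
#define OOPS_RDX   UINT64_C(0x000000009dff2328)
#define OOPS_RSI   UINT64_C(0x000000009dff2320)
#define OOPS_FAULT UINT64_C(0x000000009dff2338)

/* Hypothetical helper: keep only the low 32 bits of a 64-bit value. */
static uint64_t low32(uint64_t v)
{
        return v & UINT64_C(0xffffffff);
}

/*
 * Hypothetical helper: returns 1 if the dump shows the pattern described
 * in the text, i.e. RDX is RAX/RBX with the upper half cleared, RSI lies
 * 8 bytes below RDX, and the faulting address is RSI + 0x18.
 */
static int oops_values_consistent(void)
{
        return low32(OOPS_RAX) == OOPS_RDX &&
               low32(OOPS_RBX) == OOPS_RDX &&
               OOPS_RDX - OOPS_RSI == 8 &&
               OOPS_RSI + 0x18 == OOPS_FAULT;
}

/* Print the individual comparisons so they can be eyeballed. */
static void print_oops_check(void)
{
        printf("low32(RAX) = %#llx, RDX = %#llx\n",
               (unsigned long long)low32(OOPS_RAX),
               (unsigned long long)OOPS_RDX);
        printf("RSI + 0x18 = %#llx, fault = %#llx\n",
               (unsigned long long)(OOPS_RSI + 0x18),
               (unsigned long long)OOPS_FAULT);
        assert(oops_values_consistent());
}
```

Calling print_oops_check() from a main() of your own shows that RDX is exactly the low 32 bits of RAX/RBX, that RSI sits 8 bytes below that, and that the faulting address is RSI + 0x18. It says nothing about where the truncation comes from.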
The Code: seems to be the very first few instructions of

    static void fq_flow_reset(struct fq *fq,
                              struct fq_flow *flow,
                              fq_skb_free_t free_func)
    {
            struct fq_tin *tin = flow->tin;
            struct sk_buff *skb;

            while ((skb = fq_flow_dequeue(fq, flow)))

and thus of

    static struct sk_buff *fq_flow_dequeue(struct fq *fq,
                                           struct fq_flow *flow)
    {
            struct sk_buff *skb;

            lockdep_assert_held(&fq->lock); // not compiled

            skb = __skb_dequeue(&flow->queue);

and offset of queue inside flow is 0x18:

  18:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  1d:   41 57                   push   %r15
  1f:   41 56                   push   %r14
  21:   41 55                   push   %r13
  23:   41 54                   push   %r12
  25:   55                      push   %rbp
  26:   48 89 f5                mov    %rsi,%rbp
  29:   53                      push   %rbx
  2a:*  48 8b 5e 18             mov    0x18(%rsi),%rbx   <-- trapping instruction
  2e:   4c 8b 3e                mov    (%rsi),%r15

I guess fq_flow_reset() didn't get inlined, but fq_flow_dequeue() did,
so that the calling convention uses RDI, RSI, RDX for

    static void fq_flow_reset(struct fq *fq,
                              struct fq_flow *flow,
                              fq_skb_free_t free_func)

respectively, so RSI for 'flow', which matches the crash on

    __skb_dequeue(&flow->queue)

being

  2a:*  48 8b 5e 18             mov    0x18(%rsi),%rbx   <-- trapping instruction

But how did RSI get truncated? And free_func being such a pointer?

> [ 81.151984] [ T11] ieee80211_txq_purge+0x3f/0x130 [mac80211 43c4902366977cd272d3ef7b3fb48467d12f0d58]

calls it via fq_tin_reset():

    static void fq_tin_reset(struct fq *fq,
                             struct fq_tin *tin,
                             fq_skb_free_t free_func)
    {
            struct list_head *head;
            struct fq_flow *flow;

            for (;;) {
                    head = &tin->new_flows;
                    if (list_empty(head)) {
                            head = &tin->old_flows;
                            if (list_empty(head))
                                    break;
                    }

                    flow = list_first_entry(head, struct fq_flow, flowchain);
                    fq_flow_reset(fq, flow, free_func);

but that's about as far as I get ... I'm probably just chasing ghosts
and completely wrong about all of this...

> I have kernel dumps, vmcore dumps, whatever you may need,
> any help is appreciated!
If it still happens, I guess it might help to put some noinline
annotations on some of the functions involved to get a better handle on
what exactly is being passed. And run it in a VM ;-)

johannes
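
For illustration only, a userspace sketch of what such annotations do. The demo_* types and functions are hypothetical stand-ins, not the real mac80211/fq structures, and the noinline macro is defined locally here with the GCC/Clang attribute (the kernel already provides a macro of that name). The point is just that each annotated function stays a real call, so its arguments arrive in the calling-convention registers and can be inspected there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's noinline macro (GCC/Clang only). */
#define noinline __attribute__((__noinline__))

/* Hypothetical, heavily simplified stand-ins; NOT the real structures. */
struct demo_flow {
        int queued;     /* packets pending on this flow */
};

struct demo_fq {
        int backlog;    /* packets pending across all flows */
};

/* Last arguments seen by demo_flow_reset(), kept for inspection. */
static struct demo_fq *last_fq;
static struct demo_flow *last_flow;

/*
 * Because of noinline this remains a separate function, so a debugger
 * or an oops would show 'fq' and 'flow' as passed by the caller.
 */
static noinline void demo_flow_reset(struct demo_fq *fq,
                                     struct demo_flow *flow)
{
        last_fq = fq;
        last_flow = flow;
        fq->backlog -= flow->queued;
        flow->queued = 0;
}

/* Caller, also kept out of line so the call chain stays visible. */
static noinline void demo_tin_reset(struct demo_fq *fq,
                                    struct demo_flow *flows, size_t n)
{
        for (size_t i = 0; i < n; i++)
                demo_flow_reset(fq, &flows[i]);
}
```

With both functions kept out of line, a backtrace shows demo_tin_reset() and demo_flow_reset() as distinct frames, and the recorded last_fq/last_flow values can be compared against what the caller believes it passed.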