Re: [git bisect] rsi_usb oops

Johannes Berg <johannes@xxxxxxxxxxxxxxxx> · Tue, 22 Jul 2025 17:52:28 +0200

Hi,

> Commit 378677eb8f44621ecc9ce659f7af61e5baa94d81 ("wifi:
> mac80211: Purge vif txq in ieee80211_do_stop()") seems to
> have made rsi_usb/rsi_91x cause a kernel panic when
> removing the USB while the interface is up.

So it's been a while ... is this still happening?

> == USB disconnected ==
> [   81.093884] [     T11] usb 1-2: USB disconnect, device number 4
> [   81.145395] [     T11] BUG: unable to handle page fault for address: 000000009dff2338
> [   81.145637] [     T11] #PF: supervisor read access in kernel mode
> [   81.145868] [     T11] #PF: error_code(0x0000) - not-present page
> [   81.146096] [     T11] PGD 0 P4D 0 
> [   81.146323] [     T11] Oops: Oops: 0000 [#1] SMP NOPTI
> [   81.146548] [     T11] CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Kdump: loaded Not tainted 6.15.0 #1 PREEMPT(voluntary)  c74d5f1746d8801a78fe4695a51ca9b00b89ab7e
> [   81.146790] [     T11] Hardware name: Dell Inc. Latitude E7250/0TPHC4, BIOS A19 01/23/2018
> [   81.147026] [     T11] Workqueue: usb_hub_wq hub_event
> [   81.147267] [     T11] RIP: 0010:fq_flow_reset.constprop.0+0x12/0x140 [mac80211]
> [   81.147608] [     T11] Code: 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48 89 f5 53 <48> 8b 5e 18 4c 8b 3e 48 85 db 74 6a 4c 8d 6e 18 49 89 fc 49 39 dd
> [   81.148175] [     T11] RSP: 0018:ffffcb54c009b918 EFLAGS: 00010202
> [   81.148466] [     T11] RAX: ffff89629dff2328 RBX: ffff89629dff2328 RCX: ffff89628cb49210
> [   81.148757] [     T11] RDX: 000000009dff2328 RSI: 000000009dff2320 RDI: ffff89628cb489c0

This is just ... weird. RAX and RBX hold the same pointer, RDX holds
its low 32 bits, and RSI holds that truncated value minus 8. The
faulting address 000000009dff2338 is then RSI + 0x18. How does that
happen?
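
To make the truncation concrete, here is a small userspace C sketch that
only redoes the arithmetic on the register values quoted above. The
macro and function names are made up for illustration; nothing in it is
kernel code.

```c
#include <stdint.h>
#include <stdio.h>

/* Register values copied from the oops quoted above. */
#define OOPS_RAX   UINT64_C(0xffff89629dff2328)
#define OOPS_RDX   UINT64_C(0x000000009dff2328)
#define OOPS_RSI   UINT64_C(0x000000009dff2320)
#define OOPS_FAULT UINT64_C(0x000000009dff2338)

/* Hypothetical helper: keep only the low 32 bits of a 64-bit value. */
#define LOW32(v) ((v) & UINT64_C(0xffffffff))

/* Print how the suspicious registers relate to the intact pointer. */
void report_truncation(void)
{
	printf("low32(RAX) = 0x%llx, RDX   = 0x%llx\n",
	       (unsigned long long)LOW32(OOPS_RAX),
	       (unsigned long long)OOPS_RDX);
	printf("RDX - 8    = 0x%llx, RSI   = 0x%llx\n",
	       (unsigned long long)(OOPS_RDX - 8),
	       (unsigned long long)OOPS_RSI);
	printf("RSI + 0x18 = 0x%llx, fault = 0x%llx\n",
	       (unsigned long long)(OOPS_RSI + 0x18),
	       (unsigned long long)OOPS_FAULT);
}
```

Calling report_truncation() from a trivial main() prints the three
pairs side by side. It says nothing about why the upper 32 bits got
lost, only that the numbers line up that way.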

The Code: seems to be the very first few instructions of

static void fq_flow_reset(struct fq *fq,
                          struct fq_flow *flow,
                          fq_skb_free_t free_func)
{
        struct fq_tin *tin = flow->tin;
        struct sk_buff *skb;

        while ((skb = fq_flow_dequeue(fq, flow)))

and thus of

static struct sk_buff *fq_flow_dequeue(struct fq *fq,
                                       struct fq_flow *flow)
{
        struct sk_buff *skb;

        lockdep_assert_held(&fq->lock); // not compiled

        skb = __skb_dequeue(&flow->queue);

and offset of queue inside flow is 0x18:

  18:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  1d:	41 57                	push   %r15
  1f:	41 56                	push   %r14
  21:	41 55                	push   %r13
  23:	41 54                	push   %r12
  25:	55                   	push   %rbp
  26:	48 89 f5             	mov    %rsi,%rbp
  29:	53                   	push   %rbx
  2a:*	48 8b 5e 18          	mov    0x18(%rsi),%rbx		<-- trapping instruction
  2e:	4c 8b 3e             	mov    (%rsi),%r15
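
The 0x18 can be double-checked with offsetof() on a userspace stand-in
for the struct. The layout below is my simplified assumption of struct
fq_flow on x86-64 (a pointer, a list_head, then the queue); the _sketch
types are placeholders, so compare against include/net/fq.h for the
kernel in question.

```c
#include <stddef.h>

/* Placeholder for struct list_head: two pointers. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

/* Placeholder for struct sk_buff_head; only its position matters here. */
struct sk_buff_head_sketch {
	void *next;
	void *prev;
	unsigned int qlen;
};

/* Assumed layout of struct fq_flow, not copied from the kernel headers. */
struct fq_flow_sketch {
	void *tin;                          /* offset 0x00 with 8-byte pointers */
	struct list_head_sketch flowchain;  /* offset 0x08 */
	struct sk_buff_head_sketch queue;   /* offset 0x18 */
	unsigned int backlog;
	int deficit;
};

/* Offset of the queue member, as used by "mov 0x18(%rsi),%rbx". */
size_t queue_offset_sketch(void)
{
	return offsetof(struct fq_flow_sketch, queue);
}
```

With 8-byte pointers this gives 0x18 for queue and 0 for tin, which
would fit both loads in the disassembly: 0x18(%rsi) and (%rsi).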

I guess fq_flow_reset() didn't get inlined, but fq_flow_dequeue() did,
so the calling convention uses RDI, RSI, RDX for

static void fq_flow_reset(struct fq *fq,
                          struct fq_flow *flow,
                          fq_skb_free_t free_func)

respectively, so RSI for 'flow', which matches the crash on
__skb_dequeue(&flow->queue) being 

  2a:*	48 8b 5e 18          	mov    0x18(%rsi),%rbx		<-- trapping instruction

But how did RSI get truncated? And why would free_func (RDX) hold such
a pointer too?

> [   81.151984] [     T11]  ieee80211_txq_purge+0x3f/0x130 [mac80211 43c4902366977cd272d3ef7b3fb48467d12f0d58]

calls it via fq_tin_reset():

static void fq_tin_reset(struct fq *fq,
                         struct fq_tin *tin,
                         fq_skb_free_t free_func)
{
        struct list_head *head;
        struct fq_flow *flow;

        for (;;) {
                head = &tin->new_flows;
                if (list_empty(head)) {
                        head = &tin->old_flows;
                        if (list_empty(head))
                                break;
                }

                flow = list_first_entry(head, struct fq_flow, flowchain);
                fq_flow_reset(fq, flow, free_func);

but that's about as far as I get ... I'm probably just chasing ghosts
and completely wrong about all of this...
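
For reference, list_first_entry() is just container_of() on head->next,
i.e. it subtracts the offset of the list member from the node pointer.
Below is a userspace sketch of that arithmetic. The _sketch struct and
macros are stand-ins with an assumed layout (not the kernel's
definitions), and feeding the RAX value from the oops into it is purely
an example input; whether RAX really held head->next at that point is a
guess on my part.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for struct list_head. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

/* Assumed start of struct fq_flow: a pointer, then the list member. */
struct fq_flow_sketch {
	void *tin;
	struct list_head_sketch flowchain;
};

/* Same idea as the kernel's container_of(), minus its type checking. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_first_entry_sketch(head, type, member) \
	container_of_sketch((head)->next, type, member)

/* Integer form of the same subtraction, for raw addresses from a dump. */
#define ENTRY_ADDR_SKETCH(node_addr) \
	((node_addr) - (uint64_t)offsetof(struct fq_flow_sketch, flowchain))

/* Returns 1 if the macro recovers the flow from a one-element list. */
int first_entry_roundtrip_sketch(void)
{
	struct fq_flow_sketch flow = { 0 };
	struct list_head_sketch head = { &flow.flowchain, &flow.flowchain };

	flow.flowchain.next = &head;
	flow.flowchain.prev = &head;
	return list_first_entry_sketch(&head, struct fq_flow_sketch,
				       flowchain) == &flow;
}
```

Under that assumed layout, ENTRY_ADDR_SKETCH(0xffff89629dff2328) is
0xffff89629dff2320, whose low 32 bits are what the oops shows in RSI.
That may well be a coincidence.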

> I have kernel dumps, vmcore dumps, whatever you may need,
> any help is appreciated!

If it still happens, I guess it might help to put some noinline
annotations on some of the functions involved to get a better handle on
what exactly is being passed. And run it in a VM ;-)
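
Roughly what I mean by the noinline annotations, as a userspace sketch:
the types and function bodies below are placeholders (not the real
include/net/fq.h code), and the attribute is spelled out as
__attribute__((noinline)), which is what the kernel's noinline macro
expands to with GCC and clang.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's "noinline" macro. */
#define noinline_sketch __attribute__((noinline))

/* Placeholder flow: just a count of queued items. */
struct flow_sketch {
	int queued;
};

/* Stand-in for fq_flow_dequeue(): take one item, return 1 if there was one. */
static noinline_sketch int flow_dequeue_sketch(struct flow_sketch *flow)
{
	if (flow->queued == 0)
		return 0;
	flow->queued--;
	return 1;
}

/* Stand-in for fq_flow_reset(): drain the flow, return how many were dropped. */
noinline_sketch int flow_reset_sketch(struct flow_sketch *flow)
{
	int dropped = 0;

	assert(flow != NULL);
	while (flow_dequeue_sketch(flow))
		dropped++;
	return dropped;
}
```

With inlining suppressed, each function keeps its own call and its own
arguments in registers, so the backtrace and register dump should show
what was actually passed at each level.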

johannes