Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproducible with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)

Daniele Palmas <dnlplm@xxxxxxxxx> · Mon, 18 Aug 2025 21:49:13 +0200

Hi Martin,

On Sun, 17 Aug 2025 at 19:01, Martin Maurer
<martin.maurer@xxxxxxxxx> wrote:
>
> Hi Daniele,
>
> many thanks for your reply!
>
> I can only partly open
>
> https://www.spinics.net
>
> pages, often pages time out...
>
> Have I understood correctly that there is a known bug that has not
> been fixed (from 2020 until now)?
>
> But as a workaround, enabling qmux/qmimux could work?

If the problem is the same, it should. If your kernel version supports
the passthrough sysfs file, you can also use the rmnet module (much
better than the in-box QMAP implementation).

If enabling QMAP is too complicated in your setup, you can just try to
increase the rx_urb_size by increasing the MTU (to at least 2048).

Regards,
Daniele

> Best regards,
>
> Martin
>
>
>
> On 17.08.2025 at 17:22, Daniele Palmas wrote:
> > Hello Martin,
> >
> > On Sun, 17 Aug 2025 at 17:09, Martin Maurer
> > <martin.maurer@xxxxxxxxx> wrote:
> >> Hello Michał, hello Mathias and all,
> >>
> >> many thanks for your answers!
> >>
> >> I tried to reproduce it on an AMD Linux PC, but unfortunately I was
> >> not able to (though the setup is a bit different).
> >>
> >> So I went back to the Raspberry Pi Compute Module 5, where I
> >> connected the radio module (Quectel RM520N-GL) via USB3 and set up
> >> a Wi-Fi access point. All data and all connections from the Wi-Fi
> >> access point are routed directly via wwan0 to the radio module.
> >>
> >> This is currently the easiest setup in which I can reproduce the
> >> error, usually within a few seconds.
> >>
> >> Unfortunately, my knowledge of the Linux kernel and USB is not yet
> >> sufficient to analyze and fix it by myself.
> >>
> >> But with the help of ChatGPT-5 I created a usbmon trace and an xhci
> >> kernel trace (both recorded in parallel):
> >>
> >> https://www.file-upload.net/en/download-15523936/usbmon_bus5_20250817-150158.log.html
> >>
> >> https://www.file-upload.net/en/download-15523937/xhci_20250817-150158.trace.html
> >>
> >> This was the last output that my ping in a shell showed:
> >>
> >> 64 bytes from 8.8.8.8: icmp_seq=2323 ttl=112 time=26.0 ms
> >> 64 bytes from 8.8.8.8: icmp_seq=2324 ttl=112 time=25.0 ms
> >> 64 bytes from 8.8.8.8: icmp_seq=2325 ttl=112 time=29.1 ms
> >> 64 bytes from 8.8.8.8: icmp_seq=2326 ttl=112 time=37.8 ms
> >>
> >> I generated more data traffic in parallel, but ping is where I first
> >> see that the IP data connection is no longer stable.
> >>
> >> According to ChatGPT-5 the following places contain errors:
> >>
> >> *** USBMON ***
> >>
> >> In your usbmon_bus5_20250817-150158.log:
> >>
> >> First -71 (EPROTO) on the QMI Bulk-IN (Bi:5:005:14): line 2161,
> >> timestamp 493245744
> >>
> >> 2161: ffffff8003c8cb40 493245744 C Bi:5:005:14 -71 0
> >>
> >> Just before that, there’s a -75 (EOVERFLOW) on the same IN EP, which is
> >> often the first sign of trouble: line 2159, timestamp 493245221
> >>
> > I did not have the chance to look at the usbmon traces so I'm not sure
> > that this is really the same scenario, but you could take a look at
> > the whole thread at
> > https://www.spinics.net/lists/netdev/msg635944.html
> >
> > If it is the same issue, basically, if you setup the data connection
> > with QMAP you should not face the issue.
> >
> > Regards,
> > Daniele
> >
> >> 2159: ffffff8003c8cd80 493245221 C Bi:5:005:14 -75 1024 = ...
> >>
> >> So the sequence is: several good completions → EOVERFLOW (-75) → then a
> >> stream of EPROTO (-71) errors on Bi:5:005:14, which kills further ping
> >> replies after your last good seq (2326).
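The EOVERFLOW-then-EPROTO sequence described above can be located
mechanically in a usbmon text log. The following is only a minimal sketch:
it assumes bulk completion lines shaped like the two quoted in this mail
(optionally with the "2161: " line-number prefix), the Linux errno values
-71 and -75, and the helper names parse_line and first_error are made up
for illustration.

```python
import re

# Linux errno values seen in this thread (negative, as usbmon prints them).
ERRNO_NAMES = {0: "OK", -71: "EPROTO", -75: "EOVERFLOW"}

# Matches lines such as:
#   2161: ffffff8003c8cb40 493245744 C Bi:5:005:14 -71 0
LINE_RE = re.compile(
    r"^(?:\d+:\s+)?"                # optional log line-number prefix
    r"(?P<tag>[0-9a-f]+)\s+"        # URB tag
    r"(?P<ts>\d+)\s+"               # timestamp in microseconds
    r"(?P<event>[SCE])\s+"          # S=submit, C=callback, E=submit error
    r"(?P<xfer>[CZIB])(?P<dir>[io])"
    r":(?P<bus>\d+):(?P<dev>\d+):(?P<ep>\d+)\s+"
    r"(?P<status>-?\d+)\s+"         # URB status (plain integer for bulk)
    r"(?P<length>\d+)"              # data length
)


def parse_line(line):
    """Return a dict for a matching usbmon line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    status = int(m.group("status"))
    return {
        "timestamp": int(m.group("ts")),
        "event": m.group("event"),
        "bulk_in": m.group("xfer") == "B" and m.group("dir") == "i",
        "bus": int(m.group("bus")),
        "dev": int(m.group("dev")),
        "ep": int(m.group("ep")),
        "status": status,
        "status_name": ERRNO_NAMES.get(status, "errno %d" % -status),
        "length": int(m.group("length")),
    }


def first_error(lines):
    """Return the first completion (event 'C') with a non-zero status."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec["event"] == "C" and rec["status"] != 0:
            return rec
    return None


SAMPLE = [
    "2159: ffffff8003c8cd80 493245221 C Bi:5:005:14 -75 1024 = ...",
    "2161: ffffff8003c8cb40 493245744 C Bi:5:005:14 -71 0",
]

if __name__ == "__main__":
    print(first_error(SAMPLE))
```

Run over the whole log, first_error should point at the completion where
the trouble starts, which makes it easier to compare several captures.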
> >>
> >>
> >> *** XHCI TRACE ***
> >>
> >> I found the first failure in your xHCI trace.
> >>
> >> First error line: line 8216
> >>
> >> Timestamp: 758267.000115
> >>
> >> Event: xhci_handle_event … type 'Transfer Event' … 'Error' … slot 1 ep
> >> 29 … len 1472
> >>
> >> Why ep 29? In xHCI, the endpoint's Device Context Index is DCI = 2 *
> >> ep_number + direction, where direction is 0=OUT, 1=IN.
> >> So for Bulk IN ep 14: 2*14+1 = 29 → that’s your IN 0x8E pipe.
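The mapping between the endpoint number that usbmon shows and the "ep"
value in the xHCI trace can be checked with a few lines of Python. This is
just an illustrative sketch: the function names are made up, and it assumes
the trace prints the xHCI endpoint ID (Device Context Index). Endpoint 14
IN corresponds to bEndpointAddress 0x8E (direction bit 0x80 plus the
endpoint number).

```python
def endpoint_dci(ep_number, is_in):
    """xHCI Device Context Index for a USB endpoint number and direction."""
    if not 0 <= ep_number <= 15:
        raise ValueError("USB endpoint numbers are 0..15")
    if ep_number == 0:
        return 1  # the bidirectional control endpoint always uses DCI 1
    return 2 * ep_number + (1 if is_in else 0)


def endpoint_address(ep_number, is_in):
    """bEndpointAddress as found in the USB endpoint descriptor."""
    if not 0 <= ep_number <= 15:
        raise ValueError("USB endpoint numbers are 0..15")
    return ep_number | (0x80 if is_in else 0x00)


def dci_to_endpoint(dci):
    """Inverse mapping: DCI -> (endpoint number, is_in)."""
    if not 1 <= dci <= 31:
        raise ValueError("DCI must be in 1..31")
    return dci // 2, dci % 2 == 1


if __name__ == "__main__":
    ep, is_in = dci_to_endpoint(29)
    print(ep, "IN" if is_in else "OUT", hex(endpoint_address(ep, is_in)))
```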
> >>
> >> Right after that line you can see the driver react:
> >>
> >> xhci_handle_transfer … length 1472 … (the failed TD)
> >>
> >> xhci_queue_command: Reset Endpoint Command … ep 29 (host tries to recover)
> >>
> >> xhci_handle_event: … 'Command Completion Event' (reset completes)
> >>
> >> But from this point on, completions for that IN EP correspond to usbmon
> >> -71 (EPROTO) — matching what you saw.
> >>
> >>
> >> Does this give a clue as to where it could be coming from?
> >>
> >> It is 100% reproducible within a few seconds on the Raspberry Pi
> >> Compute Module 5 (and I see the same behaviour with a different kernel
> >> on the i.MX8MP).
> >>
> >> Could it be a hardware problem? I have already tried different radio
> >> modules (all Qualcomm, X62/X65 and X72/X75), different cables (all
> >> the same length, all from the same source), and different eval boards
> >> for the M.2 radio modules (but from the same source).
> >>
> >>
> >> Can you give me a hint about what to try next?
> >>
> >>
> >> ChatGPT-5 points me towards disabling LPM for USB3. Could this be a
> >> next step, or is it something else?
> >>
> >>
> >> Many thanks for your help!
> >>
> >> Best regards,
> >>
> >> Martin
> >>
> >>