Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)


 



Hello Michał, hello Mathias and all,

many thanks for your answers!

I tried to reproduce it on an AMD Linux PC, but unfortunately could not (though the setup there is a bit different).

So I went back to the Raspberry Pi Compute Module 5, where I connected the radio module (Quectel RM520N-GL) via USB3 and set up a Wi-Fi access point.

All traffic from the Wi-Fi access point is routed directly via wwan0 to the radio module.

This is currently the easiest setup for reproducing the error, usually within a few seconds.

My knowledge of the Linux kernel and USB is unfortunately not yet sufficient to analyze and fix this myself.

But with the help of ChatGPT-5 I created a usbmon trace and an xHCI kernel trace, both recorded in parallel:

https://www.file-upload.net/en/download-15523936/usbmon_bus5_20250817-150158.log.html

https://www.file-upload.net/en/download-15523937/xhci_20250817-150158.trace.html

This was the last output that my ping showed in a shell:

64 bytes from 8.8.8.8: icmp_seq=2323 ttl=112 time=26.0 ms
64 bytes from 8.8.8.8: icmp_seq=2324 ttl=112 time=25.0 ms
64 bytes from 8.8.8.8: icmp_seq=2325 ttl=112 time=29.1 ms
64 bytes from 8.8.8.8: icmp_seq=2326 ttl=112 time=37.8 ms

I generated additional data traffic in parallel, but ping is where I first see that the IP data connection is no longer stable.

According to ChatGPT-5, the following places contain errors:

*** USBMON ***

In your usbmon_bus5_20250817-150158.log:

First -71 (EPROTO) on the QMI Bulk-IN (Bi:5:005:14): line 2161, timestamp 493245744

2161: ffffff8003c8cb40 493245744 C Bi:5:005:14 -71 0

Just before that, there’s a -75 (EOVERFLOW) on the same IN EP, which is often the first sign of trouble: line 2159, timestamp 493245221

2159: ffffff8003c8cd80 493245221 C Bi:5:005:14 -75 1024 = ...

So the sequence is: several good completions → EOVERFLOW (-75) → then a stream of EPROTO (-71) errors on Bi:5:005:14, which kills further ping replies after your last good seq (2326).
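To make it easier to re-check this sequence on a fresh capture, here is a minimal sketch that pulls the failed completions for one endpoint out of a usbmon text log. It assumes the standard usbmon text layout (URB tag, timestamp, event type, address word, status, length); the function names are only illustrative, and the optional "2161:" style prefix is the line number from the excerpts above, not part of usbmon itself.

```python
from typing import Iterable, List, NamedTuple, Optional

# Linux errno values that appear in the excerpts above.
ERRNO_NAMES = {-71: "EPROTO", -75: "EOVERFLOW"}


class UsbmonEvent(NamedTuple):
    tag: str            # URB tag (kernel address of the URB)
    timestamp_us: int   # timestamp in microseconds
    event: str          # 'S' submission, 'C' callback, 'E' submission error
    address: str        # e.g. 'Bi:5:005:14' (type+dir:bus:device:endpoint)
    status: int         # URB status, 0 on success
    length: Optional[int]


def parse_usbmon_line(line: str) -> Optional[UsbmonEvent]:
    """Parse one usbmon text line; return None if it does not fit."""
    fields = line.split()
    # Drop a leading "NNNN:" line-number prefix if one was added by a viewer.
    if fields and fields[0].endswith(":") and fields[0][:-1].isdigit():
        fields = fields[1:]
    if len(fields) < 5 or fields[2] not in ("S", "C", "E"):
        return None
    tag, ts, event, address, status_word = fields[:5]
    if not ts.isdigit():
        return None
    try:
        # Interrupt/isochronous lines carry "status:interval"; keep the status.
        status = int(status_word.split(":")[0])
    except ValueError:
        return None  # e.g. control submissions carry a setup tag here instead
    length = None
    if len(fields) > 5 and fields[5].lstrip("-").isdigit():
        length = int(fields[5])
    return UsbmonEvent(tag, int(ts), event, address, status, length)


def failed_completions(lines: Iterable[str], address: str) -> List[UsbmonEvent]:
    """Return callbacks ('C') with a non-zero status for the given endpoint."""
    failures = []
    for line in lines:
        ev = parse_usbmon_line(line)
        # Only callbacks count: submissions normally show -115 (EINPROGRESS).
        if ev and ev.event == "C" and ev.address == address and ev.status != 0:
            failures.append(ev)
    return failures
```

Called as `failed_completions(open(path), "Bi:5:005:14")`, the first entry of the result should correspond to the first error on the Bulk IN endpoint, and `ERRNO_NAMES.get(ev.status)` gives the symbolic name for the two codes seen here.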


*** XHCI TRACE ***

I found the first failure in your xHCI trace.

First error line: line 8216

Timestamp: 758267.000115

Event: xhci_handle_event … type 'Transfer Event' … 'Error' … slot 1 ep 29 … len 1472

Why ep 29? In xHCI, the endpoint ID (device context index, DCI) is 2 * ep_number + direction, where direction is 0=OUT, 1=IN.
So for Bulk IN ep 14: 2*14+1 = 29 → that’s your IN 0x8E pipe (endpoint 14 IN, the Bi:5:005:14 seen in usbmon).
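The mapping between the usbmon endpoint number and the "ep" value in the xHCI trace can be checked with a few lines of arithmetic. This is a small sketch with illustrative helper names; it assumes the usual USB encoding of bEndpointAddress (bit 7 = IN, low four bits = endpoint number) and treats endpoint 0 as the special bidirectional control endpoint with ID 1.

```python
USB_DIR_IN = 0x80  # bit 7 of bEndpointAddress marks an IN endpoint


def endpoint_address(ep_number: int, is_in: bool) -> int:
    """Build bEndpointAddress from endpoint number and direction."""
    return (USB_DIR_IN if is_in else 0) | (ep_number & 0x0F)


def xhci_endpoint_id(address: int) -> int:
    """Map bEndpointAddress to the xHCI endpoint ID (device context index)."""
    number = address & 0x0F
    if number == 0:
        return 1  # default control endpoint is bidirectional
    direction = 1 if address & USB_DIR_IN else 0
    return 2 * number + direction


def address_from_endpoint_id(endpoint_id: int) -> int:
    """Inverse mapping for non-control endpoints (endpoint_id >= 2)."""
    return endpoint_address(endpoint_id // 2, endpoint_id % 2 == 1)


if __name__ == "__main__":
    addr = endpoint_address(14, is_in=True)
    print(hex(addr), xhci_endpoint_id(addr))
```

For endpoint 14 IN this gives address 0x8e and endpoint ID 29, which is how the usbmon address Bi:5:005:14 and "slot 1 ep 29" in the xHCI trace refer to the same pipe.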

Right after that line you can see the driver react:

xhci_handle_transfer … length 1472 … (the failed TD)

xhci_queue_command: Reset Endpoint Command … ep 29 (host tries to recover)

xhci_handle_event: … 'Command Completion Event' (reset completes)

But from this point on, completions for that IN EP correspond to usbmon -71 (EPROTO) — matching what you saw.


Does this give a clue as to where it could be coming from?

It is 100% reproducible within a few seconds on the Raspberry Pi Compute Module 5 (and I see the same behaviour with a different kernel on the i.MX8MP).

Could it be a hardware problem? I have already tried different radio modules (all Qualcomm, X62/X65 and X72/X75),

different cables (all the same length, all from the same source), and different eval boards for the M.2 radio modules (but from the same source).


Can you give me a hint on what to try next?


ChatGPT-5 suggests disabling LPM for USB3. Could this be a next step, or is it something else?
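Before changing anything, the current USB3 LPM state could be read from sysfs. The sketch below is only an assumption about how to check it: it relies on the power/usb3_hardware_lpm_u1 and power/usb3_hardware_lpm_u2 attributes being present for SuperSpeed devices on the running kernel, and the function name is illustrative.

```python
from pathlib import Path
from typing import Dict, Tuple


def usb3_lpm_state(
    sysfs_root: str = "/sys/bus/usb/devices",
) -> Dict[str, Tuple[str, Dict[str, str]]]:
    """Return {device: ("vid:pid", {"u1": state, "u2": state})}.

    Devices without the usb3_hardware_lpm_* attributes are skipped.
    """
    result = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        vendor = dev / "idVendor"
        product = dev / "idProduct"
        # Interface entries such as "5-1:1.0" have no idVendor/idProduct.
        if not (vendor.is_file() and product.is_file()):
            continue
        state = {}
        for link_state in ("u1", "u2"):
            attr = dev / "power" / f"usb3_hardware_lpm_{link_state}"
            if attr.is_file():
                state[link_state] = attr.read_text().strip()
        if state:
            ident = f"{vendor.read_text().strip()}:{product.read_text().strip()}"
            result[dev.name] = (ident, state)
    return result


if __name__ == "__main__":
    for name, (ident, state) in usb3_lpm_state().items():
        print(name, ident, state)
```

If U1/U2 show up as enabled for the modem, the `k` flag (USB_QUIRK_NO_LPM) of the `usbcore.quirks=<VID>:<PID>:k` kernel parameter is the knob usually named for switching LPM off per device; whether that helps in this case is exactly the open question.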


Many thanks for your help!

Best regards,

Martin




