On Wed, Apr 2, 2025 at 1:29 PM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote:
>
> On 4/2/25 15:40, Naveen Kumar P wrote:
> > Questions:
> > 1. What could cause periodic WRITE DMA timeouts followed by link resets?
>
> The drive not responding (drive failing for whatever reasons)
>
> > 2. Could this be a hardware-related issue (e.g., cabling, drive aging)
> > or a kernel bug?
>
> It always can be a bug, but it looks like this is with a kernel version 4.4,
> which is really old. Please try with the latest kernel. We do not debug older
> kernels.
>
> Hardware (SSD) failing is more likely though. Your SSD is being operated with
> NCQ turned off:

Thank you for your response. Would re-enabling NCQ (if possible) help
improve performance, or would it likely introduce instability?

I observed that when the system is in this state (frequent WRITE DMA
timeouts and link resets), running aplay from the alsa-utils Debian
package results in the following error:

aplay: pcm_write:2086: write error: Input/output error

Could these storage timeouts be affecting aplay? Is there a known
relationship between these two failures?

> > ata2.00: FORCE: horkage modified (noncq)
>
> which indicates that it is somewhat buggy to start with...
>
> > 3. What additional debugging steps or kernel parameters would you
> > recommend to further diagnose this issue?
>
> Try latest stable kernel (6.14). But I suspect you will see the same issue.
>
> >
> > Complete dmesg log is attached to this email.
> > I appreciate any guidance you can provide. Please let me know if
> > further logs or details are required.
>
> The errors you see are timeouts. So it seems that your SSD is simply not
> responding. This could thus be a hardware problem with your drive.
> If you can, try testing with another (newer) SSD ?
>
>
> --
> Damien Le Moal
> Western Digital Research
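
To quantify how often the WRITE DMA errors and link resets occur, a saved dmesg log could be tallied per ATA port with something like the sketch below. This is only an illustration: the function name count_ata_events is hypothetical, and the matched phrases ("WRITE DMA", "resetting link") are assumptions about how the kernel words these messages, so they may need adjusting to the actual log.

```python
import re
from collections import Counter

# Matches an ATA port prefix such as "ata2:" or "ata2.00:" and captures "ata2".
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")

# ASSUMPTION: these phrases reflect how the kernel logs the events of interest.
# Adjust them to the wording that actually appears in your dmesg output.
EVENTS = {
    "write_dma": re.compile(r"WRITE DMA"),
    "link_reset": re.compile(r"resetting link"),
}


def count_ata_events(dmesg_text):
    """Count lines mentioning each event type, keyed by (port, event name)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        port = PORT_RE.search(line)
        if not port:
            continue  # not an ATA message
        for name, pattern in EVENTS.items():
            if pattern.search(line):
                counts[(port.group(1), name)] += 1
    return counts
```

It would be called on the text of a saved log, for example count_ata_events(open("dmesg.log").read()), where dmesg.log is a placeholder file name. Comparing the counts before and after swapping the cable or the drive would show whether the change made a difference.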