On Thu, May 22, 2025 at 06:19:56PM +0200, Niklas Cassel wrote: > Hello there, > > > As you know the reset_slot() callback patches were merged recently. > > Wilfred and I (mostly Wilfred), have been debugging DMA issues after the > reset_slot() callback has been invoked. The issue is reproduced when MPS > configuration is set to performance, but might be applicable for other > MPS configurations as well. The problem appears to be that reset_slot() > feature does not respect/restore the MPS configuration. > > > When having two Rock 5Bs, one in RC mode, one in EP mode: > > # Boot with MPS bus performance > pci=pcie_bus_perf on kernel command line > > # lspci RC > lspci -vvvs 0000:00:00.0 | grep -E MaxPayload > DevCap: MaxPayload 256 bytes, PhantFunc 0 > DevCtl: MaxPayload 256 bytes, MaxReadReq 256 bytes > > # Run pcitests > All tests pass > > # Perform hot reset of EP > echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset > > # lspci RC > lspci -vvvs 0000:00:00.0 | grep -E MaxPayload > DevCap: MaxPayload 256 bytes, PhantFunc 0 > DevCtl: MaxPayload 128 bytes, MaxReadReq 512 bytes > > # Run pcitests > FAIL pci_ep_data_transfer.dma.READ_TEST > Compared to before, DMA read test now fails. > > Wilfred has been able to track this down to being because MPS in the RC > is different before/after a ->reset_slot(), and simply re-storing MPS using > a dirty hack, the DMA read issues go away. > > > > My first thinking was that pci_configure_device() (and thus pci_configure_mps()) > was not called, > so I added some prints in pci_configure_mps(). > pci_configure_mps() was not called for RC nor EP during ->reset_slot() > (I did not check if pcie_bus_configure_settings() was called.) > > > I tried Mani's earlier patch instead of what is queued up: > https://lore.kernel.org/linux-pci/20250221172309.120009-2-manivannan.sadhasivam@xxxxxxxxxx/ > > And in this version pci_configure_mps() does get called, > for both the RC and EP during ->retrain_link() > (I did not check if pcie_bus_configure_settings() was called.) > > But... MaxPayload was still incorrectly set after retrain_link(). > > > I think the solution is to add a call to add a pcie_bus_configure_settings() > call in pcie_do_recovery() / pci_host_recover_slots() / pci_host_reset_slot() / > pcibios_reset_secondary_bus(). > > > Or possibly a: > list_for_each_entry(child, &bus->children, node) > pcie_bus_configure_settings(child); > > as done in pci_host_probe(). > > I'm not sure if we need to make sure that pci_configure_device() / > pci_configure_mps() gets called as well. Since we did reset the endpoint, > it seems that calling pci_configure_mps() does make sense. > > > Normally, I would wait for Wilfred and myself to come up with a nice fix, > and test it, but, considering that the ->reset_slot() patches are queued > up already, and the merge window is opening soon, I'm sharing our findings > on the list, as it might take some time to come up with a nice solution. > Thanks Niklas for reporting and Wilfred for debugging! I will submit the patches for fixing this issue along with couple of other comments raised by Lukas. - Mani -- மணிவண்ணன் சதாசிவம்