On Tue, Apr 15, 2025 at 01:07:23PM +0530, Manivannan Sadhasivam wrote: > On Thu, Apr 03, 2025 at 09:51:31AM +0200, Niklas Cassel wrote: > > Hello Krishna, > > > > On Thu, Apr 03, 2025 at 09:26:08AM +0530, Krishna Chaitanya Chundru wrote: > > > On 4/2/2025 6:14 PM, Niklas Cassel wrote: > > > > > > > > Out of curiosity, I tried something similar to on pcie-dw-rockchip.c > > > > > > > > Simply having a ->shutdown() callback that only calls dw_pcie_host_deinit() > > > > was enough for me to produce: > > > > > > > > [ 40.209887] r8169 0004:41:00.0 eth0: Link is Down > > > > [ 40.216572] ------------[ cut here ]------------ > > > > [ 40.216986] called from state HALTED > > > > [ 40.217317] WARNING: CPU: 7 PID: 265 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0 > > > > [ 40.218024] Modules linked in: rk805_pwrkey hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_common vidf > > > > [ 40.220267] CPU: 7 UID: 0 PID: 265 Comm: init Not tainted 6.14.0+ #134 PREEMPT > > > > [ 40.220908] Hardware name: Radxa ROCK 5B (DT) > > > > [ 40.221289] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > > > [ 40.221899] pc : phy_stop+0x134/0x1a0 > > > > [ 40.222222] lr : phy_stop+0x134/0x1a0 > > > > [ 40.222546] sp : ffff800082213820 > > > > [ 40.222836] x29: ffff800082213820 x28: ffff45ec84b30000 x27: 0000000000000000 > > > > [ 40.223463] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbe8df7fde030 > > > > [ 40.224088] x23: ffff800082213990 x22: 0000000000000001 x21: ffff45ec80e10000 > > > > [ 40.224714] x20: ffff45ec82cb40c8 x19: ffff45ec82ccc000 x18: 0000000000000006 > > > > [ 40.225340] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0720072007200720 > > > > [ 40.225966] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 > > > > [ 40.226592] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffbe8df556469c > > > > [ 40.227217] x8 : 0000000000000268 x7 : ffffbe8df7a48648 x6 : ffffbe8df7a48648 > > > > [ 40.227842] x5 : 0000000000017fe8 x4 : 0000000000000000 x3 : 0000000000000000 > > > > [ 40.228468] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff45ec84b30000 > > > > [ 40.229093] Call trace: > > > > [ 40.229308] phy_stop+0x134/0x1a0 (P) > > > > [ 40.229634] rtl8169_down+0x34/0x280 > > > > [ 40.229952] rtl8169_close+0x64/0x100 > > > > [ 40.230275] __dev_close_many+0xbc/0x1f0 > > > > [ 40.230621] dev_close_many+0x94/0x160 > > > > [ 40.230951] unregister_netdevice_many_notify+0x14c/0x9c0 > > > > [ 40.231426] unregister_netdevice_queue+0xe4/0x100 > > > > [ 40.231848] unregister_netdev+0x2c/0x60 > > > > [ 40.232193] rtl_remove_one+0xa0/0xe0 > > > > [ 40.232517] pci_device_remove+0x4c/0xf8 > > > > [ 40.232864] device_remove+0x54/0x90 > > > > [ 40.233182] device_release_driver_internal+0x1d4/0x238 > > > > [ 40.233643] device_release_driver+0x20/0x38 > > > > [ 40.234019] pci_stop_bus_device+0x84/0xe0 > > > > [ 40.234381] pci_stop_bus_device+0x40/0xe0 > > > > [ 40.234741] pci_stop_root_bus+0x48/0x80 > > > > [ 40.235087] dw_pcie_host_deinit+0x34/0xe0 > > > > [ 40.235452] rockchip_pcie_shutdown+0x24/0x48 > > > > [ 40.235839] platform_shutdown+0x2c/0x48 > > > > [ 40.236187] device_shutdown+0x150/0x278 > > > > [ 40.236533] kernel_restart+0x4c/0xb8 > > > > [ 40.236859] __do_sys_reboot+0x178/0x280 > > > > [ 40.237206] __arm64_sys_reboot+0x2c/0x40 > > > > [ 40.237561] invoke_syscall+0x50/0x120 > > > > [ 40.237891] el0_svc_common.constprop.0+0x48/0xf0 > > > > [ 40.238305] do_el0_svc+0x24/0x38 > > > > [ 40.238597] el0_svc+0x30/0xd0 > > > > [ 40.238868] el0t_64_sync_handler+0x10c/0x138 > > > > [ 40.239251] el0t_64_sync+0x198/0x1a0 > > > > [ 40.239575] ---[ end trace 0000000000000000 ]--- > > > > > > > Hi Niklas, > > > > > > The issue which you are seeing with specific to the RealTek ethernet > > > driver and should be fixed by RealTek driver nothing to do with the host > > > controller. > > > > The warning comes from: > > drivers/net/phy/phy.c:phy_stop() > > so from the networking phylib. > > > > Doing a simple: > > $ git grep -p phy_stop > > > > shows that practially all Ethernet drivers call phy_stop() from the > > .ndo_stop() callback. > > > > So after your suggested patch, you should see this warning appear with > > any NIC, if connected to your PCIe controller. > > > > I think the issue here is that phy_stop() is called without calling > corresponding phy_start(). This means that either rtl8169_up() is never called > or rtl8169_down() is called twice. phy_start() is called. > > I don't think this issue is applicable to all drivers. It'd be worth > investigating what is going wrong with this r8169 driver. In case you are curious, I added a dump_stack() before phy_stop() is called: First phy_stop() call: [ 21.733574] platform fc400000.usb: deferred probe pending: dwc3: failed to initialize core [ 23.827753] CPU: 6 UID: 0 PID: 238 Comm: init Not tainted 6.15.0-rc2+ #149 PREEMPT [ 23.827762] Hardware name: Radxa ROCK 5B (DT) [ 23.827764] Call trace: [ 23.827765] show_stack+0x20/0x40 (C) [ 23.827774] dump_stack_lvl+0x60/0x80 [ 23.827778] dump_stack+0x18/0x24 [ 23.827782] rtl8169_down+0x30/0x2a0 [ 23.827788] rtl_shutdown+0xb0/0xc0 [ 23.827792] pci_device_shutdown+0x3c/0x88 [ 23.827797] device_shutdown+0x150/0x278 [ 23.827802] kernel_restart+0x4c/0xb8 [ 23.827807] __do_sys_reboot+0x178/0x280 [ 23.827811] __arm64_sys_reboot+0x2c/0x40 [ 23.827816] invoke_syscall+0x50/0x120 [ 23.827822] el0_svc_common.constprop.0+0x48/0xf0 [ 23.827826] do_el0_svc+0x24/0x38 [ 23.827828] el0_svc+0x30/0xd0 [ 23.827834] el0t_64_sync_handler+0x10c/0x138 [ 23.827837] el0t_64_sync+0x198/0x1a0 [ 23.827841] calling phy_stop() rtl8169_down:4844 [ 23.834789] r8169 0004:41:00.0 eth0: Link is Down Second phy_stop() call: [ 23.841458] CPU: 6 UID: 0 PID: 238 Comm: init Not tainted 6.15.0-rc2+ #149 PREEMPT [ 23.841467] Hardware name: Radxa ROCK 5B (DT) [ 23.841468] Call trace: [ 23.841470] show_stack+0x20/0x40 (C) [ 23.841478] dump_stack_lvl+0x60/0x80 [ 23.841483] dump_stack+0x18/0x24 [ 23.841486] rtl8169_down+0x30/0x2a0 [ 23.841492] rtl8169_close+0x64/0x100 [ 23.841496] __dev_close_many+0xbc/0x1f0 [ 23.841502] dev_close_many+0x94/0x160 [ 23.841505] unregister_netdevice_many_notify+0x160/0x9d0 [ 23.841510] unregister_netdevice_queue+0xf0/0x100 [ 23.841515] unregister_netdev+0x2c/0x58 [ 23.841519] rtl_remove_one+0xa0/0xe0 [ 23.841524] pci_device_remove+0x4c/0xf8 [ 23.841528] device_remove+0x54/0x90 [ 23.841534] device_release_driver_internal+0x1d4/0x238 [ 23.841539] device_release_driver+0x20/0x38 [ 23.841544] pci_stop_bus_device+0x84/0xe0 [ 23.841548] pci_stop_bus_device+0x40/0xe0 [ 23.841552] pci_stop_root_bus+0x48/0x80 [ 23.841555] dw_pcie_host_deinit+0x34/0xe0 [ 23.841559] rockchip_pcie_shutdown+0x20/0x38 [ 23.841565] platform_shutdown+0x2c/0x48 [ 23.841571] device_shutdown+0x150/0x278 [ 23.841575] kernel_restart+0x4c/0xb8 [ 23.841580] __do_sys_reboot+0x178/0x280 [ 23.841584] __arm64_sys_reboot+0x2c/0x40 [ 23.841588] invoke_syscall+0x50/0x120 [ 23.841595] el0_svc_common.constprop.0+0x48/0xf0 [ 23.841598] do_el0_svc+0x24/0x38 [ 23.841601] el0_svc+0x30/0xd0 [ 23.841605] el0t_64_sync_handler+0x10c/0x138 [ 23.841609] el0t_64_sync+0x198/0x1a0 [ 23.841613] calling phy_stop() rtl8169_down:4844 So it seems that the driver's rtl_shutdown() calls rtl8169_net_suspend(), which calls rtl8169_down() which calls phy_stop(). If I compare rtl8169_close() with e.g. e1000e_close(), e1000e_close() does have a guard: if (netif_device_present(netdev)) { around the call to e1000e_down(), while rtl8169_close() does not have a similar guard around the call to rtl8169_down(). So, I think you are right, this is probably a r8169 driver specific problem after all. Kind regards, Niklas