On Mon, Jul 07, 2025 at 02:09:23PM +0200, Andrew Lunn wrote: > On Mon, Jul 07, 2025 at 03:37:43PM +0800, Yibo Dong wrote: > > On Fri, Jul 04, 2025 at 08:25:12PM +0200, Andrew Lunn wrote: > > > > +/** > > > > + * mucse_fw_send_cmd_wait - Send cmd req and wait for response > > > > + * @hw: Pointer to the HW structure > > > > + * @req: Pointer to the cmd req structure > > > > + * @reply: Pointer to the fw reply structure > > > > + * > > > > + * mucse_fw_send_cmd_wait sends req to pf-cm3 mailbox and wait > > > > + * reply from fw. > > > > + * > > > > + * Returns 0 on success, negative on failure > > > > + **/ > > > > +static int mucse_fw_send_cmd_wait(struct mucse_hw *hw, > > > > + struct mbx_fw_cmd_req *req, > > > > + struct mbx_fw_cmd_reply *reply) > > > > +{ > > > > + int err; > > > > + int retry_cnt = 3; > > > > + > > > > + if (!hw || !req || !reply || !hw->mbx.ops.read_posted) > > > > > > Can this happen? > > > > > > If this is not supposed to happen, it is better the driver opps, so > > > you get a stack trace and find where the driver is broken. > > > > > Yes, it is not supposed to happen. So, you means I should remove this > > check in order to get opps when this condition happen? > > You should remove all defensive code. Let is explode with an Opps, so > you can find your bugs. > Got it. > > > > + return -EINVAL; > > > > + > > > > + /* if pcie off, nothing todo */ > > > > + if (pci_channel_offline(hw->pdev)) > > > > + return -EIO; > > > > > > What can cause it to go offline? Is this to do with PCIe hotplug? > > > > > Yes, I try to get a PCIe hotplug condition by 'pci_channel_offline'. > > If that happens, driver should never do bar-read/bar-write, so return > > here. > > I don't know PCI hotplug too well, but i assume the driver core will > call the .release function. Can this function be called as part of > release? What actually happens on the PCI bus when you try to access a > device which no longer exists? > This function maybe called as part of release: ->release -->unregister_netdev --->ndo_stop ---->this function Based on what I have come across, some devices return 0xffffffff, while others maybe hang when try to access a device which no longer exists. > How have you tested this? Do you have the ability to do a hot{un}plug? > I tested hot{un}plug with an ocp-card before. But I think all the codes related to pcie hot{un}plug should be in a separate patch, I should move it to that patch. > > > > + if (mutex_lock_interruptible(&hw->mbx.lock)) > > > > + return -EAGAIN; > > > > > > mutex_lock_interruptable() returns -EINTR, which is what you should > > > return, not -EAGAIN. > > > > > Got it, I should return '-EINTR' here. > > No, you should return whatever mutex_lock_interruptable() > returns. Whenever you call a function which returns an error code, you > should pass that error code up the call stack. Never replace one error > code with another. > Ok, I see. > > > > + if (reply->error_code) > > > > + return -reply->error_code; > > > > > > The mbox is using linux error codes? > > > > > It is used only between driver and fw, yay be just samply like this: > > 0 -- no error > > not 0 -- error > > So, it is not using linux error codes. > > Your functions should always use linux/POSIX error codes. So if your > firmware says an error has happened, turn it into a linux/POSIX error > code. EINVAL, TIMEDOUT, EIO, whatever makes the most sense. > > Andrew > Got it, I will turn it into a linux/POSIX error code. Thanks for your feedback.