Re: [PATCH 04/15] net: rnpgbe: Add get_capability mbx_fw ops support

Yibo Dong <dong100@xxxxxxxxx> · Tue, 8 Jul 2025 10:01:50 +0800

On Mon, Jul 07, 2025 at 02:09:23PM +0200, Andrew Lunn wrote:
> On Mon, Jul 07, 2025 at 03:37:43PM +0800, Yibo Dong wrote:
> > On Fri, Jul 04, 2025 at 08:25:12PM +0200, Andrew Lunn wrote:
> > > > +/**
> > > > + * mucse_fw_send_cmd_wait - Send cmd req and wait for response
> > > > + * @hw: Pointer to the HW structure
> > > > + * @req: Pointer to the cmd req structure
> > > > + * @reply: Pointer to the fw reply structure
> > > > + *
> > > > + * mucse_fw_send_cmd_wait sends req to pf-cm3 mailbox and wait
> > > > + * reply from fw.
> > > > + *
> > > > + * Returns 0 on success, negative on failure
> > > > + **/
> > > > +static int mucse_fw_send_cmd_wait(struct mucse_hw *hw,
> > > > +				  struct mbx_fw_cmd_req *req,
> > > > +				  struct mbx_fw_cmd_reply *reply)
> > > > +{
> > > > +	int err;
> > > > +	int retry_cnt = 3;
> > > > +
> > > > +	if (!hw || !req || !reply || !hw->mbx.ops.read_posted)
> > > 
> > > Can this happen?
> > > 
> > > If this is not supposed to happen, it is better the driver opps, so
> > > you get a stack trace and find where the driver is broken.
> > > 
> > Yes, it is not supposed to happen. So, you means I should remove this
> > check in order to get opps when this condition happen?
> 
> You should remove all defensive code. Let is explode with an Opps, so
> you can find your bugs.
> 

Got it.

> > > > +		return -EINVAL;
> > > > +
> > > > +	/* if pcie off, nothing todo */
> > > > +	if (pci_channel_offline(hw->pdev))
> > > > +		return -EIO;
> > > 
> > > What can cause it to go offline? Is this to do with PCIe hotplug?
> > > 
> > Yes, I try to get a PCIe hotplug condition by 'pci_channel_offline'.
> > If that happens, driver should never do bar-read/bar-write, so return
> > here.
> 
> I don't know PCI hotplug too well, but i assume the driver core will
> call the .release function. Can this function be called as part of
> release? What actually happens on the PCI bus when you try to access a
> device which no longer exists?
> 

This function maybe called as part of release:
->release
-->unregister_netdev
--->ndo_stop
---->this function
Based on what I have come across, some devices return 0xffffffff, while 
others maybe hang when try to access a device which no longer
exists.

> How have you tested this? Do you have the ability to do a hot{un}plug?
> 

I tested hot{un}plug with an ocp-card before.
But I think all the codes related to pcie hot{un}plug should be in a
separate patch, I should move it to that patch.

> > > > +	if (mutex_lock_interruptible(&hw->mbx.lock))
> > > > +		return -EAGAIN;
> > > 
> > > mutex_lock_interruptable() returns -EINTR, which is what you should
> > > return, not -EAGAIN.
> > > 
> > Got it, I should return '-EINTR' here.
> 
> No, you should return whatever mutex_lock_interruptable()
> returns. Whenever you call a function which returns an error code, you
> should pass that error code up the call stack. Never replace one error
> code with another.
> 

Ok, I see.

> > > > +	if (reply->error_code)
> > > > +		return -reply->error_code;
> > > 
> > > The mbox is using linux error codes? 
> > > 
> > It is used only between driver and fw, yay be just samply like this: 
> > 0     -- no error
> > not 0 -- error
> > So, it is not using linux error codes.
> 
> Your functions should always use linux/POSIX error codes. So if your
> firmware says an error has happened, turn it into a linux/POSIX error
> code. EINVAL, TIMEDOUT, EIO, whatever makes the most sense.
> 
> 	Andrew
> 

Got it, I will turn it into a linux/POSIX error code.

Thanks for your feedback.