Re: [PATCH 04/15] net: rnpgbe: Add get_capability mbx_fw ops support

Andrew Lunn <andrew@xxxxxxx> · Mon, 7 Jul 2025 14:09:23 +0200

On Mon, Jul 07, 2025 at 03:37:43PM +0800, Yibo Dong wrote:
> On Fri, Jul 04, 2025 at 08:25:12PM +0200, Andrew Lunn wrote:
> > > +/**
> > > + * mucse_fw_send_cmd_wait - Send cmd req and wait for response
> > > + * @hw: Pointer to the HW structure
> > > + * @req: Pointer to the cmd req structure
> > > + * @reply: Pointer to the fw reply structure
> > > + *
> > > + * mucse_fw_send_cmd_wait sends req to pf-cm3 mailbox and wait
> > > + * reply from fw.
> > > + *
> > > + * Returns 0 on success, negative on failure
> > > + **/
> > > +static int mucse_fw_send_cmd_wait(struct mucse_hw *hw,
> > > +				  struct mbx_fw_cmd_req *req,
> > > +				  struct mbx_fw_cmd_reply *reply)
> > > +{
> > > +	int err;
> > > +	int retry_cnt = 3;
> > > +
> > > +	if (!hw || !req || !reply || !hw->mbx.ops.read_posted)
> > 
> > Can this happen?
> > 
> > If this is not supposed to happen, it is better the driver opps, so
> > you get a stack trace and find where the driver is broken.
> > 
> Yes, it is not supposed to happen. So, you means I should remove this
> check in order to get opps when this condition happen?

You should remove all defensive code. Let is explode with an Opps, so
you can find your bugs.

> > > +		return -EINVAL;
> > > +
> > > +	/* if pcie off, nothing todo */
> > > +	if (pci_channel_offline(hw->pdev))
> > > +		return -EIO;
> > 
> > What can cause it to go offline? Is this to do with PCIe hotplug?
> > 
> Yes, I try to get a PCIe hotplug condition by 'pci_channel_offline'.
> If that happens, driver should never do bar-read/bar-write, so return
> here.

I don't know PCI hotplug too well, but i assume the driver core will
call the .release function. Can this function be called as part of
release? What actually happens on the PCI bus when you try to access a
device which no longer exists?

How have you tested this? Do you have the ability to do a hot{un}plug?

> > > +	if (mutex_lock_interruptible(&hw->mbx.lock))
> > > +		return -EAGAIN;
> > 
> > mutex_lock_interruptable() returns -EINTR, which is what you should
> > return, not -EAGAIN.
> > 
> Got it, I should return '-EINTR' here.

No, you should return whatever mutex_lock_interruptable()
returns. Whenever you call a function which returns an error code, you
should pass that error code up the call stack. Never replace one error
code with another.

> > > +	if (reply->error_code)
> > > +		return -reply->error_code;
> > 
> > The mbox is using linux error codes? 
> > 
> It is used only between driver and fw, yay be just samply like this: 
> 0     -- no error
> not 0 -- error
> So, it is not using linux error codes.

Your functions should always use linux/POSIX error codes. So if your
firmware says an error has happened, turn it into a linux/POSIX error
code. EINVAL, TIMEDOUT, EIO, whatever makes the most sense.

	Andrew