Re: [PATCH v2 2/2] cxl: add support for cxl reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,


I'm sorry I missed your reply. Some comments below.


On 4/8/25 00:45, Srirangan Madhavan wrote:
Thank you so much for the comments Alejandro, Lukas and Bjorn.
My apologies for the late response. I was out of office for a few weeks.
I am picking up this patch again and adding my responses.

I think cxl_reset_start after the *_prepare call makes more sense here.
@Alejandro, Just for clarity,
do you mean I should rename the cxl_reset_init -> cxl_reset_start?


Yes, that's my view. init and prepare are similar, so prepare or init, then start.


I do not know how safe is this. IMO, this needs to be synchronized by
the accel driver which could imply to tell user space first. In our case
it would imply to stop rx queues in the netdev for CXL.cache, and tx
queues for CXL.mem. Doing it unconditionally could make current CXL
transactions to stall ... although it could be argued the reset event
implies something is broken, but let's try to do it properly if there is
a chance of the system not unreliable.
Regarding this, we feel that the reset framework already
has a *reset_prepare and is called by the Linux kernel, before initiating the steps
for acutal CXL reset. During this reset prepare is when the accel driver should
quiesce its device. In this case, that would imply stopping the rx & tx and
draining the queues. Is there any particular reason this wouldn't work/be sufficient?


I did not see the connection to the driver/device in the patch so I assumed it was not there, but after your comment, I studied the code and it is just a matter of implementing such reset prepare function in our driver, and I think that solves the problem I saw.


One thing this does not cover, and I do not know if it should, is the
fact that the device CXL regs will be reset, so the question is if the
old values should be restored or the device/driver should go through the
same initialization, if a hotplug device, or do it specifically if
already present at boot time and the BIOS doing that first
initialization. In one case the restoration needs to happen, in the
other the old values/objects need to be removed. I think the second case
is more problematic because this is likely involving CXL root complex
configuration performed by the BIOS ... Not trivial at all IMO.
Here again, we think that the accel driver is the one that should be doing this.
In our case, when the reset done call is made, the driver before making the device
available to be reopened and used, needs to restore the config space/DVSEC etc. to
their prior state. During the reset_prepare stage these need to be saved. Since this
is how SBR is handled currently even in the PCIe world, (for ex. AER/EEH handling framework)
where driver is responsible for save and restore of the regs, and it is also not possible
for the kernel to know exactly what to save and restore for each specific type 2 devices,
it seems logical to do the same here.


At the time I wrote that comment, it was not clear in my mind how this should work, and I had doubts about the BIOS.

But I can say now all this is fine and doable.


From you comment (not in the thread but through discord) about being the sfc driver the client for this functionality, I think it could be. I'm not sure if as part of the current Type2 patchset effort or a follow-up work. I'll do some work the next week for seeing the implications.


Thank you


+1  The reset-related content in drivers/pci/pci.c has been growing
recently.  Maybe we should consider moving it all to a reset.c file.
This makes sense. I'll prepare a patch to move the reset code and
compile out CXL specific parts while making any changes for the next version.

Thank you.

Regards,
Srirangan.
________________________________________
From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
Sent: Friday, February 21, 2025 4:13 PM
To: Lukas Wunner
Cc: Srirangan Madhavan; Davidlohr Bueso; Jonathan Cameron; Dave Jiang; Alison Schofield; Vishal Verma; Ira Weiny; Dan Williams; Zhi Wang; Vishal Aslot; Shanker Donthineni; linux-cxl@xxxxxxxxxxxxxxx; Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx
Subject: Re: [PATCH v2 2/2] cxl: add support for cxl reset

External email: Use caution opening links or attachments


On Fri, Feb 21, 2025 at 11:45:56AM +0100, Lukas Wunner wrote:
On Thu, Feb 20, 2025 at 08:39:06PM -0800, Srirangan Madhavan wrote:
Type 2 devices are being introduced and will require finer-grained
reset mechanisms beyond bus-wide reset methods.

Add support for CXL reset per CXL v3.2 Section 9.6/9.7

Signed-off-by: Srirangan Madhavan <smadhavan@xxxxxxxxxx>
---
  drivers/pci/pci.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++
drivers/pci/pci.c is basically a catch-all for anything that doesn't fit
in one of the other .c files in drivers/pci.  I'm slightly worried that
this (otherwise legitimate) patch increases the clutter in pci.c further,
rendering it unmaintainable in the long term.
+1  The reset-related content in drivers/pci/pci.c has been growing
recently.  Maybe we should consider moving it all to a reset.c file.

At the very least, I'm wondering if this can be #ifdef'ed to
CONFIG_CXL_PCI?

One idea would be to move this newly added reset method, as well as the
existing cxl_reset_bus_function(), to a new drivers/pci/cxl.c file.

I guess moving it to drivers/cxl/ isn't an option because cxl can be
modular.

Another idea would be to move all the reset handling (which makes up
a significant portion of pci.c) to a separate drivers/pci/reset.c.
This might be beyond the scope of your patch, but in the interim,
maybe at least an #ifdef can be added because the PCI core is also
used e.g. on memory-constrained wifi routers which don't care about
CXL at all.
Agree, we'll need some way to make this optional.

Bjorn




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux