Re: [PATCH v2 2/2] cxl: add support for cxl reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thank you so much for the comments Alejandro, Lukas and Bjorn.
My apologies for the late response. I was out of office for a few weeks.
I am picking up this patch again and adding my responses.

> I think cxl_reset_start after the *_prepare call makes more sense here.

@Alejandro, Just for clarity, 
do you mean I should rename the cxl_reset_init -> cxl_reset_start?

> I do not know how safe is this. IMO, this needs to be synchronized by
> the accel driver which could imply to tell user space first. In our case
> it would imply to stop rx queues in the netdev for CXL.cache, and tx
> queues for CXL.mem. Doing it unconditionally could make current CXL
> transactions to stall ... although it could be argued the reset event
> implies something is broken, but let's try to do it properly if there is
> a chance of the system not unreliable.

Regarding this, we feel that the reset framework already
has a *reset_prepare and is called by the Linux kernel, before initiating the steps
for acutal CXL reset. During this reset prepare is when the accel driver should 
quiesce its device. In this case, that would imply stopping the rx & tx and
draining the queues. Is there any particular reason this wouldn't work/be sufficient?

> One thing this does not cover, and I do not know if it should, is the
> fact that the device CXL regs will be reset, so the question is if the
> old values should be restored or the device/driver should go through the
> same initialization, if a hotplug device, or do it specifically if
> already present at boot time and the BIOS doing that first
> initialization. In one case the restoration needs to happen, in the
> other the old values/objects need to be removed. I think the second case
> is more problematic because this is likely involving CXL root complex
> configuration performed by the BIOS ... Not trivial at all IMO.

Here again, we think that the accel driver is the one that should be doing this.
In our case, when the reset done call is made, the driver before making the device
available to be reopened and used, needs to restore the config space/DVSEC etc. to
their prior state. During the reset_prepare stage these need to be saved. Since this
is how SBR is handled currently even in the PCIe world, (for ex. AER/EEH handling framework)
where driver is responsible for save and restore of the regs, and it is also not possible
for the kernel to know exactly what to save and restore for each specific type 2 devices,
it seems logical to do the same here.

> +1  The reset-related content in drivers/pci/pci.c has been growing
> recently.  Maybe we should consider moving it all to a reset.c file.

This makes sense. I'll prepare a patch to move the reset code and
compile out CXL specific parts while making any changes for the next version.

Thank you.

Regards,
Srirangan.
________________________________________
From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
Sent: Friday, February 21, 2025 4:13 PM
To: Lukas Wunner
Cc: Srirangan Madhavan; Davidlohr Bueso; Jonathan Cameron; Dave Jiang; Alison Schofield; Vishal Verma; Ira Weiny; Dan Williams; Zhi Wang; Vishal Aslot; Shanker Donthineni; linux-cxl@xxxxxxxxxxxxxxx; Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx
Subject: Re: [PATCH v2 2/2] cxl: add support for cxl reset

External email: Use caution opening links or attachments


On Fri, Feb 21, 2025 at 11:45:56AM +0100, Lukas Wunner wrote:
> On Thu, Feb 20, 2025 at 08:39:06PM -0800, Srirangan Madhavan wrote:
> > Type 2 devices are being introduced and will require finer-grained
> > reset mechanisms beyond bus-wide reset methods.
> >
> > Add support for CXL reset per CXL v3.2 Section 9.6/9.7
> >
> > Signed-off-by: Srirangan Madhavan <smadhavan@xxxxxxxxxx>
> > ---
> >  drivers/pci/pci.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++
>
> drivers/pci/pci.c is basically a catch-all for anything that doesn't fit
> in one of the other .c files in drivers/pci.  I'm slightly worried that
> this (otherwise legitimate) patch increases the clutter in pci.c further,
> rendering it unmaintainable in the long term.

+1  The reset-related content in drivers/pci/pci.c has been growing
recently.  Maybe we should consider moving it all to a reset.c file.

> At the very least, I'm wondering if this can be #ifdef'ed to
> CONFIG_CXL_PCI?
>
> One idea would be to move this newly added reset method, as well as the
> existing cxl_reset_bus_function(), to a new drivers/pci/cxl.c file.
>
> I guess moving it to drivers/cxl/ isn't an option because cxl can be
> modular.
>
> Another idea would be to move all the reset handling (which makes up
> a significant portion of pci.c) to a separate drivers/pci/reset.c.
> This might be beyond the scope of your patch, but in the interim,
> maybe at least an #ifdef can be added because the PCI core is also
> used e.g. on memory-constrained wifi routers which don't care about
> CXL at all.

Agree, we'll need some way to make this optional.

Bjorn





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux