On 9/10/2025 10:07 AM, Gregory Price wrote:
> Hi Terry,
>
> On Tue, Aug 26, 2025 at 08:35:38PM -0500, Terry Bowman wrote:
>> Introduce cxl_mask_proto_interrupts() to call pci_aer_mask_internal_errors().
>> Add calls to cxl_mask_proto_interrupts() within CXL Port teardown for CXL
>> Root Ports, CXL Downstream Switch Ports, CXL Upstream Switch Ports, and CXL
>> Endpoints. Follow the same "bottom-up" approach used during CXL Port
>> teardown.
>>
> ...
>> @@ -1471,6 +1475,8 @@ static void cxl_detach_ep(void *data)
>>  {
>>  	struct cxl_memdev *cxlmd = data;
>>
>> +	cxl_mask_proto_interrupts(cxlmd->cxlds->dev);
>> +
>>  	for (int i = cxlmd->depth - 1; i >= 1; i--) {
>>  		struct cxl_port *port, *parent_port;
>>  		struct detach_ctx ctx = {
> While testing v10 of this patch set, we found ourselves with a deadlock
> on boot with the following stack in the hung task:
>
> [ 252.784440] <TASK>
> [ 252.789090] schedule+0x5d6/0x1670
> [ 252.796629] ? schedule_preempt_disabled+0xa/0x10
> [ 252.807061] schedule_preempt_disabled+0xa/0x10
> [ 252.817108] __mutex_lock+0x245/0x7b0
> [ 252.825229] cxl_mask_proto_interrupts+0x23/0x50
> [ 252.835470] cxl_detach_ep+0x25/0x2e0
>
> This occurs on a system which fails to probe ports fully due to the
> duplicate id error resolved by the Delayed HB patch set.
>
> But it's concerning that there's a deadlock condition without that
> patch set. Can you help try to eyeball this? I'm trying to get more
> debug info, but testing system availability is limited.
>
> ~Gregory
>

Hi Greg,

Thanks for pointing this out. We saw this too, and it is the reason the
device lock was removed from v11's cxl_mask_proto_interrupts().

Looks like I need to force the EP detach. Do you have steps for recreating
the duplicate port id?

Terry