On Mon, Sep 01, 2025 at 10:34:23AM +0200, Daniel Wagner wrote:
> The test is removing the ports while the host driver is about to
> reconnect and accesses a stale pointer.
>
> nvme_fc_create_association is calling nvme_fc_ctlr_inactive_on_rport in
> the error path. The problem is that nvme_fc_create_association gets
> halfway through the setup and then fails. In the cleanup path
>
> 	dev_warn(ctrl->ctrl.device,
> 		"NVME-FC{%d}: create_assoc failed, assoc_id %llx ret %d\n",
> 		ctrl->cnum, ctrl->association_id, ret);
>
> is issued and then nvme_fc_ctlr_inactive_on_rport is called. And we do
> see the log message above, so it's clear the error path is taken.
>
> But fcloop is not supposed to remove the ports while the host driver is
> still using them. So there is a race window where it's possible for
> fcloop to remove the ports while we are inside nvme_fc_create_association.
>
> That is, between entering nvme_fc_create_association and reaching
> nvme_fc_ctlr_active_on_rport.

I think the problem is that nvme_fc_create_association is not holding the
rport lock when checking the port_state and marking the rport active.
This races with nvme_fc_unregister_remoteport.

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3e12d4683ac7..03987f497a5b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3032,11 +3032,17 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 
 	++ctrl->ctrl.nr_reconnects;
 
-	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
+	spin_lock_irqsave(&ctrl->rport->lock, flags);
+	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE) {
+		spin_unlock_irqrestore(&ctrl->rport->lock, flags);
 		return -ENODEV;
+	}
 
-	if (nvme_fc_ctlr_active_on_rport(ctrl))
+	if (nvme_fc_ctlr_active_on_rport(ctrl)) {
+		spin_unlock_irqrestore(&ctrl->rport->lock, flags);
 		return -ENOTUNIQ;
+	}
+	spin_unlock_irqrestore(&ctrl->rport->lock, flags);
 
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: create association : host wwpn 0x%016llx "

I'll try to reproduce it and see if this patch makes a difference.
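
For reference, this is roughly the other side of the race as I read it. The
sketch below is heavily simplified and paraphrased from my reading of
drivers/nvme/host/fc.c, not the verbatim function body; the point is only
that the unregister path already flips port_state to DELETED and walks the
rport's controllers under rport->lock, which is why taking the same lock
around the check/mark in nvme_fc_create_association should close the window.

/*
 * Paraphrased, simplified sketch of the unregister side (not verbatim;
 * dev_loss_tmo handling and the remaining teardown are omitted). With the
 * proposed patch, nvme_fc_create_association checks port_state and marks
 * the ctrl active under the same rport->lock, so it either observes
 * FC_OBJSTATE_DELETED and returns -ENODEV, or finishes marking the ctrl
 * active before the unregister path starts tearing it down.
 */
int
nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr)
{
	struct nvme_fc_rport *rport = remoteport_to_rport(portptr);
	struct nvme_fc_ctrl *ctrl;
	unsigned long flags;

	spin_lock_irqsave(&rport->lock, flags);

	if (portptr->port_state != FC_OBJSTATE_ONLINE) {
		spin_unlock_irqrestore(&rport->lock, flags);
		return -EINVAL;
	}
	portptr->port_state = FC_OBJSTATE_DELETED;

	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
		/* signal connectivity loss / schedule controller teardown */
	}

	spin_unlock_irqrestore(&rport->lock, flags);

	/* remaining teardown (dev_loss_tmo, rport put) omitted */
	return 0;
}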