On 2025/4/15 16:45, Shameerali Kolothum Thodi wrote:
>
>
>> -----Original Message-----
>> From: liulongfang <liulongfang@xxxxxxxxxx>
>> Sent: Friday, April 11, 2025 4:59 AM
>> To: alex.williamson@xxxxxxxxxx; jgg@xxxxxxxxxx; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>; Jonathan Cameron
>> <jonathan.cameron@xxxxxxxxxx>
>> Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> linuxarm@xxxxxxxxxxxxx; liulongfang <liulongfang@xxxxxxxxxx>
>> Subject: [PATCH v7 5/6] hisi_acc_vfio_pci: bugfix live migration function
>> without VF device driver
>>
>> If the VF device driver is not loaded in the Guest OS and we attempt to
>> perform device data migration, the address of the migrated data will
>> be NULL.
>> The live migration recovery operation on the destination side will
>> access a null address value, which will cause access errors.
>>
>> Therefore, live migration of VMs without added VF device drivers
>> does not require device data migration.
>> In addition, when the queue address data obtained by the destination
>> is empty, device queue recovery processing will not be performed.
>>
>> Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live
>> migration")
>> Signed-off-by: Longfang Liu <liulongfang@xxxxxxxxxx>
>> Reviewed-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@xxxxxxxxxx>
>> ---
>>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 22 +++++++++++++------
>>  1 file changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> index cadc82419dca..d12a350440d3 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> @@ -426,13 +426,6 @@ static int vf_qm_check_match(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>  		return -EINVAL;
>>  	}
>>
>> -	ret = qm_write_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state,
>> 1);
>> -	if (ret) {
>> -		dev_err(dev, "failed to write QM_VF_STATE\n");
>> -		return ret;
>> -	}
>> -
>> -	hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>>  	hisi_acc_vdev->match_done = true;
>>  	return 0;
>>  }
>> @@ -498,6 +491,20 @@ static int vf_qm_load_data(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>>  	if (migf->total_length < sizeof(struct acc_vf_data))
>>  		return -EINVAL;
>>
>> +	if (!vf_data->eqe_dma || !vf_data->aeqe_dma ||
>> +	    !vf_data->sqc_dma || !vf_data->cqc_dma) {
>> +		dev_info(dev, "resume dma addr is NULL!\n");
>> +		hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
>> +		return 0;
>> +	}
>
> I am still not fully understood why we need the above check. The only case as
> far as I can think of where this will happen is when the source VM Guest has
> not loaded the ACC driver. And we do take care of that already using vf_qm_state and
> checking total_length == QM_MATCH_SIZE.
>
> Have you seen this happening in any other scenario during your tests?
>

This is a problem that was discovered and fixed during earlier abnormal-scenario
testing. In those tests, error handling for the abnormal condition was carried out
and live migration was then performed; because this check was missing, the device
ended up in an abnormal state after the migration.

Thanks,
Longfang.
> Thanks,
> Shameer
>
>> +
>> +	ret = qm_write_regs(qm, QM_VF_STATE, &vf_data->vf_qm_state, 1);
>> +	if (ret) {
>> +		dev_err(dev, "failed to write QM_VF_STATE\n");
>> +		return -EINVAL;
>> +	}
>> +	hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>> +
>>  	qm->eqe_dma = vf_data->eqe_dma;
>>  	qm->aeqe_dma = vf_data->aeqe_dma;
>>  	qm->sqc_dma = vf_data->sqc_dma;
>> @@ -1531,6 +1538,7 @@ static int hisi_acc_vfio_pci_migrn_init_dev(struct
>> vfio_device *core_vdev)
>>  	hisi_acc_vdev->vf_id = pci_iov_vf_id(pdev) + 1;
>>  	hisi_acc_vdev->pf_qm = pf_qm;
>>  	hisi_acc_vdev->vf_dev = pdev;
>> +	hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
>>  	mutex_init(&hisi_acc_vdev->state_mutex);
>>  	mutex_init(&hisi_acc_vdev->open_mutex);
>>
>> --
>> 2.24.0
>
> .
>