Ashish, On 7/17/2025 3:37 AM, Kalra, Ashish wrote: > Hello Vasant, > > On 7/16/2025 4:42 AM, Vasant Hegde wrote: >> >> >> On 7/16/2025 12:57 AM, Ashish Kalra wrote: >>> From: Ashish Kalra <ashish.kalra@xxxxxxx> >>> >>> After a panic if SNP is enabled in the previous kernel then the kdump >>> kernel boots with IOMMU SNP enforcement still enabled. >>> >>> IOMMU device table register is locked and exclusive to the previous >>> kernel. Attempts to copy old device table from the previous kernel >>> fails in kdump kernel as hardware ignores writes to the locked device >>> table base address register as per AMD IOMMU spec Section 2.12.2.1. >>> >>> This results in repeated "Completion-Wait loop timed out" errors and a >>> second kernel panic: "Kernel panic - not syncing: timer doesn't work >>> through Interrupt-remapped IO-APIC". >>> >>> Reuse device table instead of copying device table in case of kdump >>> boot and remove all copying device table code. >>> >>> Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx> >>> --- >>> drivers/iommu/amd/init.c | 97 ++++++++++++---------------------------- >>> 1 file changed, 28 insertions(+), 69 deletions(-) >>> >>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c >>> index 32295f26be1b..18bd869a82d9 100644 >>> --- a/drivers/iommu/amd/init.c >>> +++ b/drivers/iommu/amd/init.c >>> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu) >>> >>> BUG_ON(iommu->mmio_base == NULL); >>> >>> + if (is_kdump_kernel()) >> >> This is fine.. but its becoming too many places with kdump check! I don't know >> what is the better way here. >> Is it worth to keep it like this -OR- add say iommu ops that way during init we >> check is_kdump_kernel() and adjust the ops ? >> >> @Joerg, any preference? >> >> .../... >>> break; >>> } >>> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu) >>> * This function finally enables all IOMMUs found in the system after >>> * they have been initialized. >>> * >>> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy >>> - * the old content of device table entries. Not this case or copy failed, >>> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse >>> + * the old content of device table entries. Not this case or reuse failed, >>> * just continue as normal kernel does. >>> */ >>> static void early_enable_iommus(void) >>> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void) >>> struct amd_iommu *iommu; >>> struct amd_iommu_pci_seg *pci_seg; >>> >>> - if (!copy_device_table()) { >>> + if (!reuse_device_table()) { >> >> Hmmm. What happens if SNP enabled and reuse_device_table() couldn't setup >> previous DTE? >> In non-SNP case it works fine as we can rebuild new DTE. But in SNP case we >> should fail the kdump right? >> > > Which will happen automatically, if we can't setup previous DTE for SNP case > then IOMMU commands will time-out and subsequenly cause a panic as IRQ remapping > won't be setup. But what is the point is proceeding when we know its going to fail? I think its better to fail here so that at least we know where/why it failed. -Vasant