Hello Vasant and Sairaj, On 7/17/2025 1:05 AM, Vasant Hegde wrote: > Ashish, > > On 7/17/2025 3:37 AM, Kalra, Ashish wrote: >> Hello Vasant, >> >> On 7/16/2025 4:42 AM, Vasant Hegde wrote: >>> >>> >>> On 7/16/2025 12:57 AM, Ashish Kalra wrote: >>>> From: Ashish Kalra <ashish.kalra@xxxxxxx> >>>> >>>> After a panic if SNP is enabled in the previous kernel then the kdump >>>> kernel boots with IOMMU SNP enforcement still enabled. >>>> >>>> IOMMU device table register is locked and exclusive to the previous >>>> kernel. Attempts to copy old device table from the previous kernel >>>> fails in kdump kernel as hardware ignores writes to the locked device >>>> table base address register as per AMD IOMMU spec Section 2.12.2.1. >>>> >>>> This results in repeated "Completion-Wait loop timed out" errors and a >>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work >>>> through Interrupt-remapped IO-APIC". >>>> >>>> Reuse device table instead of copying device table in case of kdump >>>> boot and remove all copying device table code. >>>> >>>> Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx> >>>> --- >>>> drivers/iommu/amd/init.c | 97 ++++++++++++---------------------------- >>>> 1 file changed, 28 insertions(+), 69 deletions(-) >>>> >>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c >>>> index 32295f26be1b..18bd869a82d9 100644 >>>> --- a/drivers/iommu/amd/init.c >>>> +++ b/drivers/iommu/amd/init.c >>>> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu) >>>> >>>> BUG_ON(iommu->mmio_base == NULL); >>>> >>>> + if (is_kdump_kernel()) >>> >>> This is fine.. but its becoming too many places with kdump check! I don't know >>> what is the better way here. >>> Is it worth to keep it like this -OR- add say iommu ops that way during init we >>> check is_kdump_kernel() and adjust the ops ? >>> >>> @Joerg, any preference? >>> >>> > > .../... > >>>> break; >>>> } >>>> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu) >>>> * This function finally enables all IOMMUs found in the system after >>>> * they have been initialized. >>>> * >>>> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy >>>> - * the old content of device table entries. Not this case or copy failed, >>>> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse >>>> + * the old content of device table entries. Not this case or reuse failed, >>>> * just continue as normal kernel does. >>>> */ >>>> static void early_enable_iommus(void) >>>> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void) >>>> struct amd_iommu *iommu; >>>> struct amd_iommu_pci_seg *pci_seg; >>>> >>>> - if (!copy_device_table()) { >>>> + if (!reuse_device_table()) { >>> >>> Hmmm. What happens if SNP enabled and reuse_device_table() couldn't setup >>> previous DTE? >>> In non-SNP case it works fine as we can rebuild new DTE. But in SNP case we >>> should fail the kdump right? >>> >> >> Which will happen automatically, if we can't setup previous DTE for SNP case >> then IOMMU commands will time-out and subsequenly cause a panic as IRQ remapping >> won't be setup. > > But what is the point is proceeding when we know its going to fail? I think its > better to fail here so that at least we know where/why it failed. > Yes that makes sense. As Sairaj suggested, we can add a BUG_ON() if reuse_device_table() fails in case of SNP enabled. Thanks, Ashish