Ashish, On 7/17/2025 3:25 AM, Kalra, Ashish wrote: > Hello Vasant, > > On 7/16/2025 4:19 AM, Vasant Hegde wrote: >> Hi Ashish, >> >> >> On 7/16/2025 12:56 AM, Ashish Kalra wrote: >>> From: Ashish Kalra <ashish.kalra@xxxxxxx> >>> >>> After a panic if SNP is enabled in the previous kernel then the kdump >>> kernel boots with IOMMU SNP enforcement still enabled. >>> >>> IOMMU completion wait buffers (CWBs), command buffers and event buffer >>> registers remain locked and exclusive to the previous kernel. Attempts >>> to allocate and use new buffers in the kdump kernel fail, as hardware >>> ignores writes to the locked MMIO registers as per AMD IOMMU spec >>> Section 2.12.2.1. >>> >>> This results in repeated "Completion-Wait loop timed out" errors and a >>> second kernel panic: "Kernel panic - not syncing: timer doesn't work >>> through Interrupt-remapped IO-APIC" >>> >>> The following MMIO registers are locked and ignore writes after failed >>> SNP shutdown: >>> Command Buffer Base Address Register >>> Event Log Base Address Register >>> Completion Store Base Register/Exclusion Base Register >>> Completion Store Limit Register/Exclusion Limit Register >>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ >>> remapping, which is required for proper operation. >> >> There are couple of other registers in locked list. Can you please rephrase >> above paras? Also you don't need to callout indivisual registers here. You can >> just add link to IOMMU spec. > > Yes i will drop listing the individual registers here and just provide the link > to the IOMMU specs. May be you can rephrase a bit so that its clear that there are many register gets locked. This patch fixes few of them and following patches will fix remaining ones. > >> >> Unrelated to this patch : >> I went to some of the SNP related code in IOMMU driver. One thing confused me >> is in amd_iommu_snp_disable() code why Command buffer is not marked as shared? >> any idea? >> > > Yes that's interesting. > > This is as per the SNP Firmware ABI specs: > > from SNP_INIT_EX: > > The firmware initializes the IOMMU to perform RMP enforcement. The firmware also transitions > the event log, PPR log, and completion wait buffers of the IOMMU to an RMP page state that is > read only to the hypervisor and cannot be assigned to guests > > So during SNP_SHUTDOWN_EX, transitioning these same buffers back to shared state. > > But will investigate deeper and check why is command buffer not marked as FW/Reclaim state > by firmware ? Sure. > >> >>> >>> Reuse the pages of the previous kernel for completion wait buffers, >>> command buffers, event buffers and memremap them during kdump boot >>> and essentially work with an already enabled IOMMU configuration and >>> re-using the previous kernel’s data structures. >>> >>> Reusing of command buffers and event buffers is now done for kdump boot >>> irrespective of SNP being enabled during kdump. >>> >>> Re-use of completion wait buffers is only done when SNP is enabled as >>> the exclusion base register is used for the completion wait buffer >>> (CWB) address only when SNP is enabled. >>> >>> Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx> >>> --- >>> drivers/iommu/amd/amd_iommu_types.h | 5 + >>> drivers/iommu/amd/init.c | 163 ++++++++++++++++++++++++++-- >>> drivers/iommu/amd/iommu.c | 2 +- >>> 3 files changed, 157 insertions(+), 13 deletions(-) >>> >>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h >>> index 9b64cd706c96..082eb1270818 100644 >>> --- a/drivers/iommu/amd/amd_iommu_types.h >>> +++ b/drivers/iommu/amd/amd_iommu_types.h >>> @@ -791,6 +791,11 @@ struct amd_iommu { >>> u32 flags; >>> volatile u64 *cmd_sem; >>> atomic64_t cmd_sem_val; >>> + /* >>> + * Track physical address to directly use it in build_completion_wait() >>> + * and avoid adding any special checks and handling for kdump. >>> + */ >>> + u64 cmd_sem_paddr; >> >> With this we are tracking both physical and virtual address? Is that really >> needed? Can we just track PA and convert it into va? >> > > I believe it is simpler to keep/track cmd_sem and use it directly, instead of doing > phys_to_virt() calls everytime before using it. > >>> >>> #ifdef CONFIG_AMD_IOMMU_DEBUGFS >>> /* DebugFS Info */ >>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c >>> index cadb2c735ffc..32295f26be1b 100644 >>> --- a/drivers/iommu/amd/init.c >>> +++ b/drivers/iommu/amd/init.c >>> @@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg) >>> pci_seg->alias_table = NULL; >>> } >>> >>> +static inline void *iommu_memremap(unsigned long paddr, size_t size) >>> +{ >>> + phys_addr_t phys; >>> + >>> + if (!paddr) >>> + return NULL; >>> + >>> + /* >>> + * Obtain true physical address in kdump kernel when SME is enabled. >>> + * Currently, IOMMU driver does not support booting into an unencrypted >>> + * kdump kernel. >> >> You mean production kernel w/ SME and kdump kernel with non-SME is not supported? >> > > Yes. Then can you please rephrase above comment? > >> >>> + */ >>> + phys = __sme_clr(paddr); >>> + >>> + return ioremap_encrypted(phys, size); >> >> You are clearing C bit and then immediately remapping using encrypted mode. Also >> existing code checks for C bit before calling ioremap_encrypted(). So I am not >> clear why you do this. >> >> > > We need to clear the C-bit to get the correct physical address for remapping. > > Which existing code checks for C-bit before calling ioremap_encrypted() ? > > After getting the correct physical address we call ioremap_encrypted() which > which map it with C-bit enabled if SME is enabled or else it will map it > without C-bit (so it handles both SME and non-SME cases). > > Earlier we used to check for CC_ATTR_HOST_MEM_ENCRYPT flag and if set > then call ioremap_encrypted() or otherwise call memremap(), but then > as mentioned above ioremap_encrypted() works for both cases - SME or > non-SME, hence we use that approach. If you want to keep it in current way then it needs better comment. I'd say add CC_ATTR_HOST_MEM_ENCRYPT check so that its easy to read. > >> >>> +} >>> + >>> /* >>> * Allocates the command buffer. This buffer is per AMD IOMMU. We can >>> * write commands to that buffer later and the IOMMU will execute them >>> @@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu) >>> static int __init alloc_cwwb_sem(struct amd_iommu *iommu) >>> { >>> iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1); >>> + if (!iommu->cmd_sem) >>> + return -ENOMEM; >>> + iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem); >>> + return 0; >>> +} >>> + >>> +static int __init remap_event_buffer(struct amd_iommu *iommu) >>> +{ >>> + u64 paddr; >>> + >>> + pr_info_once("Re-using event buffer from the previous kernel\n"); >>> + /* >>> + * Read-back the event log base address register and apply >>> + * PM_ADDR_MASK to obtain the event log base address. >>> + */ >>> + paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK; >>> + iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE); >>> + >>> + return iommu->evt_buf ? 0 : -ENOMEM; >>> +} >>> + >>> +static int __init remap_command_buffer(struct amd_iommu *iommu) >>> +{ >>> + u64 paddr; >>> + >>> + pr_info_once("Re-using command buffer from the previous kernel\n"); >>> + /* >>> + * Read-back the command buffer base address register and apply >>> + * PM_ADDR_MASK to obtain the command buffer base address. >>> + */ >>> + paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK; >>> + iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE); >>> + >>> + return iommu->cmd_buf ? 0 : -ENOMEM; >>> +} >>> + >>> +static int __init remap_cwwb_sem(struct amd_iommu *iommu) >>> +{ >>> + u64 paddr; >>> + >>> + if (check_feature(FEATURE_SNP)) { >>> + /* >>> + * When SNP is enabled, the exclusion base register is used for the >>> + * completion wait buffer (CWB) address. Read and re-use it. >>> + */ >>> + pr_info_once("Re-using CWB buffers from the previous kernel\n"); >>> + /* >>> + * Read-back the exclusion base register and apply PM_ADDR_MASK >>> + * to obtain the exclusion range base address. >>> + */ >>> + paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK; >>> + iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE); >>> + if (!iommu->cmd_sem) >>> + return -ENOMEM; >>> + iommu->cmd_sem_paddr = paddr; >>> + } else { >>> + return alloc_cwwb_sem(iommu); >> >> I understand this one is different from command/event buffer. But calling >> function name as remap_*() and then allocating memory internally is bit odd. >> Also this differs from previous functions. >> > > Yes i agree, but then what do we name it ? > > remap_or_alloc_cwb_sem() does that sound Ok ? May be. -Vasant