On 20/08/2025 00:32, Borislav Petkov wrote:
> On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
>> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
>> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
>> valid physical address bits within the machine check bank address register.
>>
>> This is particularly needed in the case of errors in TDX/SEAM non-root mode
>> because the reported address contains the TDX KeyID. Refer to TDX and
>> TME-MK documentation for more information about KeyIDs.
>>
>> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
>> non-root mode") uses the address to mark the affected page as poisoned, but
>> omits to use the aforementioned mask.
>>
>> Investigation of user space expectations has concluded it would be more
>> correct for the address to contain only address bits in the first place.
>> Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@xxxxxxxxx
>>
>> Mask the address when it is read from the machine check bank address
>> register. Do not use MCI_ADDR_PHYSADDR because that will be removed in a
>> later patch.
>
> Why is this patch talking about TDX-something but doing "global" changes to
> mce.addr?

It falls a bit into the category of: easier to maintain a global way of
doing things than have lots of special-cases.

>
> Why don't you simply do a TDX-specific masking out when you're running on
> in TDX env and leave the rest as is?
>

It was kinda like that in V1:

	https://lore.kernel.org/r/20250618120806.113884-2-adrian.hunter@xxxxxxxxx/

where the code change was dealing with SEAM_NR in the block starting:

	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {

Then Dave asked about changing addr itself:

	https://lore.kernel.org/all/487c5e63-07d3-41ad-bfc0-bda14b3c435e@xxxxxxxxx/
	https://lore.kernel.org/all/79eca29a-8ba4-4ad9-b2e0-54d8e668f731@xxxxxxxxx/

And it seems like user space does expect addr to be a physical address:

	https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@xxxxxxxxx

Something like below would work, but doesn't answer Dave's question of
why not do it in mce_read_aux()

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4da4eab56c81..53c7ea3d0464 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1655,28 +1655,30 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {
 		/*
 		 * Saved RIP on stack makes it look like the machine check
 		 * was taken in the kernel on the instruction following
 		 * the entry to SEAM mode. But MCG_STATUS_SEAM_NR indicates
 		 * that the machine check was taken inside SEAM non-root
 		 * mode. CPU core has already marked that guest as dead.
 		 * It is OK for the kernel to resume execution at the
 		 * apparent point of the machine check as the fault did
 		 * not occur there. Mark the page as poisoned so it won't
 		 * be added to free list when the guest is terminated.
 		 */
 		if (mce_usable_address(m)) {
-			struct page *p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
+			struct page *p;
+			m->addr &= MCI_ADDR_PHYSADDR;
+			p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
 			if (p)
 				SetPageHWPoison(p);
 		}
 	} else {
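
To illustrate why the mask matters before the shift, here is a rough user-space sketch, not kernel code. It assumes the KeyID bits sit above the usable physical-address width. The helper names, EXAMPLE_PAGE_SHIFT and the 46-bit width used in the note below are made up for the example; the kernel derives the real mask from the CPU's physical address width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's PAGE_SHIFT (4 KiB pages). */
#define EXAMPLE_PAGE_SHIFT 12

/*
 * Build a mask with the low phys_bits bits set, i.e. the bits that
 * carry a real physical address. phys_bits is an assumed input here.
 */
static inline uint64_t example_phys_mask(unsigned int phys_bits)
{
	assert(phys_bits >= 1 && phys_bits <= 64);
	return phys_bits == 64 ? UINT64_MAX : (UINT64_C(1) << phys_bits) - 1;
}

/*
 * Drop any non-address bits (e.g. a KeyID in the upper bits) from the
 * raw address, then convert the result to a page frame number.
 */
static inline uint64_t example_addr_to_pfn(uint64_t raw_addr,
					   unsigned int phys_bits)
{
	return (raw_addr & example_phys_mask(phys_bits)) >> EXAMPLE_PAGE_SHIFT;
}
```

With an assumed 46-bit physical address width and a raw value of (3ULL << 46) | 0x12345678000, the helper yields pfn 0x12345678. Shifting the raw value without masking would leave the upper bits in the pfn, so the page lookup would miss the affected page.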