Hi Jiaqi,

On Wed, Jun 04, 2025 at 05:08:56AM +0000, Jiaqi Yan wrote:
> When APEI fails to handle a stage-2 synchronous external abort (SEA),
> today KVM directly injects an async SError into the vCPU and then
> resumes it, which usually results in an unpleasant guest kernel panic.
>
> One major source of guest SEAs is a vCPU consuming a recoverable
> uncorrected memory error (UER). Although the SError and guest kernel
> panic effectively stop the propagation of corrupted memory, there is
> room to recover from a UER in a more graceful manner.
>
> Alternatively, KVM can redirect the synchronous SEA event to the VMM,
> which can:
> - Reduce the blast radius if possible. The VMM can inject a SEA into
>   the vCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory
>   poison consumption or fault is not from the guest kernel, the blast
>   radius can be limited to the triggering thread in guest userspace,
>   so the VM can keep running.
> - Protect against future memory poison consumption by unmapping the
>   page from stage-2, or by notifying the guest of the poisoned guest
>   page so the guest kernel can unmap it from stage-1.
> - Track SEA events that VM customers care about, restart the VM when
>   a certain number of distinct poison events have happened, and
>   provide observability to customers in a log management UI.
>
> Introduce a userspace-visible feature to enable the VMM to handle SEAs:
> - KVM_CAP_ARM_SEA_TO_USER. As an alternative fallback behavior when
>   host APEI fails to claim a SEA, userspace can opt in to this new
>   capability to let KVM exit to userspace on a SEA, provided the SEA
>   is not caused by an access to the memory of a stage-2 translation
>   table.
> - KVM_EXIT_ARM_SEA. A new exit reason is introduced for this. KVM
>   fills kvm_run.arm_sea with as much information as possible about
>   the SEA, enabling the VMM to emulate the SEA to the guest by itself:
>   - Sanitized ESR_EL2. The general rule is to keep only the bits
>     useful for userspace and relevant to guest memory. See the code
>     comments for why bits are hidden or reported.
>   - Flags indicating whether the faulting guest virtual and physical
>     addresses are available.
>   - Faulting guest virtual address, if available.
>   - Faulting guest physical address, if available.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx>

I was reviewing this locally and wound up making enough changes that it
made more sense to share the diff. General comments:

 - Avoid adding helpers to headers when they're used in a single call
   site or compilation unit.

 - Add some detail about FEAT_RAS, where we may still exit to userspace
   for host-controlled memory, as we cannot differentiate between a
   stage-1 and a stage-2 TTW SEA when it is taken on the descriptor PA.

 - Explicitly handle SEAs due to VNCR (I have a separate prereq patch).
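The quoted commit message says KVM_EXIT_ARM_SEA reports a sanitized ESR_EL2, an indication of which addresses are valid, and the faulting guest virtual and physical addresses. Below is a minimal sketch of filling such a payload. The ESR bit positions follow the Arm ARM data abort layout. Everything else is hypothetical and does not come from the patch: `struct sea_exit_info` (a stand-in for `kvm_run.arm_sea`), the `SEA_FLAG_*` bits, the `SEA_ESR_KEEP` mask, and `fill_sea_exit()`. In particular, the choice of which bits to keep is an assumption, not the patch's rule. The sketch only reports the guest VA when FnV ("FAR not valid") is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx bit positions for a data abort (see the Arm ARM). */
#define ESR_EC_MASK   (UINT64_C(0x3f) << 26)  /* exception class */
#define ESR_IL        (UINT64_C(1) << 25)     /* instruction length */
#define ESR_FNV       (UINT64_C(1) << 10)     /* FAR not valid */
#define ESR_EA        (UINT64_C(1) << 9)      /* external abort type */
#define ESR_S1PTW     (UINT64_C(1) << 7)      /* fault on stage-1 walk */
#define ESR_WNR       (UINT64_C(1) << 6)      /* write not read */
#define ESR_FSC_MASK  UINT64_C(0x3f)          /* data fault status code */

/* Hypothetical keep mask: an assumption, not the mask used by the patch. */
#define SEA_ESR_KEEP (ESR_EC_MASK | ESR_IL | ESR_FNV | ESR_EA | \
		      ESR_S1PTW | ESR_WNR | ESR_FSC_MASK)

/* Hypothetical flag bits saying which addresses are valid. */
#define SEA_FLAG_GVA_VALID (UINT64_C(1) << 0)
#define SEA_FLAG_GPA_VALID (UINT64_C(1) << 1)

/* Hypothetical stand-in for the kvm_run.arm_sea payload. */
struct sea_exit_info {
	uint64_t esr;
	uint64_t flags;
	uint64_t gva;
	uint64_t gpa;
};

/* Drop every ESR bit that is not on the keep list. */
static uint64_t sanitize_esr(uint64_t esr)
{
	return esr & SEA_ESR_KEEP;
}

/*
 * Build the payload. The guest VA comes from FAR and is only reported
 * when FnV is clear; whether a guest PA is known is decided by the caller.
 * Fields that are not reported stay zero.
 */
static struct sea_exit_info fill_sea_exit(uint64_t esr, uint64_t far,
					  bool gpa_valid, uint64_t gpa)
{
	struct sea_exit_info info = { .esr = sanitize_esr(esr) };

	if (!(esr & ESR_FNV)) {
		info.flags |= SEA_FLAG_GVA_VALID;
		info.gva = far;
	}
	if (gpa_valid) {
		info.flags |= SEA_FLAG_GPA_VALID;
		info.gpa = gpa;
	}
	return info;
}
```

Masking against an explicit keep list, rather than clearing known-sensitive bits, means that any ISS bit added by a future architecture revision is hidden by default.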
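The FEAT_RAS comment and the KVM_CAP_ARM_SEA_TO_USER condition both depend on telling a SEA on the data access itself apart from a SEA taken on a translation table walk (TTW). Below is a minimal sketch of that split using the data fault status codes (ESR_ELx.ISS[5:0]) as listed in the Arm ARM. `classify_sea()` and `enum sea_kind` are hypothetical names. The parity/ECC encodings are included only for completeness, since the Arm ARM lists them for implementations without FEAT_RAS. The sketch deliberately stops at the access/TTW split. As the review comment notes, the descriptor PA alone does not say whether the table belonged to the guest's stage 1 or the host's stage 2.

```c
#include <stdint.h>

/* Data fault status codes (ESR_ELx.ISS[5:0]) from the Arm ARM. */
#define FSC_SEA          0x10 /* SEA, not on a translation table walk */
#define FSC_SEA_TTW_MIN  0x13 /* SEA on TTW, level -1 (FEAT_LPA2) */
#define FSC_SEA_TTW_MAX  0x17 /* SEA on TTW, level 3 */
#define FSC_SECC         0x18 /* parity/ECC, not on TTW (no FEAT_RAS) */
#define FSC_SECC_TTW_MIN 0x1b /* parity/ECC on TTW, level -1 */
#define FSC_SECC_TTW_MAX 0x1f /* parity/ECC on TTW, level 3 */

enum sea_kind {
	SEA_NONE,      /* not a synchronous external abort */
	SEA_ON_ACCESS, /* the faulting data access itself hit the error */
	SEA_ON_TTW,    /* a descriptor fetch during a table walk hit the error */
};

/* Classify a data abort ESR by its fault status code only. */
static enum sea_kind classify_sea(uint64_t esr)
{
	unsigned int fsc = (unsigned int)(esr & 0x3f);

	if (fsc == FSC_SEA || fsc == FSC_SECC)
		return SEA_ON_ACCESS;
	if ((fsc >= FSC_SEA_TTW_MIN && fsc <= FSC_SEA_TTW_MAX) ||
	    (fsc >= FSC_SECC_TTW_MIN && fsc <= FSC_SECC_TTW_MAX))
		return SEA_ON_TTW;
	return SEA_NONE;
}
```

A caller that wants to avoid exiting to userspace for host-controlled memory would still need information beyond the DFSC to attribute a `SEA_ON_TTW` result. This sketch does not attempt that attribution.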
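The quoted commit message suggests that a VMM could restart the VM once a certain number of distinct poison events have happened. Below is a minimal sketch of that bookkeeping on the VMM side, assuming 4 KiB guest pages and a small fixed-size table. `struct poison_tracker`, `poison_record()` and the threshold are hypothetical VMM names, not part of the KVM API. Repeated hits on the same guest page count only once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POISON_PAGE_SHIFT  12  /* assumption: 4 KiB guest pages */
#define POISON_MAX_TRACKED 64  /* assumption: fixed-size table */

/* Hypothetical VMM-side record of distinct poisoned guest pages. */
struct poison_tracker {
	uint64_t pages[POISON_MAX_TRACKED]; /* guest page frame numbers */
	size_t count;                       /* distinct pages recorded */
	size_t restart_threshold;           /* hypothetical policy knob */
};

/*
 * Record the guest physical address from a SEA exit. Returns true when
 * the number of distinct poisoned pages has reached the threshold, i.e.
 * the policy says the VM should be restarted.
 */
static bool poison_record(struct poison_tracker *t, uint64_t gpa)
{
	uint64_t gfn = gpa >> POISON_PAGE_SHIFT;
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->pages[i] == gfn) /* already seen: not a new event */
			return t->count >= t->restart_threshold;
	}

	if (t->count == POISON_MAX_TRACKED)
		return true; /* table full: treat as over threshold */

	t->pages[t->count++] = gfn;
	return t->count >= t->restart_threshold;
}
```

A linear scan is adequate for a few dozen entries. A VMM expecting many poisoned pages would more likely use a hash set keyed by guest frame number.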