Re: [PATCH v4 4/5] KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 23 Apr 2025 08:30:16 -0700

On Wed, Apr 16, 2025, Xiaoyao Li wrote:
> On 3/24/2025 9:02 PM, Manali Shukla wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 5fe84f2427b5..f7c925aa0c4f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7909,6 +7909,25 @@ apply some other policy-based mitigation. When exiting to userspace, KVM sets
> >   KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason
> >   to KVM_EXIT_X86_BUS_LOCK.
> > +Note! KVM_CAP_X86_BUS_LOCK_EXIT on AMD CPUs with the Bus Lock Threshold is close
> > +enough  to INTEL's Bus Lock Detection VM-Exit to allow using
> > +KVM_CAP_X86_BUS_LOCK_EXIT for AMD CPUs.
> > +
> > +The biggest difference between the two features is that Threshold (AMD CPUs) is
> > +fault-like i.e. the bus lock exit to user space occurs with RIP pointing at the
> > +offending instruction, whereas Detection (Intel CPUs) is trap-like i.e. the bus
> > +lock exit to user space occurs with RIP pointing at the instruction right after
> > +the guilty one.
> > 
> 
> 
> > +The bus lock threshold on AMD CPUs provides a per-VMCB counter which is
> > +decremented every time a bus lock occurs, and a VM-Exit is triggered if and only
> > +if the bus lock counter is '0'.
> > +
> > +To provide Detection-like semantics for AMD CPUs, the bus lock counter has been
> > +initialized to '0', i.e. exit on every bus lock, and when re-executing the
> > +guilty instruction, the bus lock counter has been set to '1' to effectively step
> > +past the instruction.
> 
> From the perspective of API, I don't think the last two paragraphs matter
> much to userspace.
> 
> It should describe what userspace can/should do. E.g., when exit to
> userspace due to bus lock on AMD platform, the RIP points at the instruction
> which causes the bus lock. Userspace can advance the RIP itself before
> re-enter the guest to make progress. If userspace doesn't change the RIP,
> KVM internal can handle it by making the re-execution of the instruction
> doesn't trigger bus lock VM exit to allow progress.

Agreed.  It's not just the last two paragraphs, it's the entire doc update.

The existing documentation very carefully doesn't say anything about *how* the
feature is implemented on Intel, so I don't see any reason to mention or compare
Bus Lock Threshold vs. Bus Lock Detection.  As Xiaoyao said, simply state what
is different.

And I would definitely not say anything about whether or not userspace can advance
RIP, as doing so will likely crash/corrupt the guest.  KVM sets bus_lock_counter
to allow forward progress, KVM does NOT skip RIP.
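
To make the forward-progress point concrete, here is a toy model of the
counter behavior described in the quoted patch.  It is a sketch only: every
name below (toy_vcpu, toy_guest_bus_lock, toy_handle_bus_lock_exit) is
hypothetical and is not KVM or VMCB code, and it assumes the exit is taken
when a bus lock is attempted with the counter at zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not KVM or VMCB definitions. */
struct toy_vcpu {
	uint64_t rip;              /* guest instruction pointer */
	uint32_t bus_lock_counter; /* stand-in for the per-VMCB counter */
};

/*
 * Model the guest attempting a bus-locking instruction of 'len' bytes at RIP.
 * Returns true if the attempt exits.  The exit is fault-like: RIP is left
 * pointing at the offending instruction, which has not executed.
 */
static bool toy_guest_bus_lock(struct toy_vcpu *v, uint64_t len)
{
	if (v->bus_lock_counter == 0)
		return true;           /* exit, RIP unchanged */

	v->bus_lock_counter--;         /* consume one allowed bus lock */
	v->rip += len;                 /* instruction retires normally */
	return false;
}

/*
 * Model the exit handling: allow exactly one bus lock so that re-executing
 * the same instruction makes progress.  RIP is deliberately not touched.
 */
static void toy_handle_bus_lock_exit(struct toy_vcpu *v)
{
	v->bus_lock_counter = 1;
}
```

In this model the instruction is re-executed rather than skipped, and the
counter is back at zero afterwards, so the next bus lock exits again.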

All in all, I think the only thing that needs to be called out is that RIP will
point to the next instruction on Intel, but the offending instruction on AMD.

Unless I'm missing a detail, I think it's just this:

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 5fe84f2427b5..d9788f9152f1 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7909,6 +7909,11 @@ apply some other policy-based mitigation. When exiting to userspace, KVM sets
 KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason
 to KVM_EXIT_X86_BUS_LOCK.
 
+Due to differences in the underlying hardware implementation, the vCPU's RIP at
+the time of exit diverges between Intel and AMD.  On Intel hosts, RIP points at
+the next instruction, i.e. the exit is trap-like.  On AMD hosts, RIP points at
+the offending instruction, i.e. the exit is fault-like.
+
 Note! Detected bus locks may be coincident with other exits to userspace, i.e.
 KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if
 userspace wants to take action on all detected bus locks.
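
For illustration, the userspace side of the above might look roughly like the
sketch below.  The SKETCH_* constants and struct sketch_run are stand-ins so
that the snippet builds without kernel headers; real code would include
<linux/kvm.h> and use struct kvm_run, KVM_RUN_X86_BUS_LOCK and
KVM_EXIT_X86_BUS_LOCK instead, and the numeric values here should be treated
as placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-ins for the UAPI definitions (assumptions for this sketch only).
 * Real code should take these from <linux/kvm.h>.
 */
#define SKETCH_RUN_X86_BUS_LOCK		(1u << 1)
#define SKETCH_EXIT_X86_BUS_LOCK	33u

struct sketch_run {
	uint32_t exit_reason;	/* primary exit reason */
	uint16_t flags;		/* per-exit flags */
};

/*
 * A bus lock may be reported alongside any exit, so test the flag no matter
 * what exit_reason says.
 */
static bool sketch_bus_lock_detected(const struct sketch_run *run)
{
	return (run->flags & SKETCH_RUN_X86_BUS_LOCK) != 0;
}

/*
 * True if the exit still needs its normal handling, i.e. the bus lock was
 * coincident with some other exit rather than being the primary reason.
 */
static bool sketch_has_other_exit(const struct sketch_run *run)
{
	return run->exit_reason != SKETCH_EXIT_X86_BUS_LOCK;
}
```

Whatever policy userspace applies when the flag is set (throttling, logging,
etc.), it should not adjust RIP on either vendor; the RIP difference matters
only when attributing the bus lock to an instruction.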