On Tue, May 27, 2025 at 09:23:05AM +0530, Anup Patel wrote: > On Mon, May 26, 2025 at 8:09 PM Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote: > > > > On Mon, May 26, 2025 at 06:12:19PM +0530, Anup Patel wrote: > > > On Mon, May 26, 2025 at 2:52 PM Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote: > > > > > > > > On Fri, May 23, 2025 at 01:33:49PM +0200, Radim Krčmář wrote: > > > > > The new capability allows userspace to implement SBI extensions that KVM > > > > > does not handle. This allows userspace to implement any SBI ecall as > > > > > userspace already has the ability to disable acceleration of selected > > > > > SBI extensions. > > > > > The base extension is made controllable as well, but only with the new > > > > > capability, because it was previously handled specially for some reason. > > > > > *** The related compatibility TODO in the code needs addressing. *** > > > > > > > > > > This is a VM capability, because userspace will most likely want to have > > > > > the same behavior for all VCPUs. We can easily make it both a VCPU and > > > > > a VM capability if there is demand in the future. > > > > > > > > > > Signed-off-by: Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> > > > > > --- > > > > > v4: > > > > > * forward base extension as well > > > > > * change the id to 242, because 241 is already taken in linux-next > > > > > * QEMU example: https://github.com/radimkrcmar/qemu/tree/mp_state_reset > > > > > v3: new > > > > > --- > > > > > Documentation/virt/kvm/api.rst | 11 +++++++++++ > > > > > arch/riscv/include/asm/kvm_host.h | 3 +++ > > > > > arch/riscv/include/uapi/asm/kvm.h | 1 + > > > > > arch/riscv/kvm/vcpu_sbi.c | 17 ++++++++++++++--- > > > > > arch/riscv/kvm/vm.c | 5 +++++ > > > > > include/uapi/linux/kvm.h | 1 + > > > > > 6 files changed, 35 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > > > > index e107694fb41f..c9d627d13a5e 100644 > > > > > --- a/Documentation/virt/kvm/api.rst > > > > > +++ b/Documentation/virt/kvm/api.rst > > > > > @@ -8507,6 +8507,17 @@ given VM. > > > > > When this capability is enabled, KVM resets the VCPU when setting > > > > > MP_STATE_INIT_RECEIVED through IOCTL. The original MP_STATE is preserved. > > > > > > > > > > +7.44 KVM_CAP_RISCV_USERSPACE_SBI > > > > > +-------------------------------- > > > > > + > > > > > +:Architectures: riscv > > > > > +:Type: VM > > > > > +:Parameters: None > > > > > +:Returns: 0 on success, -EINVAL if arg[0] is not zero > > > > > + > > > > > +When this capability is enabled, KVM forwards ecalls from disabled or unknown > > > > > +SBI extensions to userspace. > > > > > + > > > > > 8. Other capabilities. > > > > > ====================== > > > > > > > > > > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h > > > > > index 85cfebc32e4c..6f17cd923889 100644 > > > > > --- a/arch/riscv/include/asm/kvm_host.h > > > > > +++ b/arch/riscv/include/asm/kvm_host.h > > > > > @@ -122,6 +122,9 @@ struct kvm_arch { > > > > > > > > > > /* KVM_CAP_RISCV_MP_STATE_RESET */ > > > > > bool mp_state_reset; > > > > > + > > > > > + /* KVM_CAP_RISCV_USERSPACE_SBI */ > > > > > + bool userspace_sbi; > > > > > }; > > > > > > > > > > struct kvm_cpu_trap { > > > > > diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h > > > > > index 5f59fd226cc5..dd3a5dc53d34 100644 > > > > > --- a/arch/riscv/include/uapi/asm/kvm.h > > > > > +++ b/arch/riscv/include/uapi/asm/kvm.h > > > > > @@ -204,6 +204,7 @@ enum KVM_RISCV_SBI_EXT_ID { > > > > > KVM_RISCV_SBI_EXT_DBCN, > > > > > KVM_RISCV_SBI_EXT_STA, > > > > > KVM_RISCV_SBI_EXT_SUSP, > > > > > + KVM_RISCV_SBI_EXT_BASE, > > > > > KVM_RISCV_SBI_EXT_MAX, > > > > > }; > > > > > > > > > > diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c > > > > > index 31fd3cc98d66..497d5b023153 100644 > > > > > --- a/arch/riscv/kvm/vcpu_sbi.c > > > > > +++ b/arch/riscv/kvm/vcpu_sbi.c > > > > > @@ -39,7 +39,7 @@ static const struct kvm_riscv_sbi_extension_entry sbi_ext[] = { > > > > > .ext_ptr = &vcpu_sbi_ext_v01, > > > > > }, > > > > > { > > > > > - .ext_idx = KVM_RISCV_SBI_EXT_MAX, /* Can't be disabled */ > > > > > + .ext_idx = KVM_RISCV_SBI_EXT_BASE, > > > > > .ext_ptr = &vcpu_sbi_ext_base, > > > > > }, > > > > > { > > > > > @@ -217,6 +217,11 @@ static int riscv_vcpu_set_sbi_ext_single(struct kvm_vcpu *vcpu, > > > > > if (!sext || scontext->ext_status[sext->ext_idx] == KVM_RISCV_SBI_EXT_STATUS_UNAVAILABLE) > > > > > return -ENOENT; > > > > > > > > > > + // TODO: probably remove, the extension originally couldn't be > > > > > + // disabled, but it doesn't seem necessary > > > > > + if (!vcpu->kvm->arch.userspace_sbi && sext->ext_id == KVM_RISCV_SBI_EXT_BASE) > > > > > + return -ENOENT; > > > > > + > > > > > > > > I agree that we don't need to babysit userspace and it's even conceivable > > > > to have guests that don't need SBI. KVM should only need checks in its > > > > UAPI to protect itself from userspace and to enforce proper use of the > > > > API. It's not KVM's place to ensure userspace doesn't violate the SBI spec > > > > or create broken guests (userspace is the boss, even if it's a boss that > > > > doesn't make sense) > > > > > > > > So, I vote we drop the check. > > > > > > > > > scontext->ext_status[sext->ext_idx] = (reg_val) ? > > > > > KVM_RISCV_SBI_EXT_STATUS_ENABLED : > > > > > KVM_RISCV_SBI_EXT_STATUS_DISABLED; > > > > > @@ -471,8 +476,14 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run) > > > > > #endif > > > > > ret = sbi_ext->handler(vcpu, run, &sbi_ret); > > > > > } else { > > > > > - /* Return error for unsupported SBI calls */ > > > > > - cp->a0 = SBI_ERR_NOT_SUPPORTED; > > > > > + if (vcpu->kvm->arch.userspace_sbi) { > > > > > + next_sepc = false; > > > > > + ret = 0; > > > > > + kvm_riscv_vcpu_sbi_forward(vcpu, run); > > > > > + } else { > > > > > + /* Return error for unsupported SBI calls */ > > > > > + cp->a0 = SBI_ERR_NOT_SUPPORTED; > > > > > + } > > > > > goto ecall_done; > > > > > } > > > > > > > > > > diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c > > > > > index b27ec8f96697..0b6378b83955 100644 > > > > > --- a/arch/riscv/kvm/vm.c > > > > > +++ b/arch/riscv/kvm/vm.c > > > > > @@ -217,6 +217,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) > > > > > return -EINVAL; > > > > > kvm->arch.mp_state_reset = true; > > > > > return 0; > > > > > + case KVM_CAP_RISCV_USERSPACE_SBI: > > > > > + if (cap->flags) > > > > > + return -EINVAL; > > > > > + kvm->arch.userspace_sbi = true; > > > > > + return 0; > > > > > default: > > > > > return -EINVAL; > > > > > } > > > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > > > > index 454b7d4a0448..bf23deb6679e 100644 > > > > > --- a/include/uapi/linux/kvm.h > > > > > +++ b/include/uapi/linux/kvm.h > > > > > @@ -931,6 +931,7 @@ struct kvm_enable_cap { > > > > > #define KVM_CAP_X86_GUEST_MODE 238 > > > > > #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239 > > > > > #define KVM_CAP_RISCV_MP_STATE_RESET 240 > > > > > +#define KVM_CAP_RISCV_USERSPACE_SBI 242 > > > > > > > > > > struct kvm_irq_routing_irqchip { > > > > > __u32 irqchip; > > > > > -- > > > > > 2.49.0 > > > > > > > > > > > > > Otherwise, > > > > > > > > Reviewed-by: Andrew Jones <ajones@xxxxxxxxxxxxxxxx> > > > > > > We are not going ahead with this approach for the reasons > > > mentioned in v3 series [1]. > > > > IIUC, the main concern in that thread is that userspace won't know what to > > do with some of the exits it gets or that it'll try to take control of > > extensions that it can't emulate. I feel like not exiting to userspace in > > those cases is trying to second guess it, i.e. KVM is trying to enforce a > > policy on userspace. But, KVM shouldn't be doing that, as userspace should > > be the policy maker. If userspace uses this capability to opt into getting > > all the SBI exits (which it doesn't want KVM to handle), then it should be > > allowed to get them -- and, if userspace doesn't know what it's doing, > > then it can keep all the pieces. > > The userspace already has a mechanism to opt-in for select SBI exits > which it can implement such as SBI DBCN and SBI SUSP. With SBI v3.0, > userspace will be able implement SBI MPXY but SBI SSE, FWFT, and > DBTR will be in kernel space due to reasons mentioned in the v3 series. > > There is no point in forwarding all SBI exits to userspace when userspace > has no mechanism to implement many critical SBI extensions. Userspace can implement all truly disabled extensions, it just returns not-supported. So, while I may be contriving this example a bit, enabling userspace to log attempts to use disabled extensions would be one reason to forward them. Thanks, drew