Re: [PATCH v3 0/2] RISC-V: KVM: VCPU reset fixes

Atish Patra <atish.patra@xxxxxxxxx> · Fri, 23 May 2025 10:44:37 -0700

On 5/23/25 2:20 AM, Radim Krčmář wrote:
2025-05-23T13:38:26+05:30, Anup Patel <apatel@xxxxxxxxxxxxxxxx>:
On Fri, May 23, 2025 at 12:47 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote:
2025-05-22T14:43:40-07:00, Atish Patra <atish.patra@xxxxxxxxx>:
On 5/15/25 7:37 AM, Radim Krčmář wrote:
Hello,

the design still requires a discussion.

[v3 1/2] removes most of the additional changes that the KVM capability
was doing in v2.  [v3 2/2] is new and previews a general solution to the
lack of userspace control over KVM SBI.

I am still missing the motivation behind it. If the motivation is SBI
HSM suspend, PATCH 2 doesn't achieve that, as it forwards every call
to user space. Why do you want to control HSM start/stop from user
space?
HSM needs fixing, because KVM doesn't know what the state after
sbi_hart_start should be.
For example, we had a discussion about scounteren, and regardless of
what default we choose in KVM, userspace might want a different value.
I don't think that HSM start/stop is a hot path, so trapping to
userspace seems better than adding more kernel code.
There are no implementation specific S-mode CSR reset values
required at the moment.
Jessica mentioned that BSD requires scounteren to be non-zero, so
userspace should be able to provide that value.

Jessica admitted that it was a bug which should be fixed.

I would prefer if KVM could avoid getting into those discussions.
We can just let userspace be as crazy as it wants.

The scounteren state you mentioned is already fixed now.

I would prefer to do this only if more of these issues come up.
Otherwise, we gain little by delegating more work to userspace for no
reason.

Whenever the need arises, we will extend the ONE_REG interface so that
user space can specify custom CSR reset values at Guest/VM creation
time. We don't need to forward SBI HSM calls to user space for custom
S-mode CSR reset values.
The benefits of adding a new ONE_REG interface seem very small compared
to the drawbacks of having extra kernel code.

How? The extra kernel code is just a few lines that register an SBI
extension and forward it to userspace, and that covers the entire
extension.

For extensions like HSM, only selected functions should be forwarded
to userspace, which defeats the purpose.

Let's not try to fix something that is not broken yet.

If userspace wanted to reset or set up new multi-VCPU VMs often, we
could add an interface that loads the whole register state from
userspace in a single IOCTL, because ONE_REG is not the best interface
for bulk data transfer either.

Forwarding all the unimplemented SBI ecalls shouldn't be a performance
issue, because S-mode software would hopefully learn after the first
error and stop trying again.
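The "learn after the first error" argument can be illustrated with a small simulation. This is a minimal sketch, not KVM code: `struct sbi_exit` is a hypothetical stand-in loosely modeled on the `riscv_sbi` exit data in `struct kvm_run`, `vmm_handle_sbi()` is a hypothetical VMM that implements nothing, and `guest_putc()` models a guest that remembers an unsupported extension. The extension ID, function ID and error code are the values from the SBI specification.

```c
/* SBI error code and IDs as defined in the RISC-V SBI specification. */
#define SBI_ERR_NOT_SUPPORTED (-2L)
#define SBI_EXT_DBCN 0x4442434EUL /* "DBCN" debug console extension */
#define SBI_DBCN_WRITE_BYTE 2UL   /* sbi_debug_console_write_byte() */

/* Hypothetical stand-in for the SBI exit data that KVM hands to userspace. */
struct sbi_exit {
	unsigned long extension_id;
	unsigned long function_id;
	unsigned long args[6];
	unsigned long ret[2]; /* ret[0] = SBI error, ret[1] = value */
};

/* Counts how often the simulated guest exits to userspace. */
static unsigned int exits_to_userspace;

/* Hypothetical VMM: no extension is implemented, so every ecall fails. */
static void vmm_handle_sbi(struct sbi_exit *e)
{
	exits_to_userspace++;
	e->ret[0] = (unsigned long)SBI_ERR_NOT_SUPPORTED;
	e->ret[1] = 0;
}

/* Guest-side model: stops issuing DBCN ecalls after the first error. */
static int dbcn_missing;

static long guest_putc(char c)
{
	struct sbi_exit e = {
		.extension_id = SBI_EXT_DBCN,
		.function_id = SBI_DBCN_WRITE_BYTE,
		.args = { (unsigned char)c },
	};
	long err;

	if (dbcn_missing)
		return SBI_ERR_NOT_SUPPORTED; /* no ecall, hence no exit */
	vmm_handle_sbi(&e);
	err = (long)e.ret[0];
	if (err == SBI_ERR_NOT_SUPPORTED)
		dbcn_missing = 1;
	return err;
}

/* Writes a string through the guest model and returns the exit count. */
static unsigned int print_and_count_exits(const char *s)
{
	while (*s)
		guest_putc(*s++);
	return exits_to_userspace;
}
```

Printing a whole string through `print_and_count_exits()` costs a single exit to userspace. Whether a real guest caches the error this way depends on the guest's SBI code; that is the "hopefully" in the argument above.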

Allowing userspace to fully implement the ecall instruction is one of
the motivations as well: SBI is not part of the RISC-V ISA, so someone
might be interested in accelerating different M-mode software with KVM.

I'll send v4 later today -- there is a missing part in [2/2], because
userspace also needs to be able to emulate the base SBI extension.
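If userspace emulates the base extension, it also controls what the guest discovers through `sbi_probe_extension()`. The following is a minimal sketch, assuming a hypothetical VMM that implements only Base, SRST and DBCN. `vmm_extensions` and `vmm_base_ecall()` are illustrative names, while the function IDs, the version encoding and the probe semantics (0 means not available) follow the SBI specification.

```c
#include <stddef.h>

/* Error codes and IDs as defined in the RISC-V SBI specification. */
#define SBI_SUCCESS 0L
#define SBI_ERR_NOT_SUPPORTED (-2L)
#define SBI_EXT_BASE 0x10UL
#define SBI_EXT_SRST 0x53525354UL /* "SRST" system reset */
#define SBI_EXT_DBCN 0x4442434EUL /* "DBCN" debug console */

/* Base extension function IDs. */
#define SBI_BASE_GET_SPEC_VERSION 0UL
#define SBI_BASE_PROBE_EXTENSION 3UL

struct sbi_ret {
	long error;
	long value;
};

/* Hypothetical: the extensions this particular VMM chooses to implement. */
static const unsigned long vmm_extensions[] = {
	SBI_EXT_BASE, SBI_EXT_SRST, SBI_EXT_DBCN,
};

/* Userspace emulation of the SBI base extension (subset). */
static struct sbi_ret vmm_base_ecall(unsigned long fid, unsigned long arg0)
{
	struct sbi_ret r = { SBI_SUCCESS, 0 };
	size_t i;

	switch (fid) {
	case SBI_BASE_GET_SPEC_VERSION:
		/* SBI v2.0: major in bits 30:24, minor in bits 23:0. */
		r.value = (2L << 24) | 0;
		break;
	case SBI_BASE_PROBE_EXTENSION:
		/* 1 = available, 0 = not implemented by this VMM. */
		for (i = 0; i < sizeof(vmm_extensions) / sizeof(vmm_extensions[0]); i++)
			if (vmm_extensions[i] == arg0)
				r.value = 1;
		break;
	default:
		r.error = SBI_ERR_NOT_SUPPORTED;
		break;
	}
	return r;
}
```

With this table, probing HSM (EID 0x48534D) returns 0, so a guest would not use HSM at all, while SRST and DBCN would be handled entirely by the VMM.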

[...] The best approach is to selectively forward SBI calls to user
space where needed (e.g. SBI system reset, SBI system suspend, SBI
debug console, etc.).
That is exactly what my proposal does, it's just that the userspace says
what is "needed".

If we started with this mechanism, KVM would not have needed to add
SRST/SUSP/DBCN SBI emulation at all -- they would be forwarded as any
other unhandled ecall.