2025-04-30T10:56:35+05:30, Anup Patel <anup@xxxxxxxxxxxxxx>: > On Wed, Apr 30, 2025 at 9:52 AM Anup Patel <anup@xxxxxxxxxxxxxx> wrote: >> >> On Tue, Apr 29, 2025 at 9:51 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote: >> > >> > 2025-04-29T20:31:18+05:30, Anup Patel <anup@xxxxxxxxxxxxxx>: >> > > On Tue, Apr 29, 2025 at 3:55 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote: >> > >> >> > >> 2025-04-29T11:25:35+05:30, Anup Patel <apatel@xxxxxxxxxxxxxxxx>: >> > >> > On Mon, Apr 28, 2025 at 11:15 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote: >> > >> >> >> > >> >> 2025-04-28T17:52:25+05:30, Anup Patel <anup@xxxxxxxxxxxxxx>: >> > >> >> > On Thu, Apr 3, 2025 at 5:02 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote: >> > >> >> >> For a cleaner solution, we should add interfaces to perform the KVM-SBI >> > >> >> >> reset request on userspace demand. I think it would also be much better >> > >> >> >> if userspace was in control of the post-reset state. >> > >> >> > >> > >> >> > Apart from breaking KVM user-space, this patch is incorrect and >> > >> >> > does not align with the: >> > >> >> > 1) SBI spec >> > >> >> > 2) OS boot protocol. >> > >> >> > >> > >> >> > The SBI spec only defines the entry state of certain CPU registers >> > >> >> > (namely, PC, A0, and A1) when CPU enters S-mode: >> > >> >> > 1) Upon SBI HSM start call from some other CPU >> > >> >> > 2) Upon resuming from non-retentive SBI HSM suspend or >> > >> >> > SBI system suspend >> > >> >> > >> > >> >> > The S-mode entry state of the boot CPU is defined by the >> > >> >> > OS boot protocol and not by the SBI spec. Due to this, reason >> > >> >> > KVM RISC-V expects user-space to set up the S-mode entry >> > >> >> > state of the boot CPU upon system reset. >> > >> >> >> > >> >> We can handle the initial state consistency in other patches. >> > >> >> What needs addressing is a way to trigger the KVM reset from userspace, >> > >> >> even if only to clear the internal KVM state. >> > >> >> >> > >> >> I think mp_state is currently the best signalization that KVM should >> > >> >> reset, so I added it there. >> > >> >> >> > >> >> What would be your preferred interface for that? >> > >> >> >> > >> > >> > >> > Instead of creating a new interface, I would prefer that VCPU >> > >> > which initiates SBI System Reset should be resetted immediately >> > >> > in-kernel space before forwarding the system reset request to >> > >> > user space. >> > >> >> > >> The initiating VCPU might not be the boot VCPU. >> > >> It would be safer to reset all of them. >> > > >> > > I meant initiating VCPU and not the boot VCPU. Currently, the >> > > non-initiating VCPUs are already resetted by VCPU requests >> > > so nothing special needs to be done. >> >> There is no designated boot VCPU for KVM so let us only use the >> term "initiating" or "non-initiating" VCPUs in context of system reset. That is exactly how I use it. Some VCPU will be the boot VCPU (the VCPU made runnable by KVM_SET_MP_STATE) and loaded with state from userspace. RISC-V doesn't guarantee that the boot VCPU is the reset initiating VCPU, so I think KVM should allow it. >> > Currently, we make the request only for VCPUs brought up by HSM -- the >> > non-boot VCPUs. There is a single VCPU not being reset and resetting >> > the reset initiating VCPU changes nothing. e.g. >> > >> > 1) VCPU 1 initiates the reset through an ecall. >> > 2) All VCPUs are stopped and return to userspace. >> >> When all VCPUs are stopped, all VCPUs except VCPU1 >> (in this example) will SLEEP because we do >> "kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP)" >> so none of the VCPUs except VCPU1 (in this case) will >> return to userspace. Userspace should be able to do whatever it likes -- in my example, all the VCPUs are brought to userspace and a different boot VCPU is selected. (Perhaps userspace wanted to record their reset pre-reset state, or maybe it really wants to boot with a designated VCPU.) >> > 3) Userspace prepares VCPU 0 as the boot VCPU. >> > 4) VCPU 0 executes without going through KVM reset paths. >> >> Userspace will see a system reset event exit for the >> initiating VCPU by that time all other VCPUs are already >> sleeping with mp_state == KVM_MP_STATE_STOPPED. >> >> > >> > The point of this patch is to reset the boot VCPU, so we reset the VCPU >> > that is made runnable by the KVM_SET_MP_STATE IOCTL. >> >> Like I said before, we don't need to do this. The initiating VCPU >> can be resetted just before exiting to user space for system reset >> event exit. You assume initiating VCPU == boot VCPU. We should prevent KVM_SET_MP_STATE IOCTL for all non-initiating VCPUs if we decide to accept the assumption. I'd rather choose a different design, though. How about a new userspace interface for IOCTL reset? (Can be capability toggle for KVM_SET_MP_STATE or a straight new IOCTL.) That wouldn't "fix" current userspaces, but would significantly improve the sanity of the KVM interface. > Below is what I am suggesting. This change completely removes > dependency of kvm_sbi_hsm_vcpu_start() on "reset" structures. I'd keep the reset structure in this series -- it's small enough and locklessly accessing the state of another VCPU needs a lot of consideration to prevent all possible race conditions. Thanks.