On Wed, Apr 30, 2025 at 9:52 AM Anup Patel <anup@xxxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 29, 2025 at 9:51 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote:
> >
> > 2025-04-29T20:31:18+05:30, Anup Patel <anup@xxxxxxxxxxxxxx>:
> > > On Tue, Apr 29, 2025 at 3:55 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote:
> > >>
> > >> 2025-04-29T11:25:35+05:30, Anup Patel <apatel@xxxxxxxxxxxxxxxx>:
> > >> > On Mon, Apr 28, 2025 at 11:15 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote:
> > >> >>
> > >> >> 2025-04-28T17:52:25+05:30, Anup Patel <anup@xxxxxxxxxxxxxx>:
> > >> >> > On Thu, Apr 3, 2025 at 5:02 PM Radim Krčmář <rkrcmar@xxxxxxxxxxxxxxxx> wrote:
> > >> >> >> For a cleaner solution, we should add interfaces to perform the KVM-SBI
> > >> >> >> reset request on userspace demand. I think it would also be much better
> > >> >> >> if userspace was in control of the post-reset state.
> > >> >> >
> > >> >> > Apart from breaking KVM user-space, this patch is incorrect and
> > >> >> > does not align with:
> > >> >> > 1) the SBI spec
> > >> >> > 2) the OS boot protocol
> > >> >> >
> > >> >> > The SBI spec only defines the entry state of certain CPU registers
> > >> >> > (namely, PC, A0, and A1) when a CPU enters S-mode:
> > >> >> > 1) upon an SBI HSM start call from some other CPU
> > >> >> > 2) upon resuming from non-retentive SBI HSM suspend or
> > >> >> >    SBI system suspend
> > >> >> >
> > >> >> > The S-mode entry state of the boot CPU is defined by the
> > >> >> > OS boot protocol and not by the SBI spec. For this reason,
> > >> >> > KVM RISC-V expects user-space to set up the S-mode entry
> > >> >> > state of the boot CPU upon system reset.
> > >> >>
> > >> >> We can handle the initial state consistency in other patches.
> > >> >> What needs addressing is a way to trigger the KVM reset from userspace,
> > >> >> even if only to clear the internal KVM state.
> > >> >>
> > >> >> I think mp_state is currently the best signalization that KVM should
> > >> >> reset, so I added it there.
> > >> >>
> > >> >> What would be your preferred interface for that?
> > >> >
> > >> > Instead of creating a new interface, I would prefer that the VCPU
> > >> > which initiates SBI System Reset be reset immediately
> > >> > in kernel space before forwarding the system reset request to
> > >> > user space.
> > >>
> > >> The initiating VCPU might not be the boot VCPU.
> > >> It would be safer to reset all of them.
> > >
> > > I meant the initiating VCPU and not the boot VCPU. Currently, the
> > > non-initiating VCPUs are already reset by VCPU requests,
> > > so nothing special needs to be done.
>
> There is no designated boot VCPU for KVM, so let us only use the
> terms "initiating" and "non-initiating" VCPUs in the context of system reset.
>
> > Currently, we make the request only for VCPUs brought up by HSM -- the
> > non-boot VCPUs. There is a single VCPU not being reset, and resetting
> > the reset-initiating VCPU changes nothing. e.g.
> >
> > 1) VCPU 1 initiates the reset through an ecall.
> > 2) All VCPUs are stopped and return to userspace.
>
> When all VCPUs are stopped, all VCPUs except VCPU1
> (in this example) will SLEEP, because we do
> "kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP)",
> so none of the VCPUs except VCPU1 (in this case) will
> return to userspace.
>
> > 3) Userspace prepares VCPU 0 as the boot VCPU.
> > 4) VCPU 0 executes without going through KVM reset paths.
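For context, step 3 above is purely user-space work. A rough sketch of
what a VMM could do after the system reset event exit (illustrative
only: the vcpu0_fd descriptor and the helper names are hypothetical,
and error handling is omitted):

#include <linux/kvm.h>
#include <sys/ioctl.h>

static void set_core_reg(int vcpu_fd, __u64 off, __u64 val)
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_RISCV | KVM_REG_SIZE_U64 |
			KVM_REG_RISCV_CORE | off,
		.addr = (unsigned long)&val,
	};

	ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

static void prepare_boot_vcpu(int vcpu0_fd, __u64 entry_pc, __u64 dtb_addr)
{
	struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_RUNNABLE };

	/* OS boot protocol: pc = kernel entry, a0 = hartid (0 here), a1 = DTB. */
	set_core_reg(vcpu0_fd, KVM_REG_RISCV_CORE_REG(regs.pc), entry_pc);
	set_core_reg(vcpu0_fd, KVM_REG_RISCV_CORE_REG(regs.a0), 0);
	set_core_reg(vcpu0_fd, KVM_REG_RISCV_CORE_REG(regs.a1), dtb_addr);

	/* Make the chosen boot VCPU runnable again. */
	ioctl(vcpu0_fd, KVM_SET_MP_STATE, &mp);
}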
>
> Userspace will see a system reset event exit for the
> initiating VCPU; by that time, all other VCPUs are already
> sleeping with mp_state == KVM_MP_STATE_STOPPED.
>
> > The point of this patch is to reset the boot VCPU, so we reset the VCPU
> > that is made runnable by the KVM_SET_MP_STATE IOCTL.
>
> Like I said before, we don't need to do this. The initiating VCPU
> can be reset just before exiting to user space for the system reset
> event exit.

Below is what I am suggesting. This change completely removes the
dependency of kvm_sbi_hsm_vcpu_start() on the "reset" structures.

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 0e9c2fab6378..6bd12469852d 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -396,6 +396,7 @@ int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
 int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
 			   const struct kvm_one_reg *reg);
 
+void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu);
 int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu);
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 02635bac91f1..801c6a1a1aef 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -51,7 +51,7 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
 	       sizeof(kvm_vcpu_stats_desc),
 };
 
-static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
+void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
 	struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
@@ -689,6 +689,9 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
 
 	if (kvm_request_pending(vcpu)) {
+		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
+			kvm_riscv_reset_vcpu(vcpu);
+
 		if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
 			kvm_vcpu_srcu_read_unlock(vcpu);
 			rcuwait_wait_event(wait,
@@ -705,9 +708,6 @@
 			}
 		}
 
-		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
-			kvm_riscv_reset_vcpu(vcpu);
-
 		if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu))
 			kvm_riscv_gstage_update_hgatp(vcpu);
 
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index d1c83a77735e..79477e7f240a 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -146,9 +146,16 @@ void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
 		spin_lock(&tmp->arch.mp_state_lock);
 		WRITE_ONCE(tmp->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
 		spin_unlock(&tmp->arch.mp_state_lock);
+		if (tmp != vcpu) {
+			kvm_make_request(KVM_REQ_VCPU_RESET, tmp);
+			kvm_make_request(KVM_REQ_SLEEP, tmp);
+			kvm_vcpu_kick(tmp);
+		} else {
+			kvm_make_request(KVM_REQ_SLEEP, vcpu);
+		}
 	}
-	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
 
+	kvm_riscv_reset_vcpu(vcpu);
 	memset(&run->system_event, 0, sizeof(run->system_event));
 	run->system_event.type = type;
 	run->system_event.ndata = 1;
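The vcpu_sbi_hsm.c change below relies on the SBI HSM hart_start()
register convention: the target hart enters S-mode at start_addr with
a0 = hartid and a1 = opaque, which is why sepc, a0, and a1 of the
target's guest_context are written directly. For reference, a rough
guest-side sketch of the call (illustrative only, not part of this
change):

struct sbiret {
	long error;
	long value;
};

/* SBI HSM hart_start: EID 0x48534D ("HSM"), FID 0. */
static struct sbiret sbi_hart_start(unsigned long hartid,
				    unsigned long start_addr,
				    unsigned long opaque)
{
	register unsigned long a0 asm("a0") = hartid;
	register unsigned long a1 asm("a1") = start_addr;
	register unsigned long a2 asm("a2") = opaque;
	register unsigned long a6 asm("a6") = 0;
	register unsigned long a7 asm("a7") = 0x48534D;

	/* a0/a1 come back as the SBI error/value pair. */
	asm volatile("ecall"
		     : "+r" (a0), "+r" (a1)
		     : "r" (a2), "r" (a6), "r" (a7)
		     : "memory");

	return (struct sbiret){ .error = a0, .value = a1 };
}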
diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
index 3070bb31745d..30d7d59db5a5 100644
--- a/arch/riscv/kvm/vcpu_sbi_hsm.c
+++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
@@ -15,15 +15,15 @@
 
 static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
 {
-	struct kvm_cpu_context *reset_cntx;
-	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
-	struct kvm_vcpu *target_vcpu;
+	struct kvm_cpu_context *target_cp, *cp = &vcpu->arch.guest_context;
 	unsigned long target_vcpuid = cp->a0;
+	struct kvm_vcpu *target_vcpu;
 	int ret = 0;
 
 	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, target_vcpuid);
 	if (!target_vcpu)
 		return SBI_ERR_INVALID_PARAM;
+	target_cp = &target_vcpu->arch.guest_context;
 
 	spin_lock(&target_vcpu->arch.mp_state_lock);
 
@@ -32,17 +32,12 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
-	spin_lock(&target_vcpu->arch.reset_cntx_lock);
-	reset_cntx = &target_vcpu->arch.guest_reset_context;
 	/* start address */
-	reset_cntx->sepc = cp->a1;
+	target_cp->sepc = cp->a1;
 	/* target vcpu id to start */
-	reset_cntx->a0 = target_vcpuid;
+	target_cp->a0 = target_vcpuid;
 	/* private data passed from kernel */
-	reset_cntx->a1 = cp->a2;
-	spin_unlock(&target_vcpu->arch.reset_cntx_lock);
-
-	kvm_make_request(KVM_REQ_VCPU_RESET, target_vcpu);
+	target_cp->a1 = cp->a2;
 
 	__kvm_riscv_vcpu_power_on(target_vcpu);
 
@@ -63,6 +58,7 @@ static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
+	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
 	__kvm_riscv_vcpu_power_off(vcpu);
 
 out:

Regards,
Anup