On Wed, May 21, 2025 at 11:19 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Fri, May 16, 2025, Dionna Amalie Glaze wrote: > > > > @@ -4015,6 +4042,12 @@ static int snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_ > > > > > > > > mutex_lock(&sev->guest_req_mutex); > > > > > > > > + if (!__ratelimit(&sev->snp_guest_msg_rs)) { > > > > + svm_vmgexit_no_action(svm, SNP_GUEST_ERR(SNP_GUEST_VMM_ERR_BUSY, 0)); > > > > + ret = 1; > > > > + goto out_unlock; > > > > > > Can you (or anyone) explain what a well-behaved guest will do in in response to > > > BUSY? And/or explain why KVM injecting an error into the guest is better than > > > exiting to userspace. > > > > Ah, I missed this question. The guest is meant to back off and try again > > after waiting a bit. This is the behavior added in > > https://lore.kernel.org/all/20230214164638.1189804-2-dionnaglaze@xxxxxxxxxx/ > > Nice, it's already landed and considered legal VMM behavior. > > > If KVM returns to userspace with an exit type that the guest request was > > throttled, then what is user space supposed to do with that? > > The userspace exit doesn't have to notify userspace that the guest was throttled, > e.g. KVM could exit on _every_ request and let userspace do its own throttling. > > I have no idea whether or not that's sane/useful, which is why I'm asking. The > cover letter, changelog, and documentation are all painfully sparse with respect > to explaining why *this* uAPI is the right uAPI. > > > It could wait a bit before trying KVM_RUN again, but with the enlightened > > method, the guest could at least work on other kernel tasks while it waits > > for its turn to get an attestation report. > > Nothing prevents KVM from providing userspace a way to communicate VMM_ERR_BUSY, > e.g. as done for KVM_EXIT_SNP_REQ_CERTS: > > https://lore.kernel.org/all/20250428195113.392303-2-michael.roth@xxxxxxx > > > Perhaps this is me not understanding the preferred KVM way of doing things. > > The only real preference at play is to not end up with uAPI and ABI that doesn't > fit "everyone's" needs. It's impossible to fully future-proof KVM's ABI, but we > can at least perform due diligence to ensure we didn't simply pick the the path > of least resistance. > > The bar gets lowered a tiny bit if we go with a module param (which I think we > should do), but I'd still like an explanation of why a fairly simple ratelimiting > mechanism is the best overall approach. Before I send out a revised patchset with changed commit text, what do you think about the following The AMD-SP is a precious resource that doesn't have a scheduler other than a mutex lock queue. To avoid customers from causing a DoS, a kernel module parameter for rate limiting guest requests is added. [Addition:] The kernel module parameter is a lower bound kernel-imposed rate limit for any SEV-SNP VM-initiated guest request. This does not preclude the addition of a new KVM exit type for SEV-SNP guest requests for userspace to impose any additional throttling logic. The default value of 0 maintains the previous behavior that there is no imposed rate limit on guest requests. We could still ask Michael to change KVM_EXIT_SNP_REQ_CERTS to KVM_EXIT_SNP_GUEST_REQ and for the exit structure to include msg_type as well as the gfn+npages when the kind is an extended request for an attestation report so that we don't need to have two exit types. Regardless of that change for additional throttling opportunities, I think the system-wide imposed lower bound is important for quelling noisy neighbors to some degree. -- -Dionna Glaze, PhD, CISSP, CCSP (she/her)