Re: [PATCH 06/17] KVM: arm64: Introduce method to partition the PMU

Oliver Upton <oliver.upton@xxxxxxxxx> · Wed, 4 Jun 2025 13:57:49 -0700

On Wed, Jun 04, 2025 at 08:10:27PM +0000, Colton Lewis wrote:
> Thank you Oliver for the additional explanation.
> 
> Oliver Upton <oliver.upton@xxxxxxxxx> writes:
> 
> > On Tue, Jun 03, 2025 at 09:32:41PM +0000, Colton Lewis wrote:
> > > Oliver Upton <oliver.upton@xxxxxxxxx> writes:
> 
> > > > On Mon, Jun 02, 2025 at 07:26:51PM +0000, Colton Lewis wrote:
> > > > >   static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
> > > > >   {
> > > > > +	u8 hpmn = vcpu->kvm->arch.arm_pmu->hpmn;
> > > > > +
> > > > >   	preempt_disable();
> 
> > > > >   	/*
> > > > >   	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
> > > > >   	 * to disable guest access to the profiling and trace buffers
> > > > >   	 */
> > > > > -	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
> > > > > -					 *host_data_ptr(nr_event_counters));
> > > > > -	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > > > > +	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
> > > > > +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
> > > > > +				MDCR_EL2_TPM |
> 
> > > > This isn't safe, as there's no guarantee that kvm_arch::arm_pmu is
> > > > pointing that the PMU for this CPU. KVM needs to derive HPMN from some
> > > > per-CPU state, not anything tied to the VM/vCPU.
> 
> > > I'm confused. Isn't this function preparing to run the vCPU on this
> > > CPU? Why would it be pointing at a different PMU?
> 
> > Because arm64 is a silly ecosystem and system designers can glue
> > together heterogenous CPU implementations. The arm_pmu that KVM is
> > pointing at might only match a subset of CPUs, but vCPUs migrate at the
> > whim of the scheduler (and userspace).
> 
> That means the arm_pmu field might at any time point to data that
> doesn't represent the current CPU. I'm surprised that's not swapped out
> anywhere. Seems like it would be useful to have an arch struct be a
> reliable source of information about the current arch.

There's no way to accomplish that. It is per-VM data, and you could have
vCPUs on a mix of physical CPUs.

This is mitigated somewhat when the VMM explicitly selects a PMU
implementation, as we prevent vCPUs from actually entering the guest on
an unsupported CPU (see ON_SUPPORTED_CPU flag).

> > There are two *very* distinct functions w.r.t. partitioning:
> 
> >   1) Partitioning of a particular arm_pmu that says how many counters the
> >   host can use
> 
> >   2) VMM intentions to present a subset of the KVM-owned counter
> >   partition to its guest
> 
> > #1 is modifying *global* state, we really can't mess with that in the
> > context of a single VM...
> 
> I see the distinction more clearly now. Since KVM can only control the
> number of counters presented to the guest through HPMN, why would the
> VMM ever choose a subset? If the host PMU is globally partitioned to not
> use anything in the guest range, presenting fewer counters to a guest is
> just leaving some counters in the middle of the range unused.

You may not want to give a 'full' PMU to all VMs running on a system,
but some OSes (Windows) expect to have at least the fixed CPU cycle
counter present. In this case the VMM would deliberately expose fewer
counters. FEAT_HPMN0 didn't get added to the architecture by accident...

Thanks,
Oliver