On Wed, Jun 04, 2025 at 08:10:27PM +0000, Colton Lewis wrote: > Thank you Oliver for the additional explanation. > > Oliver Upton <oliver.upton@xxxxxxxxx> writes: > > > On Tue, Jun 03, 2025 at 09:32:41PM +0000, Colton Lewis wrote: > > > Oliver Upton <oliver.upton@xxxxxxxxx> writes: > > > > > On Mon, Jun 02, 2025 at 07:26:51PM +0000, Colton Lewis wrote: > > > > > static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu) > > > > > { > > > > > + u8 hpmn = vcpu->kvm->arch.arm_pmu->hpmn; > > > > > + > > > > > preempt_disable(); > > > > > > /* > > > > > * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK > > > > > * to disable guest access to the profiling and trace buffers > > > > > */ > > > > > - vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, > > > > > - *host_data_ptr(nr_event_counters)); > > > > > - vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | > > > > > + vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn); > > > > > + vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD | > > > > > + MDCR_EL2_TPM | > > > > > This isn't safe, as there's no guarantee that kvm_arch::arm_pmu is > > > > pointing that the PMU for this CPU. KVM needs to derive HPMN from some > > > > per-CPU state, not anything tied to the VM/vCPU. > > > > I'm confused. Isn't this function preparing to run the vCPU on this > > > CPU? Why would it be pointing at a different PMU? > > > Because arm64 is a silly ecosystem and system designers can glue > > together heterogenous CPU implementations. The arm_pmu that KVM is > > pointing at might only match a subset of CPUs, but vCPUs migrate at the > > whim of the scheduler (and userspace). > > That means the arm_pmu field might at any time point to data that > doesn't represent the current CPU. I'm surprised that's not swapped out > anywhere. Seems like it would be useful to have an arch struct be a > reliable source of information about the current arch. There's no way to accomplish that. It is per-VM data, and you could have vCPUs on a mix of physical CPUs. This is mitigated somewhat when the VMM explicitly selects a PMU implementation, as we prevent vCPUs from actually entering the guest on an unsupported CPU (see ON_SUPPORTED_CPU flag). > > There are two *very* distinct functions w.r.t. partitioning: > > > 1) Partitioning of a particular arm_pmu that says how many counters the > > host can use > > > 2) VMM intentions to present a subset of the KVM-owned counter > > partition to its guest > > > #1 is modifying *global* state, we really can't mess with that in the > > context of a single VM... > > I see the distinction more clearly now. Since KVM can only control the > number of counters presented to the guest through HPMN, why would the > VMM ever choose a subset? If the host PMU is globally partitioned to not > use anything in the guest range, presenting fewer counters to a guest is > just leaving some counters in the middle of the range unused. You may not want to give a 'full' PMU to all VMs running on a system, but some OSes (Windows) expect to have at least the fixed CPU cycle counter present. In this case the VMM would deliberately expose fewer counters. FEAT_HPMN0 didn't get added to the architecture by accident... Thanks, Oliver