On Tue, May 13, 2025, James Houghton wrote:
> On Tue, May 13, 2025 at 7:13 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > On Mon, May 12, 2025, James Houghton wrote:
> > > On Thu, May 8, 2025 at 7:11 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > ---
> > > >  virt/kvm/dirty_ring.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> > > > index e844e869e8c7..97cca0c02fd1 100644
> > > > --- a/virt/kvm/dirty_ring.c
> > > > +++ b/virt/kvm/dirty_ring.c
> > > > @@ -134,6 +134,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > > >
> > > >                 ring->reset_index++;
> > > >                 (*nr_entries_reset)++;
> > > > +
> > > > +               /*
> > > > +                * While the size of each ring is fixed, it's possible for the
> > > > +                * ring to be constantly re-dirtied/harvested while the reset
> > > > +                * is in-progress (the hard limit exists only to guard against
> > > > +                * wrapping the count into negative space).
> > > > +                */
> > > > +               if (!first_round)
> > > > +                       cond_resched();
> > >
> > > Should we be dropping slots_lock here?
> >
> > Could we?  Yes.  Should we?  Eh.  I don't see any value in doing so, because in
> > practice, it's extremely unlikely anything will be waiting on slots_lock.
> >
> > kvm_vm_ioctl_reset_dirty_pages() operates on all vCPUs, i.e. there won't be
> > competing calls to reset other rings.  A well-behaved userspace won't be modifying
> > memslots or dirty logs, and won't be toggling nx_huge_pages.
> >
> > That leaves kvm_vm_ioctl_set_mem_attributes(), kvm_inhibit_apic_access_page(),
> > kvm_assign_ioeventfd(), snp_launch_update(), and coalesced IO/MMIO (un)registration.
> > Except for snp_launch_update(), those are all brutally slow paths, e.g. require
> > SRCU synchronization and/or zapping of SPTEs.  And snp_launch_update() is probably
> > fairly slow too.
>
> Okay, that makes sense.

As discussed offlist, dropping slots_lock would also be functionally problematic,
as concurrent calls to KVM_RESET_DIRTY_RINGS could get interwoven, which could
result in one of the calls returning to userspace without actually completing the
reset, i.e. if a different task has reaped entries but not yet called
kvm_reset_dirty_gfn().

> > And dropping slots_lock only makes any sense for non-preemptible kernels, because
> > preemptible kernels include an equivalent check in KVM_MMU_UNLOCK().
>
> I'm not really sure what "equivalent check" you mean, sorry. For preemptible
> kernels, we could reschedule at any time, so dropping the slots_lock on a
> cond_resched() wouldn't do much, yeah. Hopefully that's partially what you
> mean.

Ya, that's essentially what I mean.  What I was referencing with KVM_MMU_UNLOCK()
is the explicit check for NEED_RESCHED that happens when the preempt count hits
'0' on preemptible kernels.

> > It's also possible that dropping slots_lock in this case could be a net negative.
> > I don't think it's likely, but I don't think it's any more or less likely that
> > dropping slots_lock is a net positive.  Without performance data to guide us,
> > it'd be little more than a guess, and I really, really don't want to set a
> > precedent of dropping a mutex on cond_resched() without a very strong reason
> > for doing so.
>
> Fair enough.
>
> Also, while we're at it, could you add a
> `lockdep_assert_held(&kvm->slots_lock)` to this function? :) Not necessarily
> in this patch.

Heh, yep, my mind jumped to that as well.  I'll tack on a patch to add a lockdep
assertion, along with a comment explaining what all it protects.
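For reference, the interleaving hazard described at the top of this mail can be
modeled in plain userspace C.  The sketch below is illustrative only and assumes
a heavily simplified ring: the model_* names, RING_SIZE, and the strict
reap/apply split are hypothetical and are not KVM code, and model_apply() merely
stands in for kvm_reset_dirty_gfn().  It replays the two orderings sequentially
rather than using real threads or locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4	/* hypothetical size, chosen for illustration only */

/* Simplified stand-in for a dirty ring; this is NOT KVM's actual layout. */
struct model_ring {
	bool harvested[RING_SIZE];	 /* entry is waiting to be reset */
	bool write_protected[RING_SIZE]; /* reset has been applied for entry */
	size_t reset_index;
};

/* Entries that were reaped from the ring but not yet applied. */
struct model_batch {
	size_t first;
	size_t count;
};

/* Phase 1: reap harvested entries, i.e. remove them from the ring. */
static struct model_batch model_reap(struct model_ring *ring)
{
	struct model_batch batch = { .first = ring->reset_index, .count = 0 };

	while (ring->reset_index < RING_SIZE &&
	       ring->harvested[ring->reset_index]) {
		ring->harvested[ring->reset_index] = false;
		ring->reset_index++;
		batch.count++;
	}
	return batch;
}

/* Phase 2: apply the batch (stand-in for kvm_reset_dirty_gfn()). */
static void model_apply(struct model_ring *ring, struct model_batch batch)
{
	assert(batch.first + batch.count <= RING_SIZE);

	for (size_t i = 0; i < batch.count; i++)
		ring->write_protected[batch.first + i] = true;
}

static size_t model_unprotected(const struct model_ring *ring)
{
	size_t nr = 0;

	for (size_t i = 0; i < RING_SIZE; i++)
		nr += !ring->write_protected[i];
	return nr;
}

/*
 * Returns how many entries are still unprotected at the moment task B's
 * reset call "returns to userspace".
 */
static size_t model_pending_when_b_returns(bool drop_lock_mid_reset)
{
	struct model_ring ring = {0};
	struct model_batch a, b;
	size_t pending;

	for (size_t i = 0; i < RING_SIZE; i++)
		ring.harvested[i] = true;

	a = model_reap(&ring);			/* task A reaps everything */

	if (drop_lock_mid_reset) {
		/* A drops the lock before applying; B runs in the gap. */
		b = model_reap(&ring);		/* B finds nothing to reap */
		model_apply(&ring, b);
		pending = model_unprotected(&ring); /* B returns here */
		model_apply(&ring, a);		/* A finishes too late for B */
	} else {
		/* A holds the lock throughout; B only starts once A is done. */
		model_apply(&ring, a);
		b = model_reap(&ring);
		model_apply(&ring, b);
		pending = model_unprotected(&ring);
	}
	return pending;
}
```

In this model, calling model_pending_when_b_returns(true) leaves every entry
still unprotected when B returns, whereas model_pending_when_b_returns(false)
leaves none, which is the functional reason for holding the lock across the
whole reset.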