On Mon, 2025-05-19 at 10:06 -0700, Sean Christopherson wrote:
> On Mon, May 19, 2025, Rick P Edgecombe wrote:
> > On Mon, 2025-05-19 at 06:33 -0700, Sean Christopherson wrote:
> > > Was this hit by a real VMM? If so, why is a TDX VMM removing a memslot
> > > without kicking vCPUs out of KVM?
> > >
> > > Regardless, I would prefer not to add a new RET_PF_* flag for this. At a
> > > glance, KVM can simply drop and reacquire SRCU in the relevant paths.
> >
> > During the initial debugging and kicking-around stage, this is the first
> > direction we looked. But kvm_gmem_populate() doesn't have SRCU locked, so
> > kvm_tdp_map_page() then tries to unlock without it being held (although
> > that version didn't check r == RET_PF_RETRY like you had). Yan had the
> > following concerns and came up with the version in this series, which we
> > held back for review on the list:
>
> Ah, I missed the kvm_gmem_populate() => kvm_tdp_map_page() chain.
>
> > > However, upon further consideration, I am reluctant to implement this
> > > fix for
>
> Which fix?

The one considered when debugging the issue. It was pretty much exactly what
you first suggested, but with the SRCU lock taken in tdx_gmem_post_populate().
Since it was so close, I just shared Yan's comment on it. In case it looks
like internal code review, here is some more history:

1. Reinette reports a bug from an internal test, wondering if it's valid
   userspace behavior
2. I suggest the SRCU root cause
3. Yan provides a specific diff (pretty much what you suggested) for Reinette
   to test, who finds the post_populate case generates a warning
4. Yan looks at fixing up the post_populate case, but decides she doesn't
   like it (the quoted blurb) and develops the alternative in this series

> > > the following reasons:
> > > - kvm_gmem_populate() already holds the kvm->slots_lock.
> > > - While retrying with SRCU unlock and lock can work around the
> > >   KVM_MEMSLOT_INVALID deadlock, it results in each
> > >   kvm_vcpu_pre_fault_memory() and tdx_handle_ept_violation() faulting
> > >   with different memslot layouts.
>
> This behavior has existed since pretty much the beginning of KVM time.

The non-TDX issues today are related to the pre-fault memory stuff, which
doesn't enter the guest for a different reason.

> TDX is the oddball that doesn't re-enter the guest. All other flavors
> re-enter the guest on RET_PF_RETRY, which means dropping and reacquiring
> SRCU. Which is why I don't like RET_PF_RETRY_INVALID_SLOT; it's simply
> handling the case we know about.
>
> Arguably, _TDX_ is buggy by not providing this behavior.

Long term, the zero-step issue needs to be resolved. So yea, we should avoid
building around it for anything non-critical (like this). But I don't see why
prefault is not in the same category of oddness.

> > I'm not sure why the second one is really a problem. For the first one I
> > think that path could just take the SRCU lock in the proper order with
> > kvm->slots_lock?
>
> Acquiring SRCU inside slots_lock should be fine. The reverse order would be
> problematic, as KVM synchronizes SRCU while holding slots_lock.
>
> That said, I don't love the idea of grabbing SRCU, because it's so obviously
> a hack. What about something like this?

Like Reinette noticed, it doesn't pass the included test. It looks like
synchronize_srcu_expedited() is no longer stuck waiting, but there is some
other RET_PF_RETRY loop not caused by KVM_MEMSLOT_INVALID. Must be some side
effect. Will debug further.

Two rough sketches of the locking shapes discussed above are appended below
for reference.
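First, the "drop and reacquire SRCU on RET_PF_RETRY" shape that re-entering
the guest provides implicitly. Untested and illustrative only; do_map_fault()
is a made-up stand-in for whatever actually resolves the fault, not a real
helper:

static int map_with_retry_sketch(struct kvm_vcpu *vcpu, gpa_t gpa)
{
	int r;

	for (;;) {
		r = do_map_fault(vcpu, gpa);	/* made-up helper */
		if (r != RET_PF_RETRY)
			return r;

		if (signal_pending(current))
			return -EINTR;

		/*
		 * What re-entering the guest does implicitly: leave the SRCU
		 * read-side critical section so a memslot deletion blocked in
		 * synchronize_srcu() can finish, then retry against the
		 * updated memslot layout instead of spinning forever on the
		 * KVM_MEMSLOT_INVALID slot.
		 */
		kvm_vcpu_srcu_read_unlock(vcpu);
		cond_resched();
		kvm_vcpu_srcu_read_lock(vcpu);
	}
}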
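Second, the nesting from the fix we kicked around internally: take the SRCU
read lock inside the slots_lock that kvm_gmem_populate() already holds.
Again untested, body elided; only the lock ordering is the point:

static int post_populate_srcu_sketch(struct kvm *kvm)
{
	int idx, r = 0;

	/* kvm_gmem_populate() already holds slots_lock. */
	lockdep_assert_held(&kvm->slots_lock);

	/*
	 * SRCU inside slots_lock is fine: the SRCU read side never blocks on
	 * slots_lock. The reverse nesting can deadlock, because memslot
	 * updates call synchronize_srcu(&kvm->srcu) while holding
	 * slots_lock.
	 */
	idx = srcu_read_lock(&kvm->srcu);

	/* ... kvm_tdp_map_page() and the rest of post-populate ... */

	srcu_read_unlock(&kvm->srcu, idx);
	return r;
}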