> On 22. Jul 2025, at 13:06, Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote: > > On 7/22/2025 6:35 PM, Mohamed Mediouni wrote: >>> On 22. Jul 2025, at 12:27, Xiaoyao Li<xiaoyao.li@xxxxxxxxx> wrote: >>> >>> On 7/22/2025 5:21 PM, Mathias Krause wrote: >>>> On 22.07.25 05:45, Xiaoyao Li wrote: >>>>> On 6/20/2025 3:42 AM, Mathias Krause wrote: >>>>>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system >>>>>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode >>>>>> exception (#UD) as they are just the wrong instruction for the CPU >>>>>> given. But instead of forwarding the exception to the guest, KVM tries >>>>>> to patch the guest instruction to match the host's actual hypercall >>>>>> instruction. That is doomed to fail as read-only code is rather the >>>>>> standard these days. But, instead of letting go the patching attempt and >>>>>> falling back to #UD injection, KVM injects the page fault instead. >>>>>> >>>>>> That's wrong on multiple levels. Not only isn't that a valid exception >>>>>> to be generated by these instructions, confusing attempts to handle >>>>>> them. It also destroys guest state by doing so, namely the value of CR2. >>>>>> >>>>>> Sean attempted to fix that in KVM[1] but the patch was never applied. >>>>>> >>>>>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least, >>>>>> conceptually be disabled. Paolo even called out to add this very >>>>>> functionality to disable the quirk in QEMU[3]. So lets just do it. >>>>>> >>>>>> A new property 'hypercall-patching=on|off' is added, for the very >>>>>> unlikely case that there are setups that really need the patching. >>>>>> However, these would be vulnerable to memory corruption attacks freely >>>>>> overwriting code as they please. So, my guess is, there are exactly 0 >>>>>> systems out there requiring this quirk. >>>>> The default behavior is patching the hypercall for many years. >>>>> >>>>> If you desire to change the default behavior, please at least keep it >>>>> unchanged for old machine version. i.e., introduce compat_property, >>>>> which sets KVMState->hypercall_patching_enabled to true. >>>> Well, the thing is, KVM's patching is done with the effective >>>> permissions of the guest which means, if the code in question isn't >>>> writable from the guest's point of view, KVM's attempt to modify it will >>>> fail. This failure isn't transparent for the guest as it sees a #PF >>>> instead of a #UD, and that's what I'm trying to fix by disabling the quirk. >>>> The hypercall patching was introduced in Linux commit 7aa81cc04781 >>>> ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then >>>> it was based on a dedicated hypercall page that was handled by KVM to >>>> use the proper instruction of the KVM module in use (VMX or SVM). >>>> Patching code was fine back then, but the introduction of DEBUG_RO_DATA >>>> made the patching attempts fail and, ultimately, lead to Paolo handle >>>> this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL >>>> vs. VMMCALL if kernel text is read-only"). >>>> However, his change still doesn't account for the cross-vendor live >>>> migration case (Intel<->AMD), which will still be broken, causing the >>>> before mentioned bogus #PF, which will just lead to misleading Oops >>>> reports, confusing the poor souls, trying to make sense of it. >>>> IMHO, there is no valid reason for still having the patching in place as >>>> the .text of non-ancient kernel's will be write-protected, making >>>> patching attempts fail. And, as they fail with a #PF instead of #UD, the >>>> guest cannot even handle them appropriately, as there was no memory >>>> write attempt from its point of view. Therefore the default should be to >>>> disable it, IMO. This won't prevent guests making use of the wrong >>>> instruction from trapping, but, at least, now they'll get the correct >>>> exception vector and can handle it appropriately. >>> But you don't accout for the case that guest kernel is built without CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for whatever reason the guest's text is not readonly, and the VM needs to be migrated among different vendors (Intel <-> AMD). >>> >>> Before this patch, the above usecase works well. But with this patch, the guest will gets #UD after migrated to different vendors. >>> >>> I heard from some small CSPs that they do want to the ability to live migrate VMs among Intel and AMD host. >>> >> Is there a particular reason to not handle that #UD as a hypercall to support that scenario? > > do you mean KVM handles the first hardware #UD due to the wrong opcode of hypercall by directly emulate the hypercall instead of going to emulator to patch the guest memory with modifying the guest memory? > > If so, I agree with it. I thought the same solution and had no bandwidth and motivation to raise the idea to KVM community. > Yes, I think that’d be the best way to go in this case… -Mohamed >> Especially with some guest OS kernels having kernel patch protection with periodic scrubbing of r/o memory (ie Windows), doesn’t sound great to override anything in a way the guest OS kernel can notice… > >