From: Michael Kelley <mhklinux@xxxxxxxxxxx> Sent: Friday, April 25, 2025 7:32 AM > > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Sent: Friday, April 25, 2025 7:04 AM > > > > On Mon, Apr 21, 2025 at 06:28:42PM +0000, Michael Kelley wrote: > > > > > > #ifdef CONFIG_X86_64 > > > > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2) > > > > +{ > > > > + return U64_MAX; > > > > +} > > > > + > > > > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail); > > > > + > > > > u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > > > > { > > > > u64 hv_status; > > > > > > > > + asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall) > > > > : "=a" (hv_status), ASM_CALL_CONSTRAINT, > > > > "+c" (control), "+d" (param1) > > > > + : "r" (__r8) > > > > : "cc", "memory", "r9", "r10", "r11"); > > > > > > > > return hv_status; > > > > } > > > > + > > > > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2); > > > > + > > > > +static inline void hv_set_hypercall_pg(void *ptr) > > > > +{ > > > > + hv_hypercall_pg = ptr; > > > > + > > > > + if (!ptr) > > > > + ptr = &__hv_hyperfail; > > > > + static_call_update(__hv_hypercall, (hv_hypercall_f)ptr); > > > > +} > > > > ^ kept for reference, as I try and explain how static_call() works > > below. > > > > > > -skip_hypercall_pg_init: > > > > - /* > > > > - * Some versions of Hyper-V that provide IBT in guest VMs have a bug > > > > - * in that there's no ENDBR64 instruction at the entry to the > > > > - * hypercall page. Because hypercalls are invoked via an indirect call > > > > - * to the hypercall page, all hypercall attempts fail when IBT is > > > > - * enabled, and Linux panics. For such buggy versions, disable IBT. > > > > - * > > > > - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall > > > > - * page, so if future Linux kernel versions enable IBT for 32-bit > > > > - * builds, additional hypercall page hackery will be required here > > > > - * to provide an ENDBR32. > > > > - */ > > > > -#ifdef CONFIG_X86_KERNEL_IBT > > > > - if (cpu_feature_enabled(X86_FEATURE_IBT) && > > > > - *(u32 *)hv_hypercall_pg != gen_endbr()) { > > > > - setup_clear_cpu_cap(X86_FEATURE_IBT); > > > > - pr_warn("Disabling IBT because of Hyper-V bug\n"); > > > > - } > > > > -#endif > > > > > > With this patch set, it's nice to see IBT working in a Hyper-V guest! > > > I had previously tested IBT with some hackery to the hypercall page > > > to add the missing ENDBR64, and didn't see any problems. Same > > > after these changes -- no complaints from IBT. > > > > No indirect calls left, no IBT complaints ;-) > > > > > > + hv_set_hypercall_pg(hv_hypercall_pg); > > > > > > > > +skip_hypercall_pg_init: > > > > /* > > > > * hyperv_init() is called before LAPIC is initialized: see > > > > * apic_intr_mode_init() -> x86_platform.apic_post_init() and > > > > @@ -658,7 +658,7 @@ void hyperv_cleanup(void) > > > > * let hypercall operations fail safely rather than > > > > * panic the kernel for using invalid hypercall page > > > > */ > > > > - hv_hypercall_pg = NULL; > > > > + hv_set_hypercall_pg(NULL); > > > > > > This causes a hang getting into the kdump kernel after a panic. > > > hyperv_cleanup() is called after native_machine_crash_shutdown() > > > has done crash_smp_send_stop() on all the other CPUs. I don't know > > > the details of how static_call_update() works, > > > > Right, so let me try and explain this :-) > > > > So we get the compiler to emit direct calls (CALL/JMP) to symbols > > prefixed with "__SCT__", in this case from asm, but more usually by > > means of the static_call() macro mess. > > > > Meanwhile DEFINE_STATIC_CALL() ensures such a symbol actually exists. > > This symbol is a little trampoline that redirects to the actual > > target function given to DEFINE_STATIC_CALL() -- __hv_hyperfail() in the > > above case. > > > > Then objtool runs through the resulting object file and stores the > > location of every call to these __STC__ prefixed symbols in a custom > > section. > > > > This enables static_call init (boot time) to go through the section and > > rewrite all the trampoline calls to direct calls to the target. > > Subsequent static_call_update() calls will again rewrite the direct call > > to point elsewhere. > > > > So very much how static_branch() does a NOP/JMP rewrite to toggle > > branches, static_call() rewrites (direct) call targets. > > > > > but it's easy to imagine that > > > it wouldn't work when the kernel is in such a state. > > > > > > The original code setting hv_hypercall_pg to NULL is just tidiness. > > > Other CPUs are stopped and can't be making hypercalls, and this CPU > > > shouldn't be making hypercalls either, so setting it to NULL more > > > cleanly catches some erroneous hypercall (vs. accessing the hypercall > > > page after Hyper-V has been told to reset it). > > > > So if you look at (retained above) hv_set_hypercall_pg(), when given > > NULL, the call target is set to __hv_hyperfail(), which does an > > unconditional U64_MAX return. > > > > Combined with the fact that the thing *should* not be doing hypercalls > > anymore at this point, something is iffy. > > > > I can easily remove it, but it *should* be equivalent to before, where > > it dynamicall checked for hv_hypercall_pg being NULL. > > I agree that setting the call target to __hv_hyperfail() should be good. > But my theory is that static_call_update() is hanging when trying to > do the rewrite, because of the state of the other CPUs. I don't think > control is ever returning from static_call_update() when invoked > through hyperv_cleanup(). Wouldn't static_call_update() need to park > the other CPUs temporarily and/or flush instruction caches to make > everything consistent? > > But that's just my theory. I'll run a few more experiments to confirm > if control ever returns from static_call_update() in this case. > Indeed, control never returns from static_call_update(). Prior to hyperv_cleanup() running, crash_smp_send_stop() has been called to stop all the other CPUs, and it does not update cpu_online_mask to reflect the other CPUs being stopped. static_call_update() runs this call sequence: arch_static_call_transform() __static_call_transform() smp_text_poke_single() smp_text_poke_batch_finish() smp_text_poke_sync_each_cpu() smp_text_poke_sync_each_cpu() sends an IPI to each CPU in cpu_online_mask, and of course the other CPUs never respond, so it waits forever. Michael