On Tue, Jul 8, 2025 at 1:58 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Jul 08, 2025, Srikanth Aithal wrote:
> > Hello all,
> > KVM unit test suite for SVM is regressing on the AMD EPYC Turin platform
> > (Zen 5) for a while now, even on latest linux-next[https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tag/?h=next-20250704].
> > The same seem to work fine with linux-next tag
> > next-20250505.
> > The TSC delay test fails intermittently (approximately once in three runs)
> > with an unexpected result (expected: 50, actual: 49). This test passed
> > consistently on earlier tags (e.g., next-20250505) and on non-Turin
> > platforms.
>
> Stating the obvious to some extent, I suspect it's something to do with Turin,
> not a KVM issue. This fails on our Turin hosts as far back as v6.12, i.e. long
> before next-20250505 (I haven't bothered checking earlier builds), and AFAICT
> the KUT test isn't doing anything to actually stress KVM itself. I.e. I would
> expect KVM bugs to manifest as blatant, 100% reproducible failures, not random
> TSC slop.

I think the final test case is broken, actually.

The test case is:

    svm_tsc_scale_run_testcase(50, 0.0001, rdrand());

So, guest_tsc_delay_value is (u64)((50 << 24) * 0.0001), which is
83886. Note that this is 83886.080000000002 truncated.

If L2 exits after 83886 scaled TSC cycles, the "duration" spent in L2
will be (u64)(83886 / 0.0001) >> 24, which is 49.

To get up to 50, we have to accumulate an additional (0.080000000002 /
0.0001 = 800.0000000199999) cycles between the two rdtsc() operations
bracketing the svm_vmrun() in L1.

The test probably passes on other CPUs because emulated VMRUN and
#VMEXIT add those 800 cycles.

Instead of truncating ((50 << 24) * 0.0001), I think we should
calculate guest_tsc_delay_value as ceil((50 << 24) * 0.0001).
Something like this:

diff --git a/x86/svm_tests.c b/x86/svm_tests.c
index 9358c1f0383a..1bfe11045bd1 100644
--- a/x86/svm_tests.c
+++ b/x86/svm_tests.c
@@ -891,6 +891,8 @@ static void svm_tsc_scale_run_testcase(u64 duration,
 	u64 start_tsc, actual_duration;
 
 	guest_tsc_delay_value = (duration << TSC_SHIFT) * tsc_scale;
+	if (guest_tsc_delay_value < (duration << TSC_SHIFT) * tsc_scale)
+		guest_tsc_delay_value++;
 
 	test_set_guest(svm_tsc_scale_guest);
 	vmcb->control.tsc_offset = tsc_offset;

Even then, equality of duration and actual_duration is only guaranteed
if there are no significant delays during the measurement.
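
For anyone who wants to check the arithmetic outside the test, here is a
rough userspace sketch. It is not code from kvm-unit-tests: the helper
names (delay_truncated, delay_rounded_up, l1_duration) are made up for
illustration, TSC_SHIFT is assumed to be 24 to match the (50 << 24)
expression above, and l1_duration() models the idealised case where L2
exits after exactly the requested delay and VMRUN/#VMEXIT add no cycles
at all.

```c
#include <assert.h>
#include <stdint.h>

#define TSC_SHIFT 24	/* assumed: matches the (50 << 24) expression above */

/* Current behaviour: the double product is truncated when stored in a u64. */
static uint64_t delay_truncated(uint64_t duration, double tsc_scale)
{
	return (uint64_t)((double)(duration << TSC_SHIFT) * tsc_scale);
}

/* Proposed behaviour: bump the value whenever truncation dropped a fraction. */
static uint64_t delay_rounded_up(uint64_t duration, double tsc_scale)
{
	double exact = (double)(duration << TSC_SHIFT) * tsc_scale;
	uint64_t delay = (uint64_t)exact;

	if ((double)delay < exact)
		delay++;
	return delay;
}

/*
 * Hypothetical best case as seen by L1: convert the scaled guest delay
 * back to host cycles and shift it down to "duration" units, with zero
 * overhead from VMRUN and #VMEXIT.
 */
static uint64_t l1_duration(uint64_t delay, double tsc_scale)
{
	assert(tsc_scale > 0.0);
	return (uint64_t)((double)delay / tsc_scale) >> TSC_SHIFT;
}
```

With duration = 50 and tsc_scale = 0.0001, delay_truncated() returns
83886 and l1_duration() maps that back to 49, because 83886 / 0.0001 is
about 838860000 host cycles, roughly 800 short of 50 << 24 = 838860800.
delay_rounded_up() returns 83887, which maps back to 50.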