+David On Wed, Mar 19, 2025, Ming Lin wrote: > Hi, > > After performing a live migration on a QEMU guest OS that had been running for over 30 days, > we noticed that the guest OS time was more than 2 seconds behind the actual time. > > After extensive debugging, we found that this issue is related to master_kernel_ns and master_cycle_now. > > When the guest OS starts, the host initializes a pair of master_kernel_ns and master_cycle_now values. > After live migration, the host updates these values. > > Our debugging showed that if the host does not update master_kernel_ns/master_cycle_now, > the guest OS time remains correct. > > To illustrate how updating master_kernel_ns/master_cycle_now leads to the guest OS time drifting backward, > we applied the following debug patch: > > The patch adds a KVM debugfs entry to trigger time calculations and print the results. > The patch runs on the host side, but we use __pvclock_read_cycles() to simulate the guest OS updating its time. > > Example Output: > > # cat /sys/kernel/debug/kvm/946-13/pvclock > old: master_kernel_ns: 15119778316 > old: master_cycle_now: 37225912658 > old: ns: 1893199569691 > new: master_kernel_ns: 1908210098649 > new: master_cycle_now: 4391329912268 > new: ns: 1893199548401 > > tsc 4391329912368 > kvmclock_offset -15010550291 > diff: ns: 21290 > > Explanation of Parameters: > > Input: > "old: master_kernel_ns:" The master_kernel_ns value recorded when the guest OS started (remains unchanged during testing). > "old: master_cycle_now:" The master_cycle_now value recorded when the guest OS started (remains unchanged during testing). > "new: master_kernel_ns:" The latest master_kernel_ns value at the time of reading. > "new: master_cycle_now:" The latest master_cycle_now value at the time of reading. > tsc: The rdtsc() value at the time of reading. > kvmclock_offset: The offset recorded by KVM_SET_CLOCK when the guest OS started (remains unchanged during testing). > > Output: > "old: ns:" Time in nanoseconds calculated using the old master_kernel_ns/master_cycle_now. > "new: ns:" Time in nanoseconds calculated using the new master_kernel_ns/master_cycle_now. > "diff: ns:" (old ns - new ns), representing the time drift relative to the guest OS start time. > > Test Script: > #!/bin/bash > > qemu_pid=$(pidof qemu-system-x86_64) > > while [ 1 ] ; do > echo "=====================================" > echo "Guest OS running time: $(ps -p $qemu_pid -o etime= | awk '{print $1}')" > cat /sys/kernel/debug/kvm/*/pvclock > echo > sleep 10 > done > > Test Results: > Below are the first and last parts of a >2-hour test run. > As time progresses, the time drift calculated using the latest master_kernel_ns/master_cycle_now increases monotonically. > > After 2 hours and 18 minutes, the guest OS time drifted by approximately 93 milliseconds. > > I have uploaded an image for a more intuitive visualization of the time drift: > https://postimg.cc/crCDWtD7 > > Is this a real problem? David can confirm, but I'm pretty sure the drift you are observing is addressed by David's series to fix a plethora of kvmclock warts. https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@xxxxxxxxxxxxx > If there is any fix patch, I’d be happy to test it. Thanks! > > > 1 ===================================== > 2 guest os running time: 00:50 > 3 old: master_kernel_ns: 15119778316 > 4 old: master_cycle_now: 37225912658 > 5 old: ns: 48092694964 > 6 new: master_kernel_ns: 63103244699 > 7 new: master_cycle_now: 147587790614 > 8 new: ns: 48092694425 > 9 > 10 tsc 147587790654 > 11 kvmclock_offset -15010550291 > 12 diff: ns: 539 > 13 > 14 ===================================== > 15 guest os running time: 01:00 > 16 old: master_kernel_ns: 15119778316 > 17 old: master_cycle_now: 37225912658 > 18 old: ns: 58139026532 > 19 new: master_kernel_ns: 73149576143 > 20 new: master_cycle_now: 170694333104 > 21 new: ns: 58139025879 > 22 > 23 tsc 170694333168 > 24 kvmclock_offset -15010550291 > 25 diff: ns: 653 > 26 > 27 ===================================== > 28 guest os running time: 01:10 > 29 old: master_kernel_ns: 15119778316 > 30 old: master_cycle_now: 37225912658 > 31 old: ns: 68183772122 > 32 new: master_kernel_ns: 83194321616 > 33 new: master_cycle_now: 193797227862 > 34 new: ns: 68183771357 > 35 > 36 tsc 193797227936 > 37 kvmclock_offset -15010550291 > 38 diff: ns: 765 > 39 > 40 ===================================== > 41 guest os running time: 01:20 > 42 old: master_kernel_ns: 15119778316 > 43 old: master_cycle_now: 37225912658 > 44 old: ns: 78225289157 > 45 new: master_kernel_ns: 93235838545 > 46 new: master_cycle_now: 216892696976 > 47 new: ns: 78225288279 > 48 > 49 tsc 216892697034 > 50 kvmclock_offset -15010550291 > 51 diff: ns: 878 > 52 > 53 ===================================== > 54 guest os running time: 01:30 > 55 old: master_kernel_ns: 15119778316 > 56 old: master_cycle_now: 37225912658 > 57 old: ns: 88268955340 > 58 new: master_kernel_ns: 103279504612 > 59 new: master_cycle_now: 239993109102 > 60 new: ns: 88268954349 > 61 > 62 tsc 239993109168 > 63 kvmclock_offset -15010550291 > 64 diff: ns: 991 > 65 > 66 ===================================== > 67 guest os running time: 01:40 > 68 old: master_kernel_ns: 15119778316 > 69 old: master_cycle_now: 37225912658 > 70 old: ns: 98313212581 > 71 new: master_kernel_ns: 113323761740 > 72 new: master_cycle_now: 263094880668 > 73 new: ns: 98313211476 > 74 > 75 tsc 263094880732 > 76 kvmclock_offset -15010550291 > 77 diff: ns: 1105 > ..... > ..... > 10160 ===================================== > 10161 guest os running time: 02:17:11 > 10162 old: master_kernel_ns: 15119778316 > 10163 old: master_cycle_now: 37225912658 > 10164 old: ns: 8229817213297 > 10165 new: master_kernel_ns: 8244827670997 > 10166 new: master_cycle_now: 18965537819524 > 10167 new: ns: 8229817120748 > 10168 > 10169 tsc 18965537819622 > 10170 kvmclock_offset -15010550291 > 10171 diff: ns: 92549 > 10172 ===================================== > 10173 guest os running time: 02:17:21 > 10174 old: master_kernel_ns: 15119778316 > 10175 old: master_cycle_now: 37225912658 > 10176 old: ns: 8239861074959 > 10177 new: master_kernel_ns: 8254871532564 > 10178 new: master_cycle_now: 18988638681302 > 10179 new: ns: 8239860982297 > 10180 > 10181 tsc 18988638681358 > 10182 kvmclock_offset -15010550291 > 10183 diff: ns: 92662 > 10184 ===================================== > 10185 guest os running time: 02:17:31 > 10186 old: master_kernel_ns: 15119778316 > 10187 old: master_cycle_now: 37225912658 > 10188 old: ns: 8249904622988 > 10189 new: master_kernel_ns: 8264915080459 > 10190 new: master_cycle_now: 19011738821632 > 10191 new: ns: 8249904530213 > 10192 > 10193 tsc 19011738821736 > 10194 kvmclock_offset -15010550291 > 10195 diff: ns: 92775^@ > 10196 ===================================== > 10197 guest os running time: 02:17:41 > 10198 old: master_kernel_ns: 15119778316 > 10199 old: master_cycle_now: 37225912658 > 10200 old: ns: 8259949369203 > 10201 new: master_kernel_ns: 8274959826576 > 10202 new: master_cycle_now: 19034841717872 > 10203 new: ns: 8259949276315 > 10204 > 10205 tsc 19034841717942 > 10206 kvmclock_offset -15010550291 > 10207 diff: ns: 92888 > 10208 ===================================== > 10209 guest os running time: 02:17:51 > 10210 old: master_kernel_ns: 15119778316 > 10211 old: master_cycle_now: 37225912658 > 10212 old: ns: 8269996849598 > 10213 new: master_kernel_ns: 8285007306846 > 10214 new: master_cycle_now: 19057950902658 > 10215 new: ns: 8269996756597 > 10216 > 10217 tsc 19057950902756 > 10218 kvmclock_offset -15010550291 > 10219 diff: ns: 93001^@ > 10220 ===================================== > 10221 guest os running time: 02:18:02 > 10222 old: master_kernel_ns: 15119778316 > 10223 old: master_cycle_now: 37225912658 > 10224 old: ns: 8280039094317 > 10225 new: master_kernel_ns: 8295049551453 > 10226 new: master_cycle_now: 19081048045430 > 10227 new: ns: 8280039001203 > 10228 > 10229 tsc 19081048045526 > 10230 kvmclock_offset -15010550291 > 10231 diff: ns: 93114^@ > > > > pvclock debugfs patch > --- > arch/x86/include/asm/kvm_host.h | 4 +++ > arch/x86/kvm/x86.c | 29 +++++++++++++++- > b.sh | 1 + > virt/kvm/kvm_main.c | 75 +++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 108 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 32ae3aa50c7e..5a82a69bfe7a 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1411,6 +1411,10 @@ struct kvm_arch { > struct delayed_work kvmclock_update_work; > struct delayed_work kvmclock_sync_work; > + u64 old_master_kernel_ns; > + u64 old_master_cycle_now; > + s64 old_kvmclock_offset; > + > struct kvm_xen_hvm_config xen_hvm_config; > /* reads protected by irq_srcu, writes by irq_lock */ > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4b64ab350bcd..a56511ed8c5b 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2819,7 +2819,7 @@ static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp, > * As with get_kvmclock_base_ns(), this counts from boot time, at the > * frequency of CLOCK_MONOTONIC_RAW (hence adding gtos->offs_boot). > */ > -static int do_kvmclock_base(s64 *t, u64 *tsc_timestamp) > +int do_kvmclock_base(s64 *t, u64 *tsc_timestamp) > { > struct pvclock_gtod_data *gtod = &pvclock_gtod_data; > unsigned long seq; > @@ -2861,6 +2861,27 @@ static int do_monotonic(s64 *t, u64 *tsc_timestamp) > return mode; > } > +u64 mydebug_get_kvmclock_ns(u64 master_kernel_ns, u64 master_cycle_now, s64 kvmclock_offset, u64 tsc) > +{ > + struct pvclock_vcpu_time_info hv_clock; > + u64 ret; > + > + hv_clock.tsc_timestamp = master_cycle_now; > + hv_clock.system_time = master_kernel_ns + kvmclock_offset; > + > + /* both __this_cpu_read() and rdtsc() should be on the same cpu */ > + get_cpu(); > + > + kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, > + &hv_clock.tsc_shift, > + &hv_clock.tsc_to_system_mul); > + ret = __pvclock_read_cycles(&hv_clock, tsc); > + > + put_cpu(); > + > + return ret; > +} > + > static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp) > { > struct pvclock_gtod_data *gtod = &pvclock_gtod_data; > @@ -2988,6 +3009,10 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) > host_tsc_clocksource = kvm_get_time_and_clockread( > &ka->master_kernel_ns, > &ka->master_cycle_now); > + ka->old_master_kernel_ns = ka->master_kernel_ns; > + ka->old_master_cycle_now = ka->master_cycle_now; > + printk("MYDEBUG: old_master_kernel_ns = %llu, old_master_cycle_now = %llu\n", > + ka->old_master_kernel_ns, ka->old_master_cycle_now); > ka->use_master_clock = host_tsc_clocksource && vcpus_matched > && !ka->backwards_tsc_observed > @@ -6989,6 +7014,8 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) > else > now_raw_ns = get_kvmclock_base_ns(); > ka->kvmclock_offset = data.clock - now_raw_ns; > + ka->old_kvmclock_offset = ka->kvmclock_offset; > + printk("MYDEBUG: old_kvmclock_offset = %lld\n", ka->old_kvmclock_offset); > kvm_end_pvclock_update(kvm); > return 0; > } > diff --git a/b.sh b/b.sh > new file mode 120000 > index 000000000000..0ff9a93fd53f > --- /dev/null > +++ b/b.sh > @@ -0,0 +1 @@ > +/home/mlin/build.upstream/b.sh > \ No newline at end of file > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index ba0327e2d0d3..d6b9a6e7275e 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -399,6 +399,7 @@ int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, > return mc->nobjs >= min ? 0 : -ENOMEM; > mc->objects[mc->nobjs++] = obj; > } > + > return 0; > } > @@ -998,6 +999,78 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm) > } > } > +extern int do_kvmclock_base(s64 *t, u64 *tsc_timestamp); > +extern u64 mydebug_get_kvmclock_ns(u64 master_kernel_ns, u64 master_cycle_now, s64 kvmclock_offset, u64 tsc); > + > +static ssize_t kvm_mydebug_pvclock_read(struct file *file, char __user *buf, > + size_t len, loff_t *ppos) > +{ > + struct kvm *kvm = file->private_data; > + struct kvm_arch *ka; > + char buffer[256]; > + ssize_t ret, copied; > + u64 new_master_kernel_ns; > + u64 new_master_cycle_now; > + u64 old_ns, new_ns; > + u64 tsc; > + > + if (!kvm) { > + pr_err("file->private_data is NULL\n"); > + return -EINVAL; > + } > + > + ka = &kvm->arch; > + > + do_kvmclock_base(&new_master_kernel_ns, &new_master_cycle_now); > + > + tsc = rdtsc(); > + > + old_ns = mydebug_get_kvmclock_ns(ka->old_master_kernel_ns, ka->old_master_cycle_now, ka->old_kvmclock_offset, tsc); > + new_ns = mydebug_get_kvmclock_ns(new_master_kernel_ns, new_master_cycle_now, ka->old_kvmclock_offset, tsc); > + > + ret = snprintf(buffer, sizeof(buffer), > + "old: master_kernel_ns: %llu\n" > + "old: master_cycle_now: %llu\n" > + "old: ns: %llu\n" > + "new: master_kernel_ns: %llu\n" > + "new: master_cycle_now: %llu\n" > + "new: ns: %llu\n\n" > + "tsc %llu\n" > + "kvmclock_offset %lld\n" > + "diff: ns: %lld\n", > + ka->old_master_kernel_ns, ka->old_master_cycle_now, old_ns, > + new_master_kernel_ns, new_master_cycle_now, new_ns, > + tsc, ka->old_kvmclock_offset, > + old_ns - new_ns > + ); > + > + if (ret < 0) > + return ret; > + > + if ((size_t)ret > sizeof(buffer)) > + ret = sizeof(buffer); > + > + if (*ppos >= ret) > + return 0; /* EOF */ > + > + copied = min(len, (size_t)(ret - *ppos)); > + > + if (copy_to_user(buf, buffer + *ppos, copied)) { > + pr_err("copy_to_user failed\n"); > + return -EFAULT; > + } > + > + *ppos += copied; > + > + return copied; > +} > + > +static const struct file_operations kvm_pvclock_fops = { > + .owner = THIS_MODULE, > + .read = kvm_mydebug_pvclock_read, > + .open = simple_open, > +}; > + > static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname) > { > static DEFINE_MUTEX(kvm_debugfs_lock); > @@ -1063,6 +1136,8 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname) > &stat_fops_per_vm); > } > + debugfs_create_file("pvclock", 0444, kvm->debugfs_dentry, kvm, &kvm_pvclock_fops); > + > kvm_arch_create_vm_debugfs(kvm); > return 0; > out_err: >