KVM: SMP vm created on host with unstable TSC; guest TSC will not be reliable

ieronymous

Member
Apr 1, 2019
Hi,

As the title states, I noticed this message in syslog when I started a VM with the qm start command (it has probably been appearing for a while, or since the VM was created).

Searching for this error doesn't really help much. To me it looks like an incompatibility between the virtual CPU and the real one. Does the Xeon E5-2600 v3 series have a bug or something? Anyway, I found only one source with steps on what to check, which is quoted below:

<<<<Well it turns out that issue is the clock source used by kvm on the host system. Have a look at the output of:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
if it's tsc or tsc-early you have found the culprit, change it to one of the other available clock sources on your system: (Mine was hpet though, the High Precision Event Timer)

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
For example hpet (High Precision Event Timer): (Mine lists hpet and acpi_pm as the available sources)

$ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource >>>>
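
The two checks quoted above can be combined into one small helper. This is only a sketch: the function name clocksource_report is made up for illustration, and it assumes the standard sysfs layout shown in the quoted commands. It only reads the files and changes nothing.

```shell
# Hypothetical helper: report the clocksource state of a Linux host.
# The directory is a parameter so the function can also be pointed at a copy.
clocksource_report() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    if [ ! -r "$dir/current_clocksource" ]; then
        echo "no clocksource information under $dir"
        return 1
    fi
    current=$(cat "$dir/current_clocksource")
    available=$(cat "$dir/available_clocksource")
    echo "current: $current"
    echo "available: $available"
    # Pad with spaces so "tsc" only matches as a whole word.
    case " $available " in
        *" tsc "*) echo "tsc: offered by the kernel" ;;
        *)         echo "tsc: not offered (the kernel may have marked it unstable)" ;;
    esac
}
```

Calling clocksource_report with no argument reads the live sysfs directory. If tsc is missing from the available list, writing tsc to current_clocksource is not expected to stick.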

So why am I getting this message?

Thank you
 

xionoix

Member
Jul 20, 2016
are you clustered? I had one box on hpet and the other on tsc... no idea why.
I got the error while I was live migrating a VM. The machine that I created it on has an E3-1220v3 (Haswell) and I migrated it to an E5-2630v2 (Ivy Bridge). Both of these CPUs should be new enough to use TSC, yet surprisingly the newly installed Ivy Bridge host only shows hpet and acpi_pm.

I don't remember seeing this message previously, and I just updated to 7.2; the Ivy Bridge host is a fresh install.

This link says that it is related to how the CPU handles timing: although TSC is better it isn't always stable for older cores. It refers to the 1st gen Core architecture, though, and these CPUs are old but not that old. I didn't get through the whole thread:
https://community.intel.com/t5/Soft...e/TSC-Synchronization-Across-Cores/m-p/932561

I initially changed the older host to hpet, thinking the message referred to the host where the VM was created. When I migrated the VM, although the disk moved, I think the VM is actually recreated with the same ID and configuration, and because the newer box is on hpet it had an issue referencing tsc. The Guest is using kvm-clock; I'm not sure how that syncs with either.

I tried setting the newer one to tsc and this is what happened:
root@monster:~# echo tsc | tee /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
root@monster:~#
root@monster:~#
root@monster:~#
root@monster:~# dmesg | grep -i tsc
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2593.493 MHz processor
[ 0.078328] TSC deadline timer available
[ 0.229643] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2562392c760, max_idle_ns: 440795222818 ns
[ 0.241649] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.241649] Measured 213348 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.241649] tsc: Marking TSC unstable due to check_tsc_sync_source failed
root@monster:~# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet


I checked and the motherboard is running a super old firmware, I updated it and ...

root@monster:~# dmesg | grep -i tsc
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2593.631 MHz processor
[ 0.078247] TSC deadline timer available
[ 0.229559] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2562bb84805, max_idle_ns: 440795244892 ns
[ 0.245565] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.245565] Measured 196192 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.245565] tsc: Marking TSC unstable due to check_tsc_sync_source failed
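
To put the warp figures in the logs above into perspective, the cycle count can be divided by the detected frequency in MHz, which gives microseconds. The sketch below does that with awk. The function name tsc_warp_summary is hypothetical, and the calculation assumes the TSC ticks at the frequency the kernel reports in its "Detected ... MHz" line.

```shell
# Hypothetical helper: read kernel log text on stdin and summarise the TSC sync check.
tsc_warp_summary() {
    awk '
        # Frequency the kernel calibrated, e.g. "tsc: Detected 2593.493 MHz processor"
        /tsc: Detected [0-9.]+ MHz/ {
            for (i = 1; i <= NF; i++) if ($i == "MHz") mhz = $(i - 1)
        }
        # Warp measured by the sync check, e.g. "Measured 213348 cycles TSC warp between CPUs"
        /cycles TSC warp between CPUs/ {
            for (i = 1; i <= NF; i++) if ($i == "cycles") cycles = $(i - 1)
        }
        /Marking TSC unstable/ { unstable = 1 }
        END {
            # cycles / MHz = microseconds
            if (cycles != "" && mhz > 0)
                printf "warp: %d cycles (~%.1f us at %.3f MHz)\n", cycles, cycles / mhz, mhz
            print (unstable ? "tsc: marked unstable" : "tsc: no unstable marker found")
        }
    '
}
```

It can be fed with dmesg | tsc_warp_summary. For the two boots above, the warp works out to roughly 82 and 76 microseconds between CPU#0 and CPU#1, before and after the firmware update.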

I give up... for now. HPET is good enough for me. I think. :)
 

ieronymous

Member
Apr 1, 2019
"are you clustered?"
Nope.

"Both of these CPUs should be new enough to use TSC"
What is the assumption that TSC is the newer one based on?

"TSC is better it isn't always stable for older cores"
Mine is an E5-2660 v3, so not old at all, even if it is a few generations back.

"The Guest is using kvm-clock"
Mine is a Windows Server VM that also acts as an NTP server. In its options, using local time for RTC is set to No (probably irrelevant).

"I checked and the motherboard is running a super old firmware"
Mine runs the latest.
 

xionoix

Member
Jul 20, 2016
"What is the assumption that TSC is the newer one based on?"
The CPU is new enough to use TSC. Apparently Nehalem introduced better handling of TSC. My understanding is that TSC is preferred for virtualization, and that on multi-core CPUs HPET adds more overhead because of the way it is processed.

"On Nehalem and later cpus, the rdtscp instruction returns the TSC and an identifier indicating on which cpu you read the TSC. RDTSCP is a serializing instruction... unlike the regular rdtsc instruction."
https://community.intel.com/t5/Soft...e/TSC-Synchronization-Across-Cores/m-p/932561
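
One way to see what a given CPU advertises is to look at the flags line in /proc/cpuinfo on the host. The sketch below checks for four TSC-related flags; the function name tsc_cpu_flags is made up for illustration. Note that these flags describe the counter on each core. As far as I know they do not guarantee that the counters are synchronized across CPUs, which is what the "TSC synchronization" check in dmesg tests.

```shell
# Hypothetical helper: list which TSC-related CPU flags appear in a cpuinfo file.
tsc_cpu_flags() {
    file="${1:-/proc/cpuinfo}"
    # Every logical CPU repeats the flags line; the first one is enough here.
    flags=$(grep '^flags' "$file" 2>/dev/null | head -n 1)
    for flag in tsc rdtscp constant_tsc nonstop_tsc; do
        # Pad with spaces so "tsc" does not match inside "constant_tsc".
        case " $flags " in
            *" $flag "*) echo "$flag: present" ;;
            *)           echo "$flag: absent" ;;
        esac
    done
}
```

Run without arguments, it reads the live /proc/cpuinfo and prints one line per flag.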

"Mine is a Windows Server VM that also acts as an NTP server. In its options, using local time for RTC is set to No (probably irrelevant)."
TSC and HPET are for CPU cycle timing, so that the CPU cycles don't step on each other and multi-threaded processes can actually function. The CPU clock will function just fine if the host cores and guest cores are all in sync, even if they have no idea what year it is. Setting the time helps the system interact with other systems outside of the bus and helps us relate to it. I am sure this explanation is missing some technicalities, but yes, the two are not related.

I am not even sure that the supposed overhead of using HPET instead of TSC has much impact on these classes of CPUs. I really hope someone who understands this sees our thread and chimes in.

BTW, while reading more in depth on this I see tons of links about older Linux kernels struggling with TSC, for whatever that's worth. I didn't need another tangent on this topic.

I will keep an eye on this thread, but I am not going to worry about using HPET, as things are functioning well enough for now. Whatever performance I am missing, I am fairly sure I won't get it back by learning and fixing this in my environment! :)

If you find a fix for yours or learn more, please update.
 
