Marking TSC unstable due to clocksource watchdog

onepamopa · Aug 17, 2022

I'm getting "Marking TSC unstable due to clocksource watchdog" after some hours of uptime.

The system is a Threadripper 3960X on an Asus TRX40-pro board with the latest available BIOS and 8x8G 3200 sticks.

```
[1198064.004082] clocksource: timekeeping watchdog on CPU46: hpet retried 2 times before success
[1198417.504375] clocksource: timekeeping watchdog on CPU33: hpet retried 2 times before success
[1205730.408893] clocksource: timekeeping watchdog on CPU19: hpet retried 2 times before success
[1208535.367247] clocksource: timekeeping watchdog on CPU13: hpet retried 2 times before success
[1210111.861844] clocksource: timekeeping watchdog on CPU46: hpet retried 2 times before success
[1211693.859893] clocksource: timekeeping watchdog on CPU42: hpet retried 2 times before success
[1212112.349543] clocksource: timekeeping watchdog on CPU15: hpet retried 2 times before success
[1212222.331692] clocksource: timekeeping watchdog on CPU43: hpet retried 2 times before success
[1216504.296297] clocksource: timekeeping watchdog on CPU15: hpet retried 2 times before success
[1219421.252785] clocksource: timekeeping watchdog on CPU41: hpet retried 2 times before success
[1220879.219092] clocksource: timekeeping watchdog on CPU29: hpet retried 3 times before success
[1223252.214573] clocksource: timekeeping watchdog on CPU23: hpet retried 2 times before success
[1223270.710066] clocksource: timekeeping watchdog on CPU12: hpet retried 2 times before success
[1223329.717788] clocksource: timekeeping watchdog on CPU34: hpet retried 2 times before success
[1223361.205306] clocksource: timekeeping watchdog on CPU1: hpet retried 2 times before success
[1223394.196860] clocksource: timekeeping watchdog on CPU19: hpet retried 2 times before success
[1225092.672343] clocksource: timekeeping watchdog on CPU8: hpet retried 2 times before success
[1226989.672841] clocksource: timekeeping watchdog on CPU10: hpet retried 2 times before success
[1229157.645192] clocksource: timekeeping watchdog on CPU26: hpet retried 2 times before success
[1229317.647249] clocksource: timekeeping watchdog on CPU10: hpet retried 2 times before success
[1230445.628034] clocksource: timekeeping watchdog on CPU10: hpet retried 2 times before success
[1230577.627173] clocksource: timekeeping watchdog on CPU34: hpet retried 2 times before success
[1231853.612183] clocksource: timekeeping watchdog on CPU42: hpet retried 2 times before success
[1236160.536165] clocksource: timekeeping watchdog on CPU16: hpet retried 2 times before success
[1236387.541082] clocksource: timekeeping watchdog on CPU38: hpet retried 2 times before success
[1237260.522970] clocksource: timekeeping watchdog on CPU8: hpet retried 2 times before success
[1238370.525093] clocksource: timekeeping watchdog on CPU20: hpet retried 2 times before success
[1238988.505826] clocksource: timekeeping watchdog on CPU8: hpet retried 2 times before success
[1239261.523627] clocksource: timekeeping watchdog on CPU26: hpet retried 2 times before success
[1239493.008794] clocksource: timekeeping watchdog on CPU9: hpet retried 3 times before success
[1240244.488131] clocksource: timekeeping watchdog on CPU24: hpet retried 2 times before success
[1240439.493726] clocksource: timekeeping watchdog on CPU30: hpet retried 2 times before success
[1241095.485836] clocksource: timekeeping watchdog on CPU46: hpet read-back delay of 118590ns, attempt 4, marking unstable
[1241095.485844] tsc: Marking TSC unstable due to clocksource watchdog
[1241095.485854] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[1241095.485856] sched_clock: Marking unstable (1241111549654842, -16063798926)<-(1241095594330567, -108480761)
[1241095.486154] clocksource: Switched to clocksource hpet
```
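
A log like the one above can be summarized before digging further. The following is only a minimal sketch: it assumes the kernel messages have been saved as text lines in exactly the format shown, and the function name `summarize_watchdog` and the `SAMPLE` list are made up for illustration. It counts watchdog retry messages per CPU and records the timestamp at which the TSC was marked unstable.

```python
import re
from collections import Counter

# Matches lines such as:
# [1198064.004082] clocksource: timekeeping watchdog on CPU46: hpet retried 2 times before success
RETRY_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] clocksource: timekeeping watchdog on "
    r"CPU(?P<cpu>\d+): (?P<wd>\w+) retried (?P<n>\d+) times before success"
)
# Matches the line where the kernel gives up on the TSC.
UNSTABLE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\] tsc: Marking TSC unstable")


def summarize_watchdog(lines):
    """Tally retry messages per CPU and find the first 'TSC unstable' timestamp."""
    retries_per_cpu = Counter()
    max_retries = 0
    unstable_at = None
    for line in lines:
        m = RETRY_RE.match(line)
        if m:
            retries_per_cpu[int(m["cpu"])] += 1
            max_retries = max(max_retries, int(m["n"]))
            continue
        m = UNSTABLE_RE.match(line)
        if m and unstable_at is None:
            unstable_at = float(m["ts"])
    return {
        "retries_per_cpu": dict(retries_per_cpu),
        "max_retries": max_retries,
        "unstable_at": unstable_at,
    }


# A few lines copied from the log above (hypothetical variable name).
SAMPLE = [
    "[1198064.004082] clocksource: timekeeping watchdog on CPU46: hpet retried 2 times before success",
    "[1220879.219092] clocksource: timekeeping watchdog on CPU29: hpet retried 3 times before success",
    "[1241095.485836] clocksource: timekeeping watchdog on CPU46: hpet read-back delay of 118590ns, attempt 4, marking unstable",
    "[1241095.485844] tsc: Marking TSC unstable due to clocksource watchdog",
]

if __name__ == "__main__":
    print(summarize_watchdog(SAMPLE))
```

If the retries are spread over many CPUs, as they appear to be in this log, that hints at slow HPET reads in general rather than at one misbehaving core, though the sketch itself cannot prove either.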

After this occurs, the VMs become very sluggish.
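
To confirm which clocksource the host is using once the switch has happened, the kernel exposes the current and available clocksources through sysfs. A minimal sketch, assuming a Linux host with the standard `/sys/devices/system/clocksource/clocksource0` directory; the helper name `read_clocksources` is hypothetical:

```python
from pathlib import Path

# Standard sysfs location of the clocksource attributes on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current, available) clocksources as reported by the kernel."""
    base = Path(sysfs_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    # Only query the real sysfs files if they exist on this machine.
    if (SYSFS_DIR / "current_clocksource").is_file():
        current, available = read_clocksources()
        print(f"current:   {current}")
        print(f"available: {', '.join(available)}")
```

On the host from this thread, the log suggests the current clocksource would read `hpet` after the watchdog event.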

PVE 6.4-15
```
proxmox-ve: 6.4-1 (running kernel: 5.4.195-1-pve)
pve-manager: 6.4-15 (running version: 6.4-15/af7986e6)
pve-kernel-5.4: 6.4-19
pve-kernel-helper: 6.4-19
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.195-1-pve: 5.4.195-1
pve-kernel-5.4.189-2-pve: 5.4.189-2
pve-kernel-5.4.178-1-pve: 5.4.178-1
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-5.4.166-1-pve: 5.4.166-1
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.124-1-pve: 5.4.124-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.101-1-pve: 5.4.101-1
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-5
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.14-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-2
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.7-pve1
```

onepamopa · Aug 17, 2022

Updated the kernel to 5.11.22-5-pve to see if there'd be any difference.

janssensm · Aug 21, 2022

Hi, PVE 6.4 is EOL, see [0].

So you had better plan to upgrade to 7.x.
The current release is proxmox-ve: 7.2-1 (running kernel: 5.15.39-1-pve).

There have also been some clocksource-related changes in 5.15, see [1].

[0] https://forum.proxmox.com/threads/proxmox-ve-support-lifecycle.35755/post-175311
[1] https://forum.proxmox.com/threads/o...r-proxmox-ve-7-x-available.100936/post-463986

onepamopa · Aug 21, 2022

I know 6.4 is EOL. I haven't had time to upgrade, mostly because I'm also planning to remove the current PVE install (an HDD) and replace it with a pair of SSDs in RAID 0. My main concern is that my VM storage is LVM on an NVMe SSD, and I have no idea whether that LVM would be immediately visible/available after installing a fresh PVE 7. I also have "directory" disks that I know how to transition (by copying the relevant configs from the old install to the new one), but I have no idea about the LVM part. Any ideas?

janssensm · Aug 21, 2022

Ah, ok. The best way to guarantee you're safe is to maintain backups on another physical system, e.g. with a PBS instance. Backing up and restoring is easy and preserves all settings.
If you don't have that in place and/or have no other redundancy (e.g. mirrored disks, not RAID 0), there will always be a big question whether you can get your system up and running again after a failure (which could also be a hardware failure).
Sometimes it's also easier and quicker to install PVE from scratch and restore all VMs from backup instead of fiddling with the current setup, especially when the main storage layout or e.g. the motherboard changes.

From the admin guide:

Proxmox VE uses a rolling release model and using the latest stable version is always recommended.

So although there are major release numbers, the system can largely still be seen as rolling, so there is no need to postpone upgrading for too long.
If in doubt, you could set up a nested PVE 6.4 install with CPU type host, see whether you can reproduce your current experience, and then upgrade it to 7.x.

On the LVM part, I would assume that on a fresh install the LVM volume groups and volumes are scanned automatically and thus show up in <node> -> Disks -> LVM, as long as your NVMe drive is detected of course.
But you would probably have to add the storage again in Datacenter -> Storage.
So documenting and backing up your current config is also wise (simulating the upgrade in a nested install will help too).

I had a nested 6.4 install lying around and spun it up, and saw that the pve-kernel-5.11 package is the latest one available there, so 5.15 was only made available for 7.x.
Perhaps @t.lamprecht can tell you how to get the current 5.15 kernel installed on 6.4, and whether that is a reasonable way to test your setup on your path to 7.x.
