Server 100% CPU and cannot access storage

mikiudon

New Member
Apr 13, 2022
14
1
3
Hi guys, server went 100% CPU, all VMs unable to access - though it still was active. Seems the network gave up. Pls see log below.

Code:
Jun  7 22:51:49 imm4 pvestatd[1691]: status update time (12.329 seconds)
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable
Jun  7 22:51:53 imm4 systemd[1]: session-430.scope: Succeeded.
Jun  7 22:51:53 imm4 systemd[1]: session-430.scope: Consumed 1min 34.922s CPU time.
Jun  7 22:51:54 imm4 systemd[1]: Started Session 431 of user root.
Jun  7 22:51:54 imm4 systemd[1]: session-431.scope: Succeeded.
Jun  7 22:51:55 imm4 pvestatd[1691]: storage 'vmware2' is not online
Jun  7 22:51:57 imm4 systemd[1]: Started Session 432 of user root.
Jun  7 22:51:58 imm4 pvestatd[1691]: storage 'nfstb4' is not online
Jun  7 22:51:58 imm4 systemd[1]: session-432.scope: Succeeded.
Jun  7 22:51:58 imm4 systemd[1]: session-432.scope: Consumed 1.632s CPU time.
Jun  7 22:51:58 imm4 pmxcfs[1556]: [status] notice: received log
Jun  7 22:52:00 imm4 kernel: [756723.914202] nfs: server 10.254.10.211 not responding, timed out
Jun  7 22:52:01 imm4 pvestatd[1691]: storage 'smb' is not online
Jun  7 22:52:01 imm4 pvestatd[1691]: status update time (12.013 seconds)
Jun  7 22:52:04 imm4 pvestatd[1691]: storage 'nfstb4' is not online
Jun  7 22:52:08 imm4 systemd[1]: Stopping User Manager for UID 0...
Jun  7 22:52:08 imm4 systemd[3114009]: Stopped target Main User Target.
Jun  7 22:52:08 imm4 systemd[3114009]: Stopped target Basic System.
Jun  7 22:52:08 imm4 systemd[3114009]: Stopped target Paths.
Jun  7 22:52:08 imm4 systemd[3114009]: Stopped target Sockets.
Jun  7 22:52:08 imm4 systemd[3114009]: Stopped target Timers.
Jun  7 22:52:08 imm4 systemd[3114009]: dirmngr.socket: Succeeded.
Jun  7 22:52:08 imm4 systemd[3114009]: Closed GnuPG network certificate management daemon.
Jun  7 22:52:08 imm4 systemd[3114009]: gpg-agent-browser.socket: Succeeded.
Jun  7 22:52:08 imm4 systemd[3114009]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jun  7 22:52:08 imm4 systemd[3114009]: gpg-agent-extra.socket: Succeeded.
Jun  7 22:52:08 imm4 systemd[3114009]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jun  7 22:52:08 imm4 systemd[3114009]: gpg-agent-ssh.socket: Succeeded.
Jun  7 22:52:08 imm4 systemd[3114009]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jun  7 22:52:08 imm4 systemd[3114009]: gpg-agent.socket: Succeeded.
Jun  7 22:52:08 imm4 systemd[3114009]: Closed GnuPG cryptographic agent and passphrase cache.
Jun  7 22:52:08 imm4 systemd[3114009]: Removed slice User Application Slice.
Jun  7 22:52:08 imm4 systemd[3114009]: Reached target Shutdown.
Jun  7 22:52:08 imm4 systemd[3114009]: systemd-exit.service: Succeeded.
Jun  7 22:52:08 imm4 systemd[3114009]: Finished Exit the Session.
Jun  7 22:52:08 imm4 systemd[3114009]: Reached target Exit the Session.
Jun  7 22:52:08 imm4 systemd[1]: user@0.service: Succeeded.
Jun  7 22:52:08 imm4 systemd[1]: Stopped User Manager for UID 0.
Jun  7 22:52:08 imm4 systemd[1]: Stopping User Runtime Directory /run/user/0...
Jun  7 22:52:08 imm4 systemd[1]: run-user-0.mount: Succeeded.
Jun  7 22:52:08 imm4 systemd[1]: user-runtime-dir@0.service: Succeeded.
Jun  7 22:52:08 imm4 systemd[1]: Stopped User Runtime Directory /run/user/0.
Jun  7 22:52:08 imm4 systemd[1]: Removed slice User Slice of UID 0.
Jun  7 22:52:08 imm4 systemd[1]: user-0.slice: Consumed 1min 39.806s CPU time.

After this log, all the network went down and storage complained not responsive

Code:
Jun  7 22:52:10 imm4 kernel: [756733.642238] nfs: server 10.254.10.211 not responding, timed out
Jun  7 22:52:10 imm4 pvestatd[1691]: storage 'vmware2' is not online
Jun  7 22:52:13 imm4 pvestatd[1691]: storage 'smb' is not online
Jun  7 22:52:14 imm4 pvestatd[1691]: status update time (12.529 seconds)
Jun  7 22:52:18 imm4 kernel: [756741.582281] nfs: server 10.254.10.211 not responding, timed out
Jun  7 22:52:19 imm4 pvestatd[1691]: storage 'vmware2' is not online
Jun  7 22:52:23 imm4 pvestatd[1691]: storage 'nfstb4' is not online

Then this

Code:
Jun  7 23:05:44 imm4 pvestatd[1691]: VM 163 qmp command failed - VM 163 qmp command 'query-proxmox-support' failed - unable to connect to VM 163 qmp socket - timeout after 31 retries
Jun  7 23:05:47 imm4 pvestatd[1691]: VM 171 qmp command failed - VM 171 qmp command 'query-proxmox-support' failed - got timeout
Jun  7 23:05:50 imm4 pvestatd[1691]: storage 'smb' is not online
Jun  7 23:05:53 imm4 pvestatd[1691]: got timeout
Jun  7 23:05:53 imm4 pvestatd[1691]: unable to activate storage 't16v2' - directory '/mnt/pve/t16v2' does not exist or is unreachable
Jun  7 23:05:54 imm4 kernel: [757556.941891] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:05:54 imm4 pvestatd[1691]: storage 'nfstb4' is not online
Jun  7 23:06:00 imm4 pvestatd[1691]: storage 'vmware2' is not online
Jun  7 23:06:00 imm4 pvestatd[1691]: status update time (21.502 seconds)
Jun  7 23:06:03 imm4 kernel: [757566.413892] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:06:08 imm4 kernel: [757571.533927] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:06:12 imm4 pvestatd[1691]: VM 130 qmp command failed - VM 130 qmp command 'query-proxmox-support' failed - got timeout
Jun  7 23:06:13 imm4 kernel: [757576.909969] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:06:15 imm4 pvestatd[1691]: VM 171 qmp command failed - VM 171 qmp command 'query-proxmox-support' failed - unable to connect to VM 171 qmp socket - timeout after 31 retries
Jun  7 23:06:18 imm4 pvestatd[1691]: VM 163 qmp command failed - VM 163 qmp command 'query-proxmox-support' failed - unable to connect to VM 163 qmp socket - timeout after 31 retries
Jun  7 23:06:19 imm4 kernel: [757582.030002] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:06:21 imm4 pvestatd[1691]: storage 'smb' is not online
Jun  7 23:06:23 imm4 pvestatd[1691]: got timeout
Jun  7 23:06:23 imm4 pvestatd[1691]: unable to activate storage 't16v2' - directory '/mnt/pve/t16v2' does not exist or is unreachable
Jun  7 23:06:24 imm4 kernel: [757587.153989] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:06:24 imm4 pvestatd[1691]: storage 'nfstb4' is not online
Jun  7 23:06:30 imm4 pvestatd[1691]: storage 'vmware2' is not online
Jun  7 23:06:30 imm4 pvestatd[1691]: status update time (30.726 seconds)
Jun  7 23:06:34 imm4 kernel: [757597.390033] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:06:35 imm4 pmxcfs[1556]: [status] notice: received log
Jun  7 23:06:39 imm4 kernel: [757602.510081] nfs: server 10.254.10.211 not responding, timed out
Jun  7 23:06:43 imm4 pvestatd[1691]: VM 130 qmp command failed - VM 130 qmp command 'query-proxmox-support' failed - unable to connect to VM 130 qmp socket - timeout after 31 retries

Tried to 'reboot' and it took a while because all of these weren't responding. When it came back up, it booted all the VMs and it looks fine now.

I have 2 more hosts which are on the same settings but all good during this time.

Any idea what's going which caused it to down the network?

pve-manager/7.2-3/c743d6c1
Server has 512GB RAM, 56 x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2 Sockets)

Ignore the messages nfstb4, vmware2,smb not online ... those are old storage which I am meaning to clean up. The real one is t16v2. NFS backend to Truenas.

The ones I would like to understand is

Jun 7 22:51:53 imm4 QEMU[3114093]: kvm: warning: TSC frequency mismatch between VM (2499999 kHz) and host (2596991 kHz), and TSC scaling unavailable


How come this one will then trigger a network collapse and stop all user services etc?
 
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!