Proxmox Host Freeze Up

CawlidgeHawkey

New Member
Apr 25, 2017
Hello,

I'm seeing an issue where an entire Proxmox server freezes up and becomes unresponsive.

For the last two months the server sat mostly idle with only one KVM VM on it, and it did not freeze or have any problems during that time. Earlier this week I installed several more KVM VMs on it. Today I found that the server had become totally unresponsive without warning. When I attached a keyboard and mouse, the screen was completely black and nothing responded to input. The only way to recover was a hard restart.

I have checked the logs and there are no signs of issues (I will paste log snippets below). The CPU on the server has been running at about 40-50% usage. For what it's worth, the VM that sat idle on the server for two months was Linux, and the new VMs are all Windows.

Info & Hardware:
  • Proxmox 4.4-15/7599e35a
  • Kernel: Linux 4.4.67-1-pve #1 SMP PVE 4.4.67-92 (Fri, 23 Jun 2017 08:22:06 +0200)
  • Dual Intel E5-2630 v4
  • Supermicro X10DRL-i
  • 128GB RAM
  • 4 x 2TB Samsung 850 Pro + LSI 9271-8i
  • Local LVM storage is used for VMs
Here are the settings for the Windows VMs - https://gyazo.com/28cb009d1e996180840a6da930bab5a2

Code:
root@node1:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.62-1-pve: 4.4.62-88
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80

Here is a snippet from the syslog. The system appears to have frozen after 16:02:37, and the next entry is at 18:20:59, when I started the server up again.

Code:
Aug 10 15:23:42 node1 smartd[2710]: Device: /dev/bus/0 [megaraid_disk_12] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 77 to 76
Aug 10 15:23:42 node1 smartd[2710]: Device: /dev/bus/0 [megaraid_disk_13] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 78 to 77
Aug 10 15:27:12 node1 systemd-timesyncd[2370]: interval/delta/delay/jitter/drift 2048s/+0.000s/0.023s/0.000s/-5ppm
Aug 10 15:32:35 node1 pvedaemon[7102]: <root@pam> successful auth for user 'root@pam'
Aug 10 15:47:36 node1 pvedaemon[12927]: <root@pam> successful auth for user 'root@pam'
Aug 10 16:01:20 node1 systemd-timesyncd[2370]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.023s/0.000s/-5ppm
Aug 10 16:02:37 node1 pvedaemon[7102]: <root@pam> successful auth for user 'root@pam'
Aug 10 18:20:59 node1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2594" x-info="http://www.rsyslog.com"] start
Aug 10 18:20:59 node1 systemd-modules-load[1006]: Module 'fuse' is builtin
Aug 10 18:20:59 node1 systemd-modules-load[1006]: Inserted module 'ipmi_devintf'
Aug 10 18:20:59 node1 systemd-modules-load[1006]: Inserted module 'ipmi_poweroff'
Aug 10 18:20:59 node1 systemd[1]: Started Create Static Device Nodes in /dev.
Aug 10 18:20:59 node1 systemd[1]: Starting udev Kernel Device Manager...
Aug 10 18:20:59 node1 systemd[1]: Started udev Kernel Device Manager.
Aug 10 18:20:59 node1 systemd[1]: Starting LSB: Set preliminary keymap...
Aug 10 18:20:59 node1 systemd[1]: Starting LSB: Tune IDE hard disks...
Aug 10 18:20:59 node1 systemd[1]: Started udev Coldplug all Devices.
Aug 10 18:20:59 node1 systemd[1]: Starting udev Wait for Complete Device Initialization...
Aug 10 18:20:59 node1 systemd[1]: Started LSB: Tune IDE hard disks.
Aug 10 18:20:59 node1 hdparm[1037]: Setting parameters of disc: (none).
Aug 10 18:20:59 node1 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
Aug 10 18:20:59 node1 kernel: [    0.000000] Initializing cgroup subsys cpuset
Aug 10 18:20:59 node1 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Aug 10 18:20:59 node1 kernel: [    0.000000] Initializing cgroup subsys cpu
Aug 10 18:20:59 node1 kernel: [    0.000000] Initializing cgroup subsys cpuacct
Aug 10 18:20:59 node1 kernel: [    0.000000] Linux version 4.4.67-1-pve (root@nora) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP PVE 4.4.67-92 (Fri, 23 Jun 2017 08:22:06 +0200) ()
Aug 10 18:20:59 node1 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.67-1-pve root=/dev/mapper/pve-root ro intel_idle.max_cstate=0 processor.max_cstate=1 quiet
Aug 10 18:20:59 node1 kernel: [    0.000000] KERNEL supported cpus:
Aug 10 18:20:59 node1 systemd[1]: Starting LVM2 PV scan on device 8:3...

Really lost with this one, so any help is much appreciated.
 
To me it sounds like a hardware issue. I would make sure the CPU, case, and power supply fans are working normally, then run memtest for a few hours (with the hard drives and the HBA controller disconnected). If that passes, try swapping out the power supply. If it still freezes, you could have a bad motherboard or HBA card. Do you have a spare HBA or motherboard to test with?
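
If the freeze happens again, it may help to pin down exactly when the host stopped logging, so you can compare that time against backups, cron jobs, or VM activity. Below is a minimal sketch that scans syslog-style lines for large gaps between consecutive entries. The function name `find_log_gaps`, the 30-minute threshold, and the assumed year 2017 (syslog timestamps carry no year) are my own choices for illustration, not anything Proxmox provides. The sample lines are taken from the syslog excerpt in the first post.

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, min_gap=timedelta(minutes=30), year=2017):
    """Yield (previous, current, gap) for consecutive entries further apart than min_gap.

    Assumes classic syslog timestamps such as "Aug 10 16:02:37" in the first
    15 characters of each line. The year is an assumption supplied by the caller.
    """
    previous = None
    for line in lines:
        stamp = line[:15]
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a syslog timestamp
        if previous is not None and current - previous > min_gap:
            yield previous, current, current - previous
        previous = current


# Sample lines copied from the syslog excerpt in the original post.
sample = [
    "Aug 10 16:01:20 node1 systemd-timesyncd[2370]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.023s/0.000s/-5ppm",
    "Aug 10 16:02:37 node1 pvedaemon[7102]: <root@pam> successful auth for user 'root@pam'",
    "Aug 10 18:20:59 node1 systemd-modules-load[1006]: Module 'fuse' is builtin",
]

for before, after, gap in find_log_gaps(sample):
    print(f"{before:%b %d %H:%M:%S} -> {after:%b %d %H:%M:%S} (gap {gap})")
```

To run it against the real log, you could pass an open file instead of the sample list, for example `find_log_gaps(open("/var/log/syslog"))`. Keep in mind that a gap only tells you when logging stopped. An idle host can also be quiet for a while, so the threshold needs tuning for your system.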
 
