Hello,
I'm seeing an issue where an entire Proxmox server freezes up and becomes unresponsive.
For the last two months the server sat mostly idle and only had one KVM VM on it. The server did not freeze or have any problems during that time. Earlier this week I installed some more KVM VMs on it. Today I found that the server had become totally unresponsive with no warning. When I attached a keyboard and monitor to the server, the screen was totally black and did not respond to anything. The only way to fix the problem was to do a hard restart.
I have checked the logs and there are no signs of issues (I will paste log snippets below). The CPU on the server has been running at about 40-50% usage. For what it's worth, the VM that sat idle on the server for two months was Linux and the new VMs are all Windows.
Info & Hardware:
- Proxmox 4.4-15/7599e35a
- Kernel: Linux 4.4.67-1-pve #1 SMP PVE 4.4.67-92 (Fri, 23 Jun 2017 08:22:06 +0200)
- Dual Intel E5-2630 v4
- Supermicro X10DRL-i
- 128GB RAM
- 4 x 2TB Samsung 850 Pro + LSI 9271-8i
- Local LVM storage is used for VMs
Code:
root@node1:~# pveversion -v
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.62-1-pve: 4.4.62-88
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
Here is a snippet from the syslog. The system looks to have frozen after 16:02:37, and the next entry is at 18:20:59, when I started the server up again.
Code:
Aug 10 15:23:42 node1 smartd[2710]: Device: /dev/bus/0 [megaraid_disk_12] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 77 to 76
Aug 10 15:23:42 node1 smartd[2710]: Device: /dev/bus/0 [megaraid_disk_13] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 78 to 77
Aug 10 15:27:12 node1 systemd-timesyncd[2370]: interval/delta/delay/jitter/drift 2048s/+0.000s/0.023s/0.000s/-5ppm
Aug 10 15:32:35 node1 pvedaemon[7102]: <root@pam> successful auth for user 'root@pam'
Aug 10 15:47:36 node1 pvedaemon[12927]: <root@pam> successful auth for user 'root@pam'
Aug 10 16:01:20 node1 systemd-timesyncd[2370]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.023s/0.000s/-5ppm
Aug 10 16:02:37 node1 pvedaemon[7102]: <root@pam> successful auth for user 'root@pam'
Aug 10 18:20:59 node1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2594" x-info="http://www.rsyslog.com"] start
Aug 10 18:20:59 node1 systemd-modules-load[1006]: Module 'fuse' is builtin
Aug 10 18:20:59 node1 systemd-modules-load[1006]: Inserted module 'ipmi_devintf'
Aug 10 18:20:59 node1 systemd-modules-load[1006]: Inserted module 'ipmi_poweroff'
Aug 10 18:20:59 node1 systemd[1]: Started Create Static Device Nodes in /dev.
Aug 10 18:20:59 node1 systemd[1]: Starting udev Kernel Device Manager...
Aug 10 18:20:59 node1 systemd[1]: Started udev Kernel Device Manager.
Aug 10 18:20:59 node1 systemd[1]: Starting LSB: Set preliminary keymap...
Aug 10 18:20:59 node1 systemd[1]: Starting LSB: Tune IDE hard disks...
Aug 10 18:20:59 node1 systemd[1]: Started udev Coldplug all Devices.
Aug 10 18:20:59 node1 systemd[1]: Starting udev Wait for Complete Device Initialization...
Aug 10 18:20:59 node1 systemd[1]: Started LSB: Tune IDE hard disks.
Aug 10 18:20:59 node1 hdparm[1037]: Setting parameters of disc: (none).
Aug 10 18:20:59 node1 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
Aug 10 18:20:59 node1 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Aug 10 18:20:59 node1 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Aug 10 18:20:59 node1 kernel: [ 0.000000] Initializing cgroup subsys cpu
Aug 10 18:20:59 node1 kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Aug 10 18:20:59 node1 kernel: [ 0.000000] Linux version 4.4.67-1-pve (root@nora) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP PVE 4.4.67-92 (Fri, 23 Jun 2017 08:22:06 +0200) ()
Aug 10 18:20:59 node1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.67-1-pve root=/dev/mapper/pve-root ro intel_idle.max_cstate=0 processor.max_cstate=1 quiet
Aug 10 18:20:59 node1 kernel: [ 0.000000] KERNEL supported cpus:
Aug 10 18:20:59 node1 systemd[1]: Starting LVM2 PV scan on device 8:3...
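In case it helps with pinning down the freeze window, here is a minimal Python sketch that finds the largest gap between consecutive syslog timestamps. The function name `largest_gap` is made up for illustration. It assumes the classic syslog prefix (`Aug 10 16:02:37`), an English locale for month names, and that the log is from 2017 (syslog lines carry no year, so that is a guess based on the kernel build date).

```python
from datetime import datetime, timedelta

def largest_gap(lines, year=2017):
    """Return (gap, last_before, first_after) for the biggest jump
    between consecutive syslog timestamps.

    year is an assumption: classic syslog timestamps do not include one.
    """
    stamps = []
    for line in lines:
        # The first 15 characters look like "Aug 10 16:02:37".
        try:
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp
        stamps.append(ts)

    best = (timedelta(0), None, None)
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > best[0]:
            best = (cur - prev, prev, cur)
    return best

if __name__ == "__main__":
    sample = [
        "Aug 10 16:01:20 node1 systemd-timesyncd[2370]: interval/delta/delay/jitter/drift ...",
        "Aug 10 16:02:37 node1 pvedaemon[7102]: <root@pam> successful auth for user 'root@pam'",
        "Aug 10 18:20:59 node1 rsyslogd: start",
    ]
    gap, before, after = largest_gap(sample)
    print(f"largest gap: {gap} (from {before} to {after})")
```

On the three sample lines taken from the snippet above, this reports a gap of 2:18:22 between 16:02:37 and 18:20:59. On a real system it could be fed the whole file, for example `largest_gap(open("/var/log/syslog"))`, although the gap only shows when logging stopped, not why.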
Really lost with this one, so any help is much appreciated.