Host Randomly reboots

Nicholas Barraco

May 9, 2017
Greetings,

We have an issue with one of our hosts. It is a brand-new system running the latest version. The host has an Intel i3, 8 GB of memory, and two WD HDDs in a ZFS RAID1 mirror. Two VMs currently run on the machine: a pfSense box with 512 MB of memory and a Windows 7 x64 guest with 4 GB of memory allocated. We had another Linux box, but it did not do what we needed, so we installed Windows. All the issues started when we introduced Windows.

proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80

I checked /var/log/syslog and nothing really stands out, at least to me. Here is a snippet from just before the reboot.


May 9 06:24:57 HV01 pvestatd[2367]: No balloon device has been activated
May 9 06:24:57 HV01 pvestatd[2367]: No balloon device has been activated
May 9 06:25:01 HV01 CRON[5818]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
May 9 06:25:02 HV01 systemd[1]: Stopping PVE API Proxy Server...
May 9 06:25:03 HV01 pveproxy[2392]: received signal TERM
May 9 06:25:03 HV01 pveproxy[2392]: server closing
May 9 06:25:03 HV01 pveproxy[2393]: worker exit
May 9 06:25:03 HV01 pveproxy[2394]: worker exit
May 9 06:25:03 HV01 pveproxy[2395]: worker exit
May 9 06:25:03 HV01 pveproxy[2392]: worker 2395 finished
May 9 06:25:03 HV01 pveproxy[2392]: worker 2394 finished
May 9 06:25:03 HV01 pveproxy[2392]: worker 2393 finished
May 9 06:25:03 HV01 pveproxy[2392]: server stopped
May 9 06:25:04 HV01 systemd[1]: Starting PVE API Proxy Server...
May 9 06:25:05 HV01 pveproxy[5938]: starting server
May 9 06:25:05 HV01 pveproxy[5938]: starting 3 worker(s)
May 9 06:25:05 HV01 pveproxy[5938]: worker 5939 started
May 9 06:25:05 HV01 pveproxy[5938]: worker 5940 started
May 9 06:25:05 HV01 pveproxy[5938]: worker 5941 started
May 9 06:25:05 HV01 systemd[1]: Started PVE API Proxy Server.
May 9 06:25:05 HV01 systemd[1]: Stopping PVE SPICE Proxy Server...
May 9 06:25:05 HV01 spiceproxy[2402]: received signal TERM
May 9 06:25:05 HV01 spiceproxy[2402]: server closing
May 9 06:25:05 HV01 spiceproxy[2403]: worker exit

Yesterday I thought it had something to do with memory management and disabled ballooning, which kept the system stable for 16 hours or so.
 
Your running kernel, 4.4.35-1-pve, is affected by a major bug regarding memory management. Upgrade to the latest.
 
I ran apt-get update and apt-get upgrade, but it doesn't look like there is a kernel update. Do I need to add something to the repos?

Thank you for your help!
 
Updated...
root@HV01:~# uname -a
Linux HV01 4.4.59-1-pve #1 SMP PVE 4.4.59-87 (Tue, 25 Apr 2017 09:01:58 +0200) x86_64 GNU/Linux

However, we are experiencing the same issue. Any more suggestions?
 
Greetings,

We have an issue with one of our hosts. It is a brand-new system running the latest version. The host has an Intel i3, 8 GB of memory, and two WD HDDs in a ZFS RAID1 mirror. Two VMs currently run on the machine: a pfSense box with 512 MB of memory and a Windows 7 x64 guest with 4 GB of memory allocated. ....

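You have 8 GB of RAM, and by default ZFS will use up to 50% of it for its cache (the ARC). If you assign 4 GB to a Windows VM, you are in trouble: 4 GB for ZFS plus 4 GB for Windows plus 512 MB for pfSense is already more than the 8 GB in the host.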

=> Limit ZFS memory usage

See also:

https://pve.proxmox.com/wiki/ZFS_on_Linux
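
For reference, here is a rough sketch of what limiting the ARC could look like on this host. The 2 GiB cap is only an assumed example value, not a figure from this thread, so pick a limit that fits your own workload. The script just computes the byte value and prints the module option line. The commands that actually change the system are left as comments so nothing is modified by accident.

```shell
#!/bin/sh
# Assumed example cap for the ZFS ARC, in GiB (hypothetical value, adjust to taste).
ARC_MAX_GIB=2

# zfs_arc_max is given in bytes.
ARC_MAX_BYTES=$((ARC_MAX_GIB * 1024 * 1024 * 1024))

# This is the line that would go into /etc/modprobe.d/zfs.conf.
ZFS_OPTION_LINE="options zfs zfs_arc_max=${ARC_MAX_BYTES}"
echo "${ZFS_OPTION_LINE}"

# To make it persistent (as root), something along these lines:
#   echo "options zfs zfs_arc_max=2147483648" > /etc/modprobe.d/zfs.conf
#   update-initramfs -u    # needed when the root filesystem is on ZFS
#   reboot
#
# To change the limit on the running system until the next reboot:
#   echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max
```

With a 2 GiB cap, the budget would be roughly 2 GB for the ARC, 4 GB for Windows and 512 MB for pfSense, which leaves some headroom for the host itself on an 8 GB machine.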
 