The server crashes and reboots while backing up one specific VM (102), almost always on a Saturday or Sunday, though it has also happened on other days. Almost nothing else is running during the Proxmox backup window.
This is an HP ProLiant ML350p running the latest firmware as of 2015-09-21. HP tells me there is no newer firmware than that. They don't see anything in the AHS logs. They have replaced the motherboard and the SSDs, and the problem remains.
After HP replaced the motherboard, I ran the Insight Diagnostics for 4 straight days without error.
This problem has existed for about a year now. I usually just disable the Proxmox backup and use other tools (zfs, tar, backuppc) instead. I had the same problem on 3.4.
I just enabled kexec-tools and crash dumps, so I should have more information after the next crash.
The root filesystem is ZFS, set up by the installer. ZFS swap is disabled; swap is on RAID 1 on SSDs.
The server crashed on 2016-03-06 at 4:04 a.m. Here is the pastebin of the syslog: http://pastebin.com/JeMauzhc
What other information do I need to provide?
root@cb-prox1:~# pveversion -v
proxmox-ve: 4.1-37 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-13 (running version: 4.1-13/cfb599fb)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-37
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-32
qemu-server: 4.0-55
pve-firmware: 1.1-7
libpve-common-perl: 4.0-48
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-40
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-5
pve-container: 1.0-44
pve-firewall: 2.0-17
pve-ha-manager: 1.0-21
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 0.13-pve3
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie