Host crash during backups to NFS

rwadi

New Member
Aug 5, 2013
Long-time reader, first-time poster. One of our Proxmox VE hosts randomly kernel panics during the nightly backups to an NFS server, and I have no idea what is causing it.

Here is the information I can give you:

1) I have two identical hosts (running standalone). One host is fine; the other randomly kernel panics during backups, usually about 10 minutes in. It seems stable apart from that.
2) It does not happen every time; it seems to be once or twice a month (sometimes more often)
3) Host is running ProxmoxVE 2.3 (see pveversion output below)
4) Host is running Linux software raid1 for /boot and raid5 for pve lvm
5) I have had the host backing up to a Debian NFS server and to a Synology RackStation, with the same results.
6) Host is running and backing up OpenVZ containers and KVM virtual machines
7) I don't see any useful information in any of the logs after the machine automatically reboots

As mentioned this issue seems to only happen on one of these identical hosts, and it does not happen on any of our other 7 hosts which are all running ProxmoxVE.

Any ideas what could be causing this? Let me know if there is any additional information that would be helpful.

root@host1:~# pveversion --verbose
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-96
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1
 
Hi Dietmar,

These hosts were extensively memtested before being deployed to production. However, I have moved all VMs off the host in question and am running memtest on it again. Will let you know how that goes.

Thanks for your reply.
 
Hi Dietmar,

Thanks for the reply. I have taken the host out of production use and memtest is running. So far all is good, no errors.

Any other thoughts?
 
Are you using snapshots for live backup? You could try lowering the limit for vzdump to see if that helps. Is there any log info pointing to RAID issues?
 
Yes, I am using live snapshots. What exactly do you mean by dropping the limit for vzdump? Do you mean lowering the max backups for the storage point? This is currently set to 14 and the backups run daily.

No issues with the RAID that I have seen...

RAM test has been running for 6 days now and no issues, guessing it is not that...
 
I was just thinking you could slow down the backups in vzdump.conf to see if it is an I/O issue. Is it always the same VM that is being backed up when the host panics?
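To make that concrete, here is a rough sketch of one way to throttle vzdump, assuming the bandwidth limit is set through the `bwlimit` option (in KB/s) in /etc/vzdump.conf. The helper name `set_bwlimit` and the value 20000 are only illustrative, not something from this thread; the end result is a single line reading `bwlimit: 20000` in the config file.

```shell
#!/bin/sh
# set_bwlimit FILE KBPS
# Hypothetical helper: set or replace the "bwlimit:" line in a
# vzdump.conf-style file, leaving all other lines as they are.
set_bwlimit() {
    conf="$1"
    limit="$2"
    tmp="${conf}.tmp.$$"
    {
        # Copy every existing line except an old bwlimit setting.
        if [ -f "$conf" ]; then
            grep -v '^bwlimit:' "$conf" || true
        fi
        # Append the new limit (vzdump reads this value as KB/s).
        printf 'bwlimit: %s\n' "$limit"
    } > "$tmp"
    mv "$tmp" "$conf"
}

# Example (as root on the host), value is an arbitrary starting point:
#   set_bwlimit /etc/vzdump.conf 20000
```

If the panics stop with a low limit and come back when it is raised, that would point toward an I/O load problem rather than one particular guest.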
 
Yes, I believe the machine always panics when backing up the same CT, although it only does this once a month or so.

I just brought the host up again and have upgraded to PVE 3; maybe that will help.

One thing I did notice in the config of the CT in question is that it is the only CT/VM using a different bridge for its network. Management and all other CTs/VMs use vmbr0, which is bridged to eth0 and has the system's IP etc., while this one CT uses vmbr1, which is bridged to eth1. eth0 is the machine's built-in NIC (rtl), while eth1 is a PCI-e add-on card, an Intel PRO/1000.
 
I can't imagine the different eth interface would cause issues unless the card has a hardware problem. Intel PRO cards are pretty solid in my experience. Maybe reseat the card and verify speed/duplex with ethtool or the like.
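For the speed/duplex check, something like the following sketch would do. It assumes the usual `ethtool <iface>` output, with `Speed:`, `Duplex:` and `Link detected:` lines; the function name `link_summary` is just made up for this example.

```shell
#!/bin/sh
# link_summary: read `ethtool <iface>` output on stdin and print only
# the negotiated speed, duplex mode and link state as key=value lines.
link_summary() {
    awk -F': ' '/Speed:|Duplex:|Link detected:/ {
        gsub(/^[ \t]+/, "", $1)   # strip the leading indentation
        printf "%s=%s\n", $1, $2
    }'
}

# Example, for the add-on card mentioned above:
#   ethtool eth1 | link_summary
```

On a healthy gigabit link you would expect full duplex at the card's rated speed; half duplex or a lower speed than expected would be a hint to look at the cable, switch port, or the card itself.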
You will like 3 better anyway :)
 
