Backup crashes again with the new kernel

bazzi

Hi there,

With the new kernel the backup stalls again, but this time the compute module stops responding, so the HA KVM guests are migrated to other servers. So far so good, but when the crash happens during the backup of an HA KVM guest, the HA start-up fails because the VM is still locked by the backup process.
So the automatic start on another server does not work!

This is a real problem!

Code:
INFO: starting new backup job: vzdump --quiet 1 --mailto proxmox2@xxxxxxxxx.nl --mode snapshot --compress lzo --maxfiles 2 --storage backup_day --all 1
INFO: Starting Backup of VM 103 (openvz)
INFO: CTID 103 exist unmounted down
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup_day/dump/vzdump-openvz-103-2012_06_01-03_45_01.tar.lzo'
INFO: Total bytes written: 1199616000 (1.2GiB, 7.7MiB/s)
INFO: archive file size: 484MB
INFO: delete old backup '/mnt/pve/backup_day/dump/vzdump-openvz-103-2012_05_28-03_55_49.tar.lzo'
INFO: Finished Backup of VM 103 (00:02:52)
INFO: Starting Backup of VM 104 (qemu)
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/backup_day/dump/vzdump-qemu-104-2012_06_01-03_47_53.tar.lzo'
INFO: adding '/mnt/pve/backup_day/dump/vzdump-qemu-104-2012_06_01-03_47_53.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/vmdisks/vm-104-disk-1' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 34359740928 (14.58 MiB/s)
INFO: archive file size: 7.48GB
INFO: delete old backup '/mnt/pve/backup_day/dump/vzdump-qemu-104-2012_05_28-04_16_33.tar.lzo'
INFO: Finished Backup of VM 104 (00:39:15)
INFO: Starting Backup of VM 200 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-sint-0" created
INFO: creating archive '/mnt/pve/backup_day/dump/vzdump-qemu-200-2012_06_01-04_27_08.tar.lzo'
INFO: adding '/mnt/pve/backup_day/dump/vzdump-qemu-200-2012_06_01-04_27_08.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/vmdisks/vzsnap-sint-0' to archive ('vm-disk-virtio0.raw')
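
To see at a glance which guest the job was working on when the host hung, the log can be scanned for a "Starting Backup" line that has no matching "Finished Backup" line. This is only a rough sketch that assumes the log format shown above; the function name `unfinished_backups` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of guests whose backup started
# but never logged a matching "Finished Backup" line.
# Reads a vzdump log (format as in the excerpt above) from stdin.
unfinished_backups() {
    awk '
        /Starting Backup of VM [0-9]+/ {
            # the guest ID is the field right after the word "VM"
            for (i = 1; i < NF; i++) if ($i == "VM") started[$(i + 1)] = 1
        }
        /Finished Backup of VM [0-9]+/ {
            for (i = 1; i < NF; i++) if ($i == "VM") delete started[$(i + 1)]
        }
        END { for (id in started) print id }
    ' | sort -n
}
```

Fed the log above on stdin, this should print only the ID of the guest whose backup was cut off.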

Then the host becomes unresponsive, and then we get:
Code:
task started by HA resource agent
TASK ERROR: VM is locked (backup)
multiple times on all our hosts.

We have to run this manually:
Code:
qm unlock 200
and then it starts, but that makes HA useless.
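
The manual step could at least be guarded, so that a lock is only cleared when it really is a backup lock and no vzdump process is still running on the node. The following is a sketch, not a tested fix. It assumes that the guest config contains a line `lock: backup` while the lock is held and that the config lives at /etc/pve/qemu-server/<vmid>.conf; neither is shown in this thread. The function names are made up.

```shell
#!/bin/sh
# Print the value of the "lock:" key from a guest config read on stdin.
# Prints nothing if the config has no lock line (assumed config format).
lock_of() {
    sed -n 's/^lock:[[:space:]]*//p' | head -n 1
}

# Succeed only if clearing the lock looks safe: it must be a "backup"
# lock and no vzdump process may be running on this node.
# $1 = lock value, $2 = number of running vzdump processes
stale_backup_lock() {
    [ "$1" = "backup" ] && [ "$2" -eq 0 ]
}

# Possible use on the node that now owns the guest (paths are assumptions):
#   vmid=200
#   lock=$(lock_of < "/etc/pve/qemu-server/$vmid.conf")
#   running=$(pgrep -c vzdump)
#   stale_backup_lock "$lock" "$running" && qm unlock "$vmid"
```

This only looks at the local node, so it cannot tell whether a backup is still alive on another host.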

Also, is it possible to downgrade the kernel to the "old" one from 2.1, which was really stable?

It crashes on multiple IMS compute modules used as hosts. In short, they are:
Code:
RAM usage: Total: 23.48GB, Used: 6.20GB
CPUs: 16 x Intel(R) Xeon(R) CPU L5630 @ 2.13GHz
PVE Manager version: pve-manager/2.1-1/f9b0f63a
Kernel version: Linux 2.6.32-12-pve #1 SMP Tue May 15 06:02:20 CEST 2012
It always crashes on an HA KVM with storage in the LVM pool.


The only thing in /var/log/messages is:
sint kernel: igb_rx:HBO bit set..
and that comes from the gigabit driver.

root@sint:~# pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-12-pve
proxmox-ve-2.6.32: 2.1-68
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-16
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

We have the problem on multiple hosts...
 
Also, is it possible to downgrade the kernel to the "old" one from 2.1, which was really stable?
...

Yes, you can just boot the old kernel (as far as I can see, you still have pve-kernel-2.6.32-7-pve installed, so the bootloader will still offer this one).
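
Which kernel packages are still installed can be read off the `pveversion -v` output posted earlier in the thread. A small sketch, assuming that output format; `installed_kernels` is an illustrative name, not an existing tool.

```shell
#!/bin/sh
# Print the pve-kernel package names found in `pveversion -v` output
# supplied on stdin, one per line.
installed_kernels() {
    sed -n 's/^\(pve-kernel-[^:]*\):.*/\1/p'
}

# Possible use:
#   pveversion -v | installed_kernels
# After rebooting, `uname -r` shows which kernel is actually running.
```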
 
I just installed 2.6.32-11 and rebooted. The backup is still running; I will post the result.
The message
sint kernel: igb_rx:HBO bit set..
is gone from /var/log/messages.
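
To compare kernels, the number of these driver messages in the log can be counted before and after the reboot. A minimal sketch; the function name is made up, and the search pattern is taken from the message quoted above.

```shell
#!/bin/sh
# Count the lines on stdin that contain the igb "HBO bit set" message.
# grep -c exits non-zero when the count is 0, so the status is masked.
count_hbo() {
    grep -c 'igb_rx:HBO bit set' || true
}

# Possible use:
#   count_hbo < /var/log/messages
```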

But the lock on an HA KVM is a real problem, because it makes HA useless if something goes wrong during a backup...
 
OK, the backups finish without a problem. So the new kernel crashes KVM backups on an IMS.

Also, the HA feature is broken if something goes wrong during a backup. So there are two major problems, but the workaround is to use the old 2.6.32-11 kernel!
 
OK, multiple backups later I can confirm it runs smoothly again. So there really is a problem with the 2.6.32-12 kernel on IMS.
 
Of course!

----------------------------------------------------------------
Firmware
Current Build Version: 10.4.100.20110602.29753

Firmware Inventory

Component                 Subsystem     Status          Current Version
Server 1                  BMC Firmware  ok              1.26.3
                          BMC Boot      ok              0.28
                          BIOS          ok              S5500.86B.01.20.0055.050420111308
Server 2                  BMC Firmware  ok              1.26.3
                          BMC Boot      ok              0.28
                          BIOS          ok              S5500.86B.01.20.0055.050420111308
Server 3                  BMC Firmware  ok              1.26.3
                          BMC Boot      ok              0.28
                          BIOS          ok              S5500.86B.01.20.0055.050420111308
Server 4                  BMC Firmware  not present     --
                          BMC Boot      not present     --
                          BIOS          not present     --
Server 5                  BMC Firmware  not present     --
                          BMC Boot      not present     --
                          BIOS          not present     --
Server 6                  BMC Firmware  not present     --
                          BMC Boot      not present     --
                          BIOS          not present     --
Switch 1                  Firmware      ok              1.0.0.27
                          Boot          ok              1.0.0.6
Switch 2                  Firmware      not present     --
                          Boot          not present     --
Storage Control Module 1  Firmware      ok              3.08.0140.08
Storage Control Module 2  Firmware      not present     --
System Fan 1              Firmware      ok              1.2
                          Boot          ok              1.2
System Fan 2              Firmware      ok              1.2
                          Boot          ok              1.2
I/O Fan                   Firmware      ok              1.2
                          Boot          ok              1.2
Power Supply 1            Firmware      not applicable  --
                          Boot          not applicable  --
Power Supply 2            Firmware      not applicable  --
                          Boot          not applicable  --
Power Supply 3            Firmware      not applicable  --
                          Boot          not applicable  --
Power Supply Blank 4      Firmware      ok              1.2
                          Boot          ok              1.2
 
As you can see, it is an MFSYS25V2 with 3 compute modules (MFS5520VI, each with 2x L5630 and 24GB RAM).
There are 7x 900GB disks (ST9900805SS), and those provide a shared LVM disk, just as described on your wiki.
 
The bug report hints that this is an issue with a fiber/SerDes device, but I don't have a mezzanine card in the modules:
root@sint:~# lspci |grep Ether
01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
01:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)