Failed OpenVZ Backups and LVM lockups since upgrade to Proxmox 2.3

HostVPS
Feb 6, 2013
Firstly, many thanks for Proxmox 2.3! It's an amazing piece of software, and we are enjoying the new KVM live backup and memory ballooning features.

The problem is that we cannot use LVM snapshot backups on OpenVZ containers since the upgrade: when vzdump creates the LVM snapshot, the LVM subsystem locks up. Once this has happened you cannot run any LVM commands or stop the backup task, and we are forced to reboot the node to recover from the lockup and remove the stuck snapshot.
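For reference, this is roughly the vzdump invocation that triggers it (the container ID and storage name here are just examples, not our real ones):

Code:
# snapshot-mode backup of container 101 to an NFS storage (ID and storage name are examples)
vzdump 101 --mode snapshot --storage backup-nfs --compress lzo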

This happens roughly half the time, across three different nodes that had always been completely stable until now; the issues only started after the upgrade to Proxmox 2.3. We have 16GB free in the VG for the snapshots, and OpenVZ LVM snapshot backups still succeed about half the time, but you're looking at a node reboot whenever one fails.
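Before each run we check that the VG really does have headroom for the snapshot; a rough sketch of what we look at (if I understand the option right, vzdump's --size sets the snapshot size in MB, so it can be raised if the default is too tight):

Code:
# show free space in the pve volume group (ours reports ~16GB free)
vgs pve
# ask vzdump for a larger snapshot if the default is too small; --size is in MB
vzdump 101 --mode snapshot --size 4096 --storage backup-nfs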

We are using LSI 9260 RAID cards in RAID10 with WD RE4 drives on these nodes, backing up to our NFS backup servers on a private LAN. Unfortunately I didn't record the LSI driver version before the update to 2.3; has the LSI driver version changed since 2.2? Is there anything else that could be causing this issue?

Regards,

Bob
 
To answer my own question about the LSI RAID card driver version being responsible for the LVM lockups during LVM snapshot backups of OpenVZ containers: the version is the same in PVE 2.2 and PVE 2.3.

PVE-2.3: modinfo /lib/modules/2.6.32-18-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
Code:
filename:       /lib/modules/2.6.32-18-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        00.00.06.14-rh1
license:        GPL
srcversion:     B5718A893E029F0BEDF21A6
alias:          pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000071sv*sd*bc*sc*i*
alias:          pci:v00001000d00000073sv*sd*bc*sc*i*
alias:          pci:v00001000d00000079sv*sd*bc*sc*i*
alias:          pci:v00001000d00000078sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
vermagic:       2.6.32-18-pve SMP mod_unload modversions
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)

PVE-2.2: modinfo /lib/modules/2.6.32-17-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
Code:
filename:       /lib/modules/2.6.32-17-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux@lsi.com
version:        00.00.06.14-rh1
license:        GPL
srcversion:     B5718A893E029F0BEDF21A6
alias:          pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000071sv*sd*bc*sc*i*
alias:          pci:v00001000d00000073sv*sd*bc*sc*i*
alias:          pci:v00001000d00000079sv*sd*bc*sc*i*
alias:          pci:v00001000d00000078sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
vermagic:       2.6.32-17-pve SMP mod_unload modversions
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)

Anybody got any ideas on what else could be causing the LVM lockup issue during OpenVZ snapshot backups since the upgrade to PVE-2.3?

Regards,

Bob
 
Is there anything the Proxmox team can think of that changed between PVE 2.2 and PVE 2.3 that could account for this major issue with LVM snapshots? The LSI driver version is the same, and these nodes had always been stable before this upgrade. I also note that the driver version is unchanged in the next update of the Red Hat kernel in RHEL 6.4.
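In case it helps narrow down what changed, this is how we compare package versions between an upgraded node and one still on 2.2 (the package names below are what I'd expect on our nodes; adjust as needed):

Code:
# list versions of all Proxmox-related packages on this node
pveversion -v
# compare the LVM userspace and kernel packages against a 2.2 node
dpkg -l lvm2 pve-kernel-2.6.32-18-pve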

Here are a few examples of what happens when you now try to run an LVM snapshot backup.

Code:
root@node2:/var/lib/vz/dump# vgdisplay
^C  CTRL-c detected: giving up waiting for lock
  /run/lock/lvm/V_pve: flock failed: Interrupted system call
  Can't get lock for pve
root@node2:/var/lib/vz/dump# lvdisplay
^C  CTRL-c detected: giving up waiting for lock
  /run/lock/lvm/V_pve: flock failed: Interrupted system call
  Can't get lock for pve
  Skipping volume group pve
root@node2:/var/lib/vz/dump# lvs
^C  CTRL-c detected: giving up waiting for lock
  /run/lock/lvm/V_pve: flock failed: Interrupted system call
  Can't get lock for pve
  Skipping volume group pve
root@node2:/var/lib/vz/dump#
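When it locks up like this, about the only diagnostics we can still gather are the tasks stuck in uninterruptible sleep; a rough sketch of what we run (needs sysrq enabled, and the stack traces land in the kernel log):

Code:
# list processes stuck in D state (uninterruptible I/O wait)
ps axo pid,stat,wchan:30,cmd | awk '$2 ~ /^D/'
# ask the kernel to dump blocked-task stack traces into the log
echo w > /proc/sysrq-trigger
dmesg | tail -n 60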


Server load goes from almost nothing to 300-plus as LVM locks up; we lose I/O and are forced to reboot the node (the cleanup we do after the reboot is sketched below the top output).
Code:
top - 17:14:23 up 1 day,  7:22,  1 user,  load average: 351.86, 200.25, 85.26
Tasks: 827 total,   1 running, 825 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16420308k total, 12454872k used,  3965436k free,   242200k buffers
Swap: 16777208k total,    78672k used, 16698536k free,  9708172k cached


    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                           
 787220 www-data  20   0  285m  47m 4328 S    1  0.3   0:00.64 apache2                                            
 787687 root      20   0 19608 1992 1020 R    1  0.0   0:00.14 top                                                
 787728 www-data  20   0  285m  46m 4128 S    1  0.3   0:00.03 apache2                                            
      1 root      20   0  8360  680  600 S    0  0.0   0:01.50 init                                               
      2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd                                           
      3 root      RT   0     0    0    0 S    0  0.0   0:02.37 migration/0                                        
      4 root      20   0     0    0    0 S    0  0.0   0:10.80 ksoftirqd/0                                        
      5 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0                                        
      6 root      RT   0     0    0    0 S    0  0.0   0:00.10 watchdog/0                                         
      7 root      RT   0     0    0    0 S    0  0.0   0:00.98 migration/1                                        
      8 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1                                        
      9 root      20   0     0    0    0 S    0  0.0   0:05.95 ksoftirqd/1                                        
     10 root      RT   0     0    0    0 S    0  0.0   0:00.07 watchdog/1                                         
     11 root      RT   0     0    0    0 S    0  0.0   0:00.56 migration/2                                        
     12 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/2                                        
     13 root      20   0     0    0    0 S    0  0.0   0:09.20 ksoftirqd/2                                        
     14 root      RT   0     0    0    0 S    0  0.0   0:00.07 watchdog/2                                         
     15 root      RT   0     0    0    0 S    0  0.0   0:00.22 migration/3                                        
     16 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/3                                        
     17 root      20   0     0    0    0 S    0  0.0   0:07.84 ksoftirqd/3                                        
     18 root      RT   0     0    0    0 S    0  0.0   0:00.07 watchdog/3                                         
     19 root      RT   0     0    0    0 S    0  0.0   0:00.80 migration/4                                        
     20 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/4
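And after the forced reboot we have to clear out the snapshot vzdump left behind before the next attempt; if I recall correctly vzdump names its snapshots vzsnap-<hostname>-0 by default, so on node2 the cleanup looks roughly like this:

Code:
# find the leftover snapshot volume
lvs pve
# remove the stale snapshot so the next backup can create a fresh one (name is an example)
lvremove -f /dev/pve/vzsnap-node2-0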

Any and all help would be most appreciated.

Best regards,

Bob

 
