Backup of VM failed: exit code 5

May 5, 2010
Hello,

I tried to make a backup of a virtual machine and got this:

INFO: starting new backup job: vzdump 19217 --remove 0 --mode snapshot --storage vmbackup.fortress --node ironworks
INFO: Starting Backup of VM 19217 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: vm-19217-disk-1 must be active exclusively to create snapshot
ERROR: Backup of VM 19217 failed - command 'lvcreate --size 1024M --snapshot --name 'vzsnap-ironworks-0' '/dev/five/vm-19217-disk-1'' failed: exit code 5
INFO: Backup job finished with errors
TASK ERROR: job errors

I did some searching and it looks like this is related to LVM locking? I do run clustered LVM for DRBD.

Any ideas? I would really like to back up my VMs :)

pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-71
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1


Thanks!

--Will
 
Proxmox has its own locking so cLVM is not necessary for LVM in Proxmox.
I have over a dozen Proxmox nodes running DRBD without cLVM, a few of them have been in operation since fall of 2009.

The issue you have is that you cannot create snapshots on cLVM. We ran into this problem when 2.0 was still in beta, and the Proxmox devs removed cLVM from the default configuration.

Three different solutions to your problem:
1. Get rid of cLVM since it is not needed
2. Start using pvetest repo and try using the new backup features (no longer requires LVM snapshots for live backup)
3. Wait until pvetest is moved to stable and use the new backup feature
 
Thank you for your quick reply!


Well that explains why cLVM is so "weird" and not on by default in Proxmox... duh. I had a lot of problems with DRBD and locking with LVM, so lots of googling led me to that setting.

1. I have no problem getting rid of cLVM
2/3. These are production servers so unfortunately I have to hold off and wait for the (cool sounding) new backup features.

How do I use the built in LVM locking in Proxmox?! I had no idea it existed!

--Will

 
Great! So right now I have "locking_type = 3" set in lvm.conf, which I believe is LVM's built-in clustered locking? What should I change that to? These are the options in lvm.conf:

Type of locking to use. Defaults to local file-based locking (1). Turn locking off by setting to 0 (dangerous: risks metadata corruption if LVM2 commands get run concurrently).
Type 2 uses the external shared library locking_library.
Type 3 uses built-in clustered locking.
Type 4 uses read-only locking which forbids any operations that might change metadata.

--Will


That is used automatically.
 
Please use 'locking_type = 1'. But keep in mind that only the PVE commands (pvesm, qm) lock correctly - you get only local locking when you run LVM commands directly.
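One low-risk way to make that edit is to prepare the change on a copy of lvm.conf first and review it before installing it. This is just a sketch; the printf fallback creates a stand-in file so the snippet also works on a test box without LVM installed:

```shell
# Work on a copy of lvm.conf; if none exists (e.g. on a test box),
# create a stand-in with the clustered setting from this thread.
cp /etc/lvm/lvm.conf /tmp/lvm.conf.new 2>/dev/null || \
    printf 'global {\n    locking_type = 3\n}\n' > /tmp/lvm.conf.new

# Switch whatever locking_type is currently set to over to 1
# (local file-based locking).
sed -i 's/^\([[:space:]]*locking_type[[:space:]]*=\).*/\1 1/' /tmp/lvm.conf.new

# Review the result, then install it for real:
grep 'locking_type' /tmp/lvm.conf.new
# cp /tmp/lvm.conf.new /etc/lvm/lvm.conf
```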
 

Cool, so can I make this change with everything "online"? I assume the locking only comes into play when something has to deal directly with LVM. So if the Proxmox tools (pvesm looks cool, didn't know that was there!) are the only ones that lock correctly, should I not use the LVM tools? Like pvscan, pvremove, lvscan, lvremove etc?

So I'm going to try this sequence:

1. /etc/init.d/clvm stop
2. vim /etc/lvm/lvm.conf (edit locking type to 1)
3. test backups

All should be good?
 
should I not use the LVM tools? Like pvscan, pvremove, lvscan, lvremove etc?

You can safely use LVM tools which read data (lvs, vgs), but only use PVE tools to modify LVM data (alloc/delete volumes).

1. /etc/init.d/clvm stop
2. vim /etc/lvm/lvm.conf (edit locking type to 1)
3. test backups

I guess that should work.
 

Last night I tried the above sequence:

# /etc/init.d/clvm stop
# vim /etc/lvm/lvm.conf (changed locking type)
# tried backing up a VM

Unfortunately I lost the error message (damn Windows updates rebooting computers) but it said something like "skipping clustered volumes, can't activate logical volume". I forgot that I created the VG with the -c flag for clustered LVM. It seems I need to convert it back to a standalone VG? I can use vgchange -c n; I ran a test on another server and it seems to keep all the LVs. The only thing I need to do is disable all locking first (locking_type = 0), which is kind of scary. Sorry if these questions are getting out of the scope of this forum, just let me know.

So it seems like I need to:
1. stop clvmd
2. disable locking
3. convert the VG to standalone
4. re-enable locking with type = 1
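Those four steps, collected into a small script for reference. This is a sketch only: the VG name "five" comes from the lvcreate error earlier in the thread, and the sed patterns assume the default "locking_type = N" spacing in lvm.conf, so both would need adjusting:

```shell
# Write the four-step conversion out as a script so it can be
# reviewed and syntax-checked before a maintenance window.
cat > /tmp/convert-vg.sh <<'EOF'
#!/bin/sh
set -e
/etc/init.d/clvm stop                                              # 1. stop clvmd
sed -i 's/locking_type = 3/locking_type = 0/' /etc/lvm/lvm.conf    # 2. disable locking
vgchange -c n five                                                 # 3. clustered -> standalone VG
sed -i 's/locking_type = 0/locking_type = 1/' /etc/lvm/lvm.conf    # 4. back to local file locking
EOF

# Syntax-check only; do not run it outside a maintenance window.
sh -n /tmp/convert-vg.sh && echo "script parses OK"
```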

That seems like my only option. Just curious, how does the proxmox LVM locking work? Sounds interesting :)

--Will
 
I think the safest approach I have right now is to migrate all VMs to one node, then convert the clustered LVM back to standalone LVM, then move the VMs back. Problem is, I can't migrate VMs now :( When I try to migrate a VM from either node it uses the wrong IP address.

When I run pvecm status it shows the node's IP address as it should be:
pvecm status
Version: 6.2.0
Config Version: 3
Cluster Name: do-prod
Cluster Id: 12346
Cluster Member: Yes
Cluster Generation: 180
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 9
Flags:
Ports Bound: 0 11 177 178
Node name: ironworks
Node ID: 1
Multicast addresses: 239.192.48.106
Node addresses: 10.1.69.24


But when I try to migrate it uses an IP address that I have only configured for IPMI/fencing in the cluster.conf. Strangely that is the IP address that is in .members:
cat /etc/pve/.members
{
"nodename": "ironworks",
"version": 7,
"cluster": { "name": "do-prod", "version": 3, "nodes": 2, "quorate": 1 },
"nodelist": {
"ironworks": { "id": 1, "online": 1, "ip": "10.1.69.25"},
"forge": { "id": 2, "online": 1, "ip": "10.1.69.26"}
}
}

I'm not sure how this happened (I didn't experience this in my testing) but I'm sure it's user error. Hopefully it's a quick fix as well; please let me know if I should start a new thread for this.
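One thing worth checking (an assumption on my part: PVE fills in the .members addresses from ordinary hostname resolution, typically via /etc/hosts) is what each node name actually resolves to:

```shell
# Print what each cluster node name currently resolves to; if this
# shows the IPMI/fencing address instead of the cluster address,
# /etc/hosts is the likely culprit. Hostnames are from this thread.
for n in ironworks forge; do
    printf '%s -> %s\n' "$n" "$(getent hosts "$n" | awk '{print $1; exit}')"
done
```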

--Will
 
I've never changed clustered to non-clustered, but with DRBD I think you might want to do it something like this:
Move all VMs to one node.
Turn off node with no VMs
Turn off all VMs
Remove clustered flag
Reboot physical node
Start up VMs, or let them auto start
Start up other node
Migrate VMs that should be on other node there

The reason I suggest this process is that I suspect removing the clustered flag on one node might not get recognized (or might cause some problem) on the other node.

Also, make sure you have good backups and a maintenance window so it is not a disaster if something goes wrong.
 

I thought changing clustered LVM to standalone LVM sounded risky. I have been considering something similar to what you have suggested. The problem is that DRBD is in split brain :( Also, without the ability to migrate VMs or back them up, I am somewhat stuck. Luckily most of the LVs exist on both nodes; only a couple were made after DRBD split. But of course the LVs are out of sync between the nodes. I might just have to dd the LVs to a file, copy them over to the other node, dd them back, and then bring up the VM on that node. However, the biggest problem I'm facing right now is that I can't migrate VMs because the entries in /etc/hosts and /etc/pve/.members are incorrect. They seem easy enough to change, but is there any danger in changing the IP addresses?
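The dd-to-a-file idea, sketched out. The LV path comes from the error log at the top of the thread; the copy and restore steps are left as comments since they must run on the right nodes with the VM stopped. Treat this as an outline under those assumptions, not a tested recovery procedure:

```shell
# 1. On the source node, image the LV to a compressed file
#    (gzip shrinks the unused space; dd's stderr is silenced so the
#    pipeline still produces a file on a box without this LV path).
dd if=/dev/five/vm-19217-disk-1 bs=1M 2>/dev/null | gzip > /tmp/vm-19217-disk-1.img.gz

# 2. Copy it to the other node, e.g.:
#    scp /tmp/vm-19217-disk-1.img.gz root@forge:/tmp/
# 3. On the target node, with the VM stopped, write it back:
#    gzip -dc /tmp/vm-19217-disk-1.img.gz | dd of=/dev/five/vm-19217-disk-1 bs=1M
```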

I know this cluster is in pretty bad shape, thanks for sticking with me :)

--Will
 

I was able to fix my cluster, here are the steps I took:

-Shut down all VMs on all nodes
-vgchange -c n on all VGs
-Killed clvmd process
-returned locking type to "1" in /etc/lvm/lvm.conf
-backed up virtual machines
-reboot
-Everything was happy :)

Thanks guys,

--Will
 
