VZDump Backup Not Starting VM

Extcee

Apr 11, 2011
Hey All,

I have a bit of an issue with our vzdump backup. It works fine on all VMs except one running Ubuntu 6.06.2.

When the backup kicks off it should suspend the VM and then resume it, but when I check the VM in the morning it is stopped.

All the logs indicate that Proxmox has resumed it, but that is not the case.

Here are the logs:
Apr 10 22:50:01 PROXVE01 /USR/SBIN/CRON[4116]: (root) CMD (vzdump --quiet --node 1 --suspend --storage Backup --mailto it@holdcroft.com 113)
Apr 10 22:50:01 PROXVE01 /USR/SBIN/CRON[4115]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Apr 10 22:50:01 PROXVE01 /USR/SBIN/CRON[4118]: (root) CMD (/usr/share/vzctl/scripts/vpsreboot)
Apr 10 22:50:01 PROXVE01 /USR/SBIN/CRON[4117]: (root) CMD (/usr/share/vzctl/scripts/vpsnetclean)
Apr 10 22:50:02 PROXVE01 vzdump[4116]: INFO: trying to get global lock - waiting...
.....
Apr 11 00:21:55 PROXVE01 vzdump[3269]: INFO: Finished Backup of VM 116 (00:07:15)
Apr 11 00:21:55 PROXVE01 vzdump[3269]: INFO: Backup job finished successfuly
Apr 11 00:21:56 PROXVE01 vzdump[4116]: INFO: got global lock
Apr 11 00:21:56 PROXVE01 vzdump[4116]: INFO: starting new backup job: vzdump --quiet --node 1 --suspend --storage Backup --mailto it@holdcroft.com 113
Apr 11 00:21:56 PROXVE01 vzdump[4116]: INFO: Starting Backup of VM 113 (qemu)
Apr 11 00:21:57 PROXVE01 postfix/pickup[4735]: 6BC163A47D2: uid=0 from=<root>
Apr 11 00:21:57 PROXVE01 postfix/cleanup[5682]: 6BC163A47D2: message-id=<20110410232157.6BC163A47D2@PROXVE01.localdomain>
Apr 11 00:21:57 PROXVE01 postfix/qmgr[1938]: 6BC163A47D2: from=<root@PROXVE01.localdomain>, size=6097, nrcpt=1 (queue active)
Apr 11 00:21:58 PROXVE01 postfix/smtp[5685]: 6BC163A47D2: to=<it@holdcroft.com>, relay=vpop.holdcroft.com[192.168.0.5]:25, delay=1.2, delays=0.51/0.06/0.08/0.53, dsn=2.0.0, status=sent (250 2.0.0 OK)
Apr 11 00:21:58 PROXVE01 postfix/qmgr[1938]: 6BC163A47D2: removed
Apr 11 00:21:58 PROXVE01 qm[5692]: VM 113 suspend
Apr 11 00:22:24 PROXVE01 pvemirror[2079]: starting cluster syncronization
Apr 11 00:22:25 PROXVE01 pvemirror[2079]: syncing templates
Apr 11 00:22:25 PROXVE01 pvemirror[2079]: cluster syncronization finished (0.79 seconds (files 0.00, config 0.00))
....
Apr 11 02:25:13 PROXVE01 qm[7685]: VM 113 resume
Apr 11 02:25:13 PROXVE01 kernel: vmbr0: port 2(tap113i0d0) entering disabled state
Apr 11 02:25:13 PROXVE01 kernel: vmbr0: port 2(tap113i0d0) entering disabled state
Apr 11 02:25:14 PROXVE01 vzdump[4116]: INFO: Finished Backup of VM 113 (02:03:18)
Apr 11 02:25:14 PROXVE01 vzdump[4116]: INFO: Backup job finished successfuly
Apr 11 02:25:16 PROXVE01 postfix/pickup[6418]: A24C43A47D2: uid=0 from=<root>
Apr 11 02:25:16 PROXVE01 postfix/cleanup[7697]: A24C43A47D2: message-id=<20110411012516.A24C43A47D2@PROXVE01.localdomain>
Apr 11 02:25:16 PROXVE01 postfix/qmgr[1938]: A24C43A47D2: from=<root@PROXVE01.localdomain>, size=3943, nrcpt=1 (queue active)
Apr 11 02:25:16 PROXVE01 postfix/smtp[7699]: A24C43A47D2: to=<it@holdcroft.com>, relay=vpop.holdcroft.com[192.168.0.5]:25, delay=2.1, delays=1.9/0.03/0.06/0.12, dsn=2.0.0, status=sent (250 2.0.0 OK)
Apr 11 02:25:16 PROXVE01 postfix/qmgr[1938]: A24C43A47D2: removed
Apr 11 02:25:24 PROXVE01 pvemirror[2079]: starting cluster syncronization
Apr 11 02:25:24 PROXVE01 pvemirror[2079]: syncing templates
Apr 11 02:25:24 PROXVE01 pvemirror[2079]: cluster syncronization finished (0.25 seconds (files 0.00, config 0.00))
Apr 11 02:25:29 PROXVE01 ntpd[2090]: Deleting interface #14 tap113i0d0, fe80::f899:dbff:fee6:9910#123, interface stats: received=0, sent=0, dropped=0, active_time=242100 secs
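To read the suspend, the resume and the tap interface changes side by side, it helps to filter the syslog down to a single VM. This is a minimal sketch that assumes the message formats shown in the log above. `vm_events` is a hypothetical helper name, and using /var/log/syslog as the default file is an assumption about the host's syslog setup:

```shell
# Print only the syslog lines that concern one VM: qm suspend/resume/start/stop
# events, vzdump "Backup of VM N" lines and the VM's tap interface (tap<VMID>i...).
# Usage: vm_events VMID [LOGFILE...]   (defaults to /var/log/syslog)
vm_events() {
    if [ $# -lt 1 ]; then
        echo "usage: vm_events VMID [LOGFILE...]" >&2
        return 2
    fi
    vmid=$1
    shift
    # Fall back to the main syslog when no files are given.
    if [ $# -eq 0 ]; then
        set -- /var/log/syslog
    fi
    grep -E "VM $vmid (suspend|resume|start|stop)|Backup of VM $vmid |tap${vmid}i" "$@"
}
```

For example, `vm_events 113 /var/log/syslog` keeps the backup start and finish lines, the `VM 113 suspend` and `VM 113 resume` lines, and the `tap113i0d0` lines from the excerpt above. It drops the cron, postfix and pvemirror noise.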


And the logs from the VM:

Apr 6 23:07:50 han-pdc slapd[8082]: conn=5792 op=1 SRCH base="" scope=0 deref=0 filter="(objectClass=*)"
Apr 6 23:07:50 han-pdc slapd[8082]: conn=5792 op=1 SRCH attr=supportedControl
Apr 7 06:22:54 han-pdc syslogd 1.4.1#17ubuntu7.1: restart.
Apr 7 06:22:54 han-pdc nscd: nss_ldap: could not connect to any LDAP server as (null) - Can't contact LDAP server
Apr 7 06:22:54 han-pdc nscd: nss_ldap: could not connect to any LDAP server as (null) - Can't contact LDAP server
Apr 7 06:22:54 han-pdc kernel: Inspecting /boot/System.map-2.6.15-23-server
Apr 7 06:22:54 han-pdc kernel: Loaded 23140 symbols from /boot/System.map-2.6.15-23-server.

PVE Version:

PROXVE01:~# pveversion -v
pve-manager: 1.8-15 (pve-manager/1.8/5754)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-32
pve-kernel-2.6.32-4-pve: 2.6.32-32
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

Backups are stored in a locally mounted backup folder (/var/lib/vz/backup).
Image files are stored on a secondary disk, mounted on a folder (in Linux) and added to Proxmox as NFS storage (/var/lib/vz/DataStore/). This is on LVM, and the backups are just stored on ext3 (I think; it depends on what file system Proxmox VE uses).


I have tried switching the backup mode between suspend, stop and snapshot, but I get the same result each time.

I have checked ACPI on the Linux VM and tried installing it as a package, but that did not resolve it.

We have a WinXP VM on the same node that backs up to the same place without problems.

Any ideas, guys?
 
I suggest you try snapshot mode. If you use snapshot mode, you need to store the backups outside /var/lib/vz; otherwise you get a loop.
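Here is a minimal sketch of that advice. It assumes vzdump's `--dumpdir` option is used to choose the target directory. `snapshot_backup` and `under_var_lib_vz` are hypothetical helper names. The check compares only the literal path and does not resolve symlinks or bind mounts:

```shell
# Return success if the given directory is /var/lib/vz or lies below it.
# Only the literal path is checked; symlinks and bind mounts are not resolved.
under_var_lib_vz() {
    case "$1" in
        /var/lib/vz | /var/lib/vz/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run a snapshot-mode backup of one VM into DUMPDIR, refusing any target
# under /var/lib/vz (the volume being snapshotted when VM data lives there).
# Usage: snapshot_backup DUMPDIR VMID
snapshot_backup() {
    dumpdir=$1
    vmid=$2
    if under_var_lib_vz "$dumpdir"; then
        echo "refusing to back up VM $vmid into $dumpdir (inside /var/lib/vz)" >&2
        return 1
    fi
    vzdump --quiet --snapshot --dumpdir "$dumpdir" "$vmid"
}
```

With this helper, `snapshot_backup /mnt/SANBackup 113` writes to the SAN mount used later in this thread, and `snapshot_backup /var/lib/vz/backup 113` is refused before vzdump runs.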
 
I've just changed the backup schedule to write to a Windows SAN (mounted via SMB/CIFS as a directory) in suspend mode.

I'll let you know when the backup runs tonight. Thanks for your quick response!


It's just strange that Proxmox says the VM is starting, but this never happens and I have to start it from the web GUI.
 
I just checked the VM and it was stopped.

This was backing up to a Server 2008 box.

Here are the logs:
<pre>

VMID NAME STATUS TIME SIZE FILENAME
113 babcom.holdcroft.com OK 01:41:31 105.54GB /mnt/SANBackup/vzdump-qemu-113-2011_04_12-00_23_32.tar
TOTAL 01:41:31 105.54GB


Detailed backup logs:

vzdump --quiet --node 1 --snapshot --storage SANBackup --mailto it@holdcroft.com 113

113: Apr 12 00:23:32 INFO: Starting Backup of VM 113 (qemu)
113: Apr 12 00:23:33 INFO: running
113: Apr 12 00:23:33 INFO: status = running
113: Apr 12 00:23:34 INFO: mode failure - unable to detect lvm volume group
113: Apr 12 00:23:34 INFO: trying 'suspend' mode instead
113: Apr 12 00:23:34 INFO: backup mode: suspend
113: Apr 12 00:23:34 INFO: ionice priority: 7
113: Apr 12 00:23:34 INFO: suspend vm
113: Apr 12 00:23:34 INFO: creating archive '/mnt/SANBackup/vzdump-qemu-113-2011_04_12-00_23_32.tar'
113: Apr 12 00:23:34 INFO: adding '/mnt/SANBackup/vzdump-qemu-113-2011_04_12-00_23_32.tmp/qemu-server.conf' to archive ('qemu-server.conf')
113: Apr 12 00:23:34 INFO: adding '/mnt/pve/DataStore/images/113/vm-113-disk.qcow2' to archive ('vm-disk-ide0.qcow2')
113: Apr 12 00:59:45 INFO: adding '/mnt/pve/DataStore/images/113/vm-113-disk2.qcow2' to archive ('vm-disk-ide1.qcow2')
113: Apr 12 02:05:02 INFO: Total bytes written: 113325472256 (17.75 MiB/s)
113: Apr 12 02:05:02 INFO: archive file size: 105.54GB
113: Apr 12 02:05:03 INFO: resume vm
113: Apr 12 02:05:03 INFO: vm is online again after 6089 seconds
113: Apr 12 02:05:03 INFO: Finished Backup of VM 113 (01:41:31)
</pre>

Any ideas?
 
...
113: Apr 12 00:23:34 INFO: mode failure - unable to detect lvm volume group
...
Any ideas ?
Hi,
the storage where your VM disks live isn't a logical volume, so vzdump can't create a snapshot.
If you store the VM data on /var/lib/vz (on the LV pve-data), the snapshot backup should work, as long as you have at least 4 GB free in the volume group.

Udo
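You can check the two preconditions above (disks on a logical volume, free space in the volume group) before relying on snapshot mode. This is a minimal sketch, and the helper names are hypothetical. Treating any `/dev/mapper/*` device as LVM is a heuristic, because dm-crypt and multipath devices also appear there. The 4 GB default comes from the advice above:

```shell
# Succeed if the filesystem holding PATH is backed by a device-mapper device
# (/dev/mapper/...), which is how LVM logical volumes usually appear in df.
# Heuristic: dm-crypt or multipath devices would also match.
on_lvm() {
    src=$(df -P "$1" | awk 'NR == 2 { print $1 }')
    case "$src" in
        /dev/mapper/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if volume group VG (default: pve) has at least NEED_GB GiB free
# (default: 4, the figure suggested above). Returns 2 if vgs itself fails.
vg_has_space() {
    vg=${1:-pve}
    need_gb=${2:-4}
    free_gb=$(vgs --noheadings --nosuffix --units g -o vg_free "$vg") || return 2
    # awk converts the padded decimal that vgs prints (e.g. "  6.50") to a number
    awk -v free="$free_gb" -v need="$need_gb" 'BEGIN { exit !(free + 0 >= need + 0) }'
}
```

On the setup in this thread, `on_lvm /mnt/pve/DataStore/images/113` should fail because that path is an NFS mount. That is consistent with the "unable to detect lvm volume group" message in the backup log.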
 
Udo,

The backup runs regardless of where I store it and which mode I use; it's just that the VM doesn't start even though Proxmox says it has.

It's got to the point where I have to start the VM manually via the web GUI after every backup, and this is the issue that is causing me a nightmare!

Thanks for your help guys.
 
Hi,
but if you use snapshot mode, you don't have to start the VM, because it keeps running.

But what about a little backup script as a workaround: first vzdump, then "qm start VMID"?
Then you have to schedule this backup via cron instead of the GUI.

Udo
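Udo's workaround could look like the script below. It runs the same kind of vzdump job from cron, then starts the VM only if it is not running afterwards. This is a minimal sketch under two assumptions: the storage ID is `SANBackup`, taken from the logs above, and `qm status VMID` prints either `running` or `status: running` while the VM is up. Check the exact wording on your PVE version. The script name is hypothetical:

```shell
#!/bin/sh
# Workaround sketch: back up one VM from cron, then start it if it is not
# running afterwards. Assumes the storage ID "SANBackup" from this thread and
# that `qm status VMID` prints "running" or "status: running" while the VM is up.

# cron runs with a minimal PATH; qm and vzdump are usually in /usr/sbin.
PATH=/usr/sbin:/sbin:$PATH
STORAGE=${STORAGE:-SANBackup}

vm_is_running() {
    qm status "$1" 2>/dev/null | grep -Eq '^(status: )?running$'
}

backup_then_start() {
    vmid=$1
    vzdump --quiet --suspend --storage "$STORAGE" "$vmid"
    rc=$?
    # Start the VM only if the backup left it down, so a VM that resumed
    # correctly is not started a second time.
    if ! vm_is_running "$vmid"; then
        echo "VM $vmid is not running after the backup; starting it" >&2
        qm start "$vmid" || rc=1
    fi
    return "$rc"
}

# Act only when a VMID is given, e.g. from cron: backup-then-start.sh 113
if [ $# -gt 0 ]; then
    backup_then_start "$1"
fi
```

You could schedule it from root's crontab (`crontab -e`) with a line such as `50 22 * * * /root/bin/backup-then-start.sh 113`. The path is only an example. Remove the GUI-scheduled job for VM 113 first so the VM is not backed up twice.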
 
Udo,

Many thanks. Ideally this is not the way to go, but I might just have to do it.

When I go to start our VM it is always stopped, with no activity or CPU usage. If it is only being suspended for the backup, why the devil is it stopping?

I did have this working on another node in a different cluster, which has since been destroyed and its VMs migrated (that server was hanging at night). The problem only started after I moved the VM to the new cluster.

Here are the configs for the VM:

--OLD--

name: babcom.holdcroft.com
ide2: local:iso/gparted-live-0.5.2-1.iso,media=cdrom
sockets: 1
vlan0: rtl8139=06:87:67:B3:92:51
ide0: vm-113-disk.qcow2
ide1: vm-113-disk2.qcow2
ostype: l26
memory: 512
onboot: 0
boot: c
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
cores: 2
bootdisk: ide0


--NEW--

name: babcom.holdcroft.com
ide2: none,media=cdrom
sockets: 1
vlan0: rtl8139=06:87:67:B3:92:51
ide0: DataStore:113/vm-113-disk.qcow2
ide1: DataStore:113/vm-113-disk2.qcow2
ostype: l26
memory: 512
onboot: 1
boot: c
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
cores: 2
bootdisk: ide0

Hope this helps. I'm wondering whether disabling ACPI would make any difference?
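A small diff over the `key: value` lines shows exactly what changed between the two configs. This is a minimal sketch, and `config_diff` is a hypothetical helper. The file names in the usage line are examples. PVE 1.x typically keeps VM configs under /etc/qemu-server/, but check your node:

```shell
# Print "key: old -> new" for every setting that differs between two
# qemu-server config files made of "key: value" lines. Lines are split on the
# first ": " so values such as "DataStore:113/vm-113-disk.qcow2" stay intact.
# Usage: config_diff OLD.conf NEW.conf   (output order is not sorted)
config_diff() {
    awk '
        !NF { next }
        {
            i = index($0, ": ")
            if (i == 0) next
            key = substr($0, 1, i - 1)
            val = substr($0, i + 2)
        }
        NR == FNR { before[key] = val; next }
        { after[key] = val }
        END {
            for (k in before) {
                if (!(k in after)) { print k ": " before[k] " -> (unset)"; continue }
                if (before[k] != after[k]) print k ": " before[k] " -> " after[k]
            }
            for (k in after)
                if (!(k in before)) print k ": (unset) -> " after[k]
        }' "$1" "$2"
}
```

Run on the OLD and NEW configs above (for example `config_diff old-113.conf /etc/qemu-server/113.conf`), it should report only ide0, ide1, ide2 and onboot. `acpi: 1` is identical in both, so ACPI itself did not change between the working and the failing setup.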