vzdump not working as expected on fresh install but was in previous versions

stra4d

Well-Known Member
Mar 1, 2012
I have a fresh install of Proxmox 3.4 (see version info below). The previous server was version 3.1 and the backup method we were using worked there. Mon-Thurs we run a snapshot-mode vzdump as follows:
Code:
vzdump --mode snapshot --compress gzip --storage drive2_backups --maxfiles 1 --mailto scott@me.whatever --all

When the snapshot backup is running, the VMs crash, and thus the backup does not finish (see below for the vzdump output and VM logs).

On Friday we do a shutdown backup as follows (this is working fine):
Code:
vzdump --compress gzip --storage drive2_backups --maxfiles 1 --mailto scott@rci.ca --all


Backups go to a second drive (not local storage). Some VMs are on local storage, some are on the second drive.
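In case it helps anyone reproduce the setup, the weekly schedule could be wired up as a cron entry roughly like this (just a sketch: the /etc/cron.d file name is made up, the 20:00 start is taken from the log timestamps below, and the commands are the two shown above):
Code:
# /etc/cron.d/vzdump-weekly  (hypothetical file name)
# Mon-Thu 20:00: snapshot-mode backup of all VMs
0 20 * * 1-4  root  vzdump --mode snapshot --compress gzip --storage drive2_backups --maxfiles 1 --mailto scott@me.whatever --all
# Fri 20:00: the shutdown backup (no --mode), which works fine
0 20 * * 5    root  vzdump --compress gzip --storage drive2_backups --maxfiles 1 --mailto scott@rci.ca --all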

Code:
# pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-37-pve: 2.6.32-147
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

VZDUMP log (middle removed for clarity):
Code:
100: Jun 02 20:00:09 INFO: Starting Backup of VM 100 (qemu)
100: Jun 02 20:00:09 INFO: status = running
100: Jun 02 20:00:10 INFO: update VM 100: -lock backup
100: Jun 02 20:00:10 INFO: backup mode: snapshot
100: Jun 02 20:00:10 INFO: ionice priority: 7
100: Jun 02 20:00:10 INFO: creating archive '/drive2/vzdumpbackups/dump/vzdump-qemu-100-2015_06_02-20_00_09.vma.gz'
100: Jun 02 20:00:10 INFO: started backup task 'a32f70bb-9501-4b5d-a821-82fbe518c364'
100: Jun 02 20:00:13 INFO: status: 0% (38928384/15032385536), sparse 0% (8904704), duration 3, 12/10 MB/s
100: Jun 02 20:00:19 INFO: status: 1% (167247872/15032385536), sparse 0% (71995392), duration 9, 21/10 MB/s
100: Jun 02 20:00:27 INFO: status: 2% (319291392/15032385536), sparse 0% (78508032), duration 17, 19/18 MB/s
100: Jun 02 20:00:33 INFO: status: 3% (471334912/15032385536), sparse 0% (82812928), duration 23, 25/24 MB/s
...
100: Jun 02 20:09:16 INFO: status: 50% (7523532800/15032385536), sparse 3% (508485632), duration 546, 10/10 MB/s
100: Jun 02 20:09:26 INFO: status: 51% (7668891648/15032385536), sparse 3% (513105920), duration 556, 14/14 MB/s
100: Jun 02 20:09:29 ERROR: VM 100 not running
100: Jun 02 20:09:29 INFO: aborting backup job
100: Jun 02 20:09:29 ERROR: VM 100 not running
100: Jun 02 20:09:30 ERROR: Backup of VM 100 failed - VM 100 not running

/var/log/messages from VM:
Code:
Jun  2 20:07:06 ediserv nmbd[2676]: [2015/06/02 20:07:06, 0] nmbd/nmbd_become_lmb.c:become_local_master_stage2(396)
Jun  2 20:07:06 ediserv nmbd[2676]:   *****
Jun  2 20:07:06 ediserv nmbd[2676]:
Jun  2  20:07:06 ediserv nmbd[2676]:   Samba name server EDISERV is now a local  master browser for workgroup MYGROUP on subnet 192.168.7.207
Jun  2 20:07:06 ediserv nmbd[2676]:
Jun  2 20:07:06 ediserv nmbd[2676]:   *****
Jun  2 20:08:41 ediserv shutdown[14971]: shutting down for system halt
Jun  2 20:08:43 ediserv smartd[2867]: smartd received signal 15: Terminated
Jun  2 20:08:43 ediserv smartd[2867]: smartd is exiting (exit status 0)
Jun  2 20:08:43 ediserv avahi-daemon[2783]: Got SIGTERM, quitting.
Jun  2 20:08:43 ediserv avahi-daemon[2783]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::8c61:ceff:fe5b:3365.
Jun  2 20:08:43 ediserv avahi-daemon[2783]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.7.207.
Jun  2 20:08:46 ediserv nmbd[2676]: [2015/06/02 20:08:46, 0] nmbd/nmbd.c:terminate(58)
Jun  2 20:08:46 ediserv nmbd[2676]:   Got SIGTERM: going down...
Jun  2 20:08:46 ediserv xinetd[2566]: Exiting...
Jun  2 20:09:04 ediserv gconfd (root-2894): GConf server is not in use, shutting down.
Jun  2 20:09:04 ediserv gconfd (root-2894): Exiting
Jun  2 20:09:09 ediserv ntpd[2578]: ntpd exiting on signal 15
Jun  2 20:09:09 ediserv hcid[2310]: Got disconnected from the system message bus
Jun  2 20:09:09 ediserv nm-system-settings: disconnected from the system bus, exiting.
Jun  2 20:09:10 ediserv auditd[2144]: The audit daemon is exiting.
Jun  2 20:09:10 ediserv kernel: audit(1433290150.008:1077): audit_pid=0 old=2144 by auid=4294967295
Jun  2 20:09:10 ediserv pcscd: pcscdaemon.c:572:signal_trap() Preparing for suicide
Jun  2 20:09:10 ediserv pcscd: hotplug_libusb.c:376:HPRescanUsbBus() Hotplug stopped
Jun  2 20:09:11 ediserv pcscd: readerfactory.c:1379:RFCleanupReaders() entering cleaning function
Jun  2 20:09:11 ediserv pcscd: pcscdaemon.c:532:at_exit() cleaning /var/run
Jun  2 20:09:11 ediserv kernel: Kernel logging (proc) stopped.
Jun  2 20:09:11 ediserv kernel: Kernel log daemon terminating.
Jun  2 20:09:12 ediserv exiting on signal 15
Jun  2 21:26:28 ediserv syslogd 1.4.1: restart.
 
Same results:
Code:
Jun 03 20:09:43 INFO: status: 53% (7978483712/15032385536), sparse 3% (527204352), duration 572, 11/11 MB/s
Jun 03 20:09:53 INFO: status: 54% (8126726144/15032385536), sparse 3% (531656704), duration 582, 14/14 MB/s
Jun 03 20:10:05 INFO: status: 55% (8271167488/15032385536), sparse 3% (535670784), duration 594, 12/11 MB/s
Jun 03 20:10:14 INFO: status: 56% (8419213312/15032385536), sparse 3% (537821184), duration 603, 16/16 MB/s
Jun 03 20:10:18 ERROR: VM 100 not running
Jun 03 20:10:18 INFO: aborting backup job
Jun 03 20:10:18 ERROR: VM 100 not running
Jun 03 20:10:19 ERROR: Backup of VM 100 failed - VM 100 not running

Other VMs have also stopped (some, not all). The next VM is no longer running when its turn to back up comes:
Code:
Jun 03 20:10:19 INFO: Starting Backup of VM 101 (qemu)
Jun 03 20:10:19 INFO: status = stopped
Jun 03 20:10:19 INFO: update VM 101: -lock backup
Jun 03 20:10:19 INFO: backup mode: stop
Jun 03 20:10:19 INFO: ionice priority: 7
Jun 03 20:10:19 INFO: creating archive '/drive2/vzdumpbackups/dump/vzdump-qemu-101-2015_06_03-20_10_19.vma.gz'
Jun 03 20:10:19 INFO: starting kvm to execute backup task

Updated package versions:
Code:
# pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-17
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
It seems that 4 out of the 5 qemu VMs crash roughly 50% of the way through the first VM's snapshot backup. Not sure why. Any ideas? Could it be related to the fact that the disks were created under 3.1 and are now running under 3.4?
 
I have tried suspend mode (--mode suspend) and still get inconsistent results. The backups now finish, but 3 out of the 5 VMs crash partway through the backup of the first VM. The machine has the standard LVM setup plus a second drive added for additional space (not LVM). Here is the info and the backup results:
Code:
VMID  NAME     DISK LOCATION  BACKUP MODE  STATUS AT START  RUNNING AFTER
100   ediserv  drive2         suspend      running          YES
101   rciserv  local          stop         stopped          NO
102   devserv  local          stop         stopped          NO
103   zmail    drive2         stop         stopped          NO
110   arrow    drive2         suspend      running          YES


The first VM backup starts at 20:00:09. Here is a section of /var/log/messages from the first crashed VM:
Code:
Jun  8 20:08:30 rciserv nmbd[2707]: [2015/06/08 20:08:30, 0] nmbd/nmbd_become_lmb.c:become_local_master_stage2(396)
Jun  8 20:08:30 rciserv nmbd[2707]:   *****
Jun  8 20:08:30 rciserv nmbd[2707]:
Jun  8 20:08:30 rciserv nmbd[2707]:   Samba name server RCISERV is now a local master browser for workgroup MYGROUP on subnet 192.168.7.106
Jun  8 20:08:30 rciserv nmbd[2707]:
Jun  8 20:08:30 rciserv nmbd[2707]:   *****
Jun  8 20:08:47 rciserv shutdown[22122]: shutting down for system halt
Jun  8 20:08:49 rciserv smartd[2896]: smartd received signal 15: Terminated
Jun  8 20:08:49 rciserv smartd[2896]: smartd is exiting (exit status 0)
Jun  8 20:08:49 rciserv avahi-daemon[2814]: Got SIGTERM, quitting.
Jun  8 20:08:49 rciserv avahi-daemon[2814]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::646f:e6ff:fec7:4699.
Jun  8 20:08:49 rciserv avahi-daemon[2814]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.7.106.
Jun  8 20:08:51 rciserv nmbd[2707]: [2015/06/08 20:08:51, 0] nmbd/nmbd.c:terminate(58)
Jun  8 20:08:51 rciserv nmbd[2707]:   Got SIGTERM: going down...
Jun  8 20:08:52 rciserv xinetd[2604]: Exiting...
Jun  8 20:09:13 rciserv ntpd[3062]: ntpd exiting on signal 15
Jun  8 20:09:14 rciserv nm-system-settings: disconnected from the system bus, exiting.
Jun  8 20:09:14 rciserv gconfd (root-2901): GConf server is not in use, shutting down.
Jun  8 20:09:14 rciserv gconfd (root-2901): Exiting
Jun  8 20:09:14 rciserv auditd[2157]: The audit daemon is exiting.
Jun  8 20:09:14 rciserv kernel: audit(1433808554.559:518): audit_pid=0 old=2157 by auid=4294967295
Jun  8 20:09:14 rciserv pcscd: pcscdaemon.c:572:signal_trap() Preparing for suicide
 
Update - more info on disks. I have added the disk type to my info; not sure if raw vs qcow2 has any effect:

Backup:
vzdump --mode suspend --compress gzip --storage drive2_backups --maxfiles 1 --mailto scott@me.com --all

BACKUP RESULTS (--mode suspend)
Code:
VMID  NAME     DISK LOCATION  DISK TYPE  BACKUP MODE  STATUS AT START  RUNNING AFTER
100   ediserv  drive2         qcow2      suspend      running          YES
101   rciserv  local          raw        stop         stopped          NO
102   devserv  local          raw        stop         stopped          NO
103   zmail    drive2         qcow2      stop         stopped          NO
110   arrow    drive2         qcow2      suspend      running          YES
 
I found it - /var/log/secure showed a root login followed by a shutdown. This was caused by an old server (still running for testing) that was shutting things down. Duh!
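For anyone chasing something similar, this is roughly how it shows up inside the guest (log file names vary by distro):
Code:
# look for root logins and shutdown entries around the time the VM "crashed"
grep -iE 'session opened for user root|shutdown' /var/log/secure
# the halt also shows up in the login history
last -x shutdown reboot | head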
 
