After update to 3.2: VM crashing during backup

jens.kuespert · Mar 15, 2014

Hi,

after upgrading to PVE 3.2 one of our VMs is crashing during backup. We are usgin a LVM-storage and snapshot mode.

Code:

  101: Mar 13 00:42:11 INFO: Starting Backup of VM 101 (qemu)
  101: Mar 13 00:42:11 INFO: status = running
  101: Mar 13 00:42:11 INFO: update VM 101: -lock backup
  101: Mar 13 00:42:12 INFO: exclude disk 'ide1' (backup=no)
  101: Mar 13 00:42:12 INFO: backup mode: snapshot
  101: Mar 13 00:42:12 INFO: ionice priority: 7
  101: Mar 13 00:42:12 INFO: creating archive '/mnt/pve/nas-backup/dump/vzdump-qemu-101-2014_03_13-00_42_11.vma.lzo'
  101: Mar 13 00:42:12 INFO: started backup task 'e3947a88-205a-42c1-9e7c-a352bcef5837'
  101: Mar 13 00:42:15 INFO: status: 0% (806354944/343597383680), sparse 0% (33640448), duration 3, 268/257 MB/s
  101: Mar 13 00:42:26 INFO: status: 1% (3516399616/343597383680), sparse 0% (105631744), duration 14, 246/239 MB/s
  101: Mar 13 00:45:53 INFO: status: 2% (7066157056/343597383680), sparse 0% (204587008), duration 221, 17/16 MB/s
  101: Mar 13 00:46:07 INFO: status: 3% (10373693440/343597383680), sparse 0% (258056192), duration 235, 236/232 MB/s
  101: Mar 13 00:49:00 ERROR: VM 101 not running
  101: Mar 13 00:49:00 INFO: aborting backup job
  101: Mar 13 00:49:00 ERROR: VM 101 not running
  101: Mar 13 00:49:02 ERROR: Backup of VM 101 failed - VM 101 not running

As the storage lies directly on a LVM there no imagefiles involved. Any ideas we we can start to look for the cause of the error?

FYI:

Code:

# pveversion -v
proxmox-ve-2.6.32: 3.2-121 (running kernel: 2.6.32-27-pve)
pve-manager: 3.2-1 (running version: 3.2-1/1933730b)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-24-pve: 2.6.32-111
pve-kernel-2.6.32-25-pve: 2.6.32-113
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-4
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

--
- Jens -

dietmar · Mar 16, 2014

jens.kuespert said:
As the storage lies directly on a LVM there no imagefiles involved. Any ideas we we can start to look for the cause of the error?

Are there any hints in /var/log/syslog?

And please can you post the VM config?

cesarpk · Mar 16, 2014

jens.kuespert said:
As the storage lies directly on a LVM there no imagefiles involved. Any ideas we we can start to look for the cause of the error?

Please PVE developers, can answer this question?... soon i will have a VM with 4 virtual disks and i believe that the code of Backup must be solid as a rock (for the VM and also for the Backup that running in mode snapshot)
(Note: i don't know if the problem is in the code of backup, or is other problem, but in anyway this topic is very important!!!)

Thanking you in advance for the fine attention from this community so kind and above all by his developers, I'll be holding their breath.

Best regards
Cesar

dietmar · Mar 16, 2014

cesarpk said:
Please PVE developers, can answer this question?...

Please stop posting such useless messages (i already answered).

cesarpk · Mar 16, 2014

dietmar said:
Please stop posting such useless messages (i already answered).

OK, sorry

jens.kuespert · Mar 16, 2014

dietmar said:
Are there any hints in /var/log/syslog?

I've looked through it and found nothing. Any ideas for keywords I'd look out for?

dietmar said:
And please can you post the VM config?

Sure.

Code:

root@host1:/etc# cat pve/qemu-server/101.conf
#Test-Installation f%C3%BCr den Server
bootdisk: virtio0
cores: 2
ide0: local:iso/ubuntu-12.04.2-server-amd64.iso,media=cdrom,size=656M
memory: 8196
name: test-server
net0: virtio=16:3E:5D:B5:7F:8E,bridge=vmbr1
onboot: 1
ostype: l26
sockets: 1
virtio0: machines:vm-101-disk-1,size=10G
virtio1: machines:vm-101-disk-2,size=10G
virtio2: machines:vm-101-disk-3,size=150G
virtio3: machines:vm-101-disk-4,size=150G
root@host1:/etc# cat pve/storage.cfg
lvm: machines
        vgname data
        content images

dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0

nfs: nas-backup
        path /mnt/pve/nas-backup
        server 192.168.2.101
        export /Backup
        options vers=3
        content backup
        maxfiles 11

root@host1:/etc# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  pve  lvm2 a--  929.49g 16.00g
  /dev/sdb1  data lvm2 a--    3.63t  2.58t
root@host1:/etc# lvs
  LV                            VG   Attr      LSize   Pool Origin        Data%  Move Log Copy%  Convert
  vm-100-disk-1                 data -wi-ao---  16.00g
  vm-101-disk-1                 data -wi-ao---  10.00g
  vm-101-disk-2                 data -wi-ao---  10.00g
  vm-101-disk-3                 data -wi-ao--- 150.00g
  vm-101-disk-4                 data -wi-ao--- 150.00g
  vm-102-disk-1                 data -wi------ 140.00g
  vm-103-disk-1                 data -wi-a----  41.00g
  vm-104-disk-1                 data owi---s--  16.00g
  vm-104-snapshot-20130415-1325 data swi---s--   4.00g      vm-104-disk-1
  vm-201-disk-1                 data -wi-a----  16.00g
  vm-202-disk-1                 data -wi-a----  10.00g
  vm-202-disk-2                 data -wi-a----  10.00g
  vm-202-disk-3                 data -wi-a---- 150.00g
  vm-202-disk-4                 data -wi-a---- 150.00g
  vm-205-disk-1                 data -wi-ao--- 200.00g
  data                          pve  -wi-ao--- 810.49g
  root                          pve  -wi-ao---  96.00g
  swap                          pve  -wi-ao---   7.00g

Backup goes to the storage nas-backup.

shartenauer · Mar 17, 2014

Hello together,

after I have upgraded 3 of our Proxmox servers from 3.1 to version 3.2 I have exactly the same behaviour. During the backup in mode snapshot to our FTP Space the VM dies and the backup fails. Before the upgrade to 3.2 the backup was running fine for 9 month.

Host1 Host2 Host3
pveversion -v
proxmox-ve-2.6.32: 3.2-121 (running kernel: 2.6.32-27-pve)
pve-manager: 3.2-1 (running version: 3.2-1/1933730b)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-4
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

The behaviour is independent from the VM or the OS or configuration of the VM.
Host1:
VM1 Ubuntu 12.04 crashes 1 time 12.03.2014
VM2 was ok WIN2012 R2
VM3 CentOS 5.6 chrashes 5 times: 12.03, 13.03, 14.03, 15.03 & 17.03.2014
110: Mar 17 00:36:27 INFO: Starting Backup of VM 110 (qemu)
110: Mar 17 00:36:27 INFO: status = running
110: Mar 17 00:36:27 INFO: update VM 110: -lock backup
110: Mar 17 00:36:27 INFO: backup mode: snapshot
110: Mar 17 00:36:27 INFO: ionice priority: 7
110: Mar 17 00:36:27 INFO: creating archive '/mnt/backup-server/iBackOffice/dump/vzdump-qemu-110-2014_03_17-00_36_27.vma.gz'
110: Mar 17 00:36:27 INFO: started backup task 'f59b14f8-3407-4e5f-b8ce-f2cfef0a772b'
110: Mar 17 00:36:30 INFO: status: 0% (129236992/53687091200), sparse 0% (25931776), duration 3, 43/34 MB/s
110: Mar 17 00:36:33 INFO: status: 1% (779223040/53687091200), sparse 1% (591933440), duration 6, 216/27 MB/s
110: Mar 17 00:36:37 INFO: status: 2% (1094713344/53687091200), sparse 1% (684638208), duration 10, 78/55 MB/s
110: Mar 17 00:36:59 ERROR: VM 110 not running
110: Mar 17 00:36:59 INFO: aborting backup job
110: Mar 17 00:36:59 ERROR: VM 110 not running
110: Mar 17 00:37:00 ERROR: Backup of VM 110 failed - VM 110 not running

Host 2
VM1 crashes 1 time on 16.03.2014
INFO: Starting Backup of VM 100 (qemu)
INFO: status = running
INFO: update VM 100: -lock backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/backup-server/WD/dump/vzdump-qemu-100-2014_03_16-03_00_01.vma.gz'
INFO: started backup task '3c451922-3360-479f-8f3a-7f9337a3c7c3'
ERROR: VM 100 not running
INFO: aborting backup job
ERROR: VM 100 not running
ERROR: Backup of VM 100 failed - VM 100 not running

VM2 was OK WIN2008R2

HOST3 OK
VM1 Ubuntu 12.04 OK

This is a very big problem, I hope its possible to fix.

Best regards Stephan

dietmar · Mar 17, 2014

shartenauer said:
after I have upgraded 3 of our Proxmox servers from 3.1 to version 3.2 I have exactly the same behaviour. During the backup in mode snapshot to our FTP Space

How do you mount that FTP space? Does it work when you use local storage as backup target?

Oer2001 · Mar 17, 2014

Same problem here:

INFO: Starting Backup of VM 209 (qemu)
INFO: status = running
INFO: update VM 209: -lock backup
INFO: backup mode: snapshot
INFO: bandwidth limit: 20000 KB/s
INFO: ionice priority: 7
INFO: creating archive '/mnt/backup/proxmox//dump/vzdump-qemu-209-2014_03_16-00_09_07.vma.gz'
INFO: started backup task '96bf0b4c-7bb9-44d9-be10-98b6605d058a'
INFO: status: 0% (62914560/483183820800), sparse 0% (1064960), duration 3, 20/20 MB/s
INFO: status: 1% (4849729536/483183820800), sparse 0% (81276928), duration 240, 20/19 MB/s
INFO: status: 2% (9683140608/483183820800), sparse 0% (110456832), duration 480, 20/20 MB/s
ERROR: VM 209 not running
INFO: aborting backup job
ERROR: VM 209 not running
ERROR: Backup of VM 209 failed - VM 209 not running

Two of my 20 VM's are affected. Both VM's uses disks larger than 300GB.

cat /etc/pve/qemu-server/209.conf
boot: dcn
bootdisk: virtio0
cores: 4
ide2: none,media=cdrom
memory: 49152
name: klondike
net0: virtio=06:86:5F:A5:44:66,bridge=vmbr0
net1: virtio=8E:EE:5C:B2:A9:CE,bridge=vmbr1
onboot: 1
ostype: l26
sockets: 2
virtio0: kvm_disk0_lun1:vm-209-disk-1,size=450G

pveversion -v
proxmox-ve-2.6.32: 3.2-121 (running kernel: 2.6.32-27-pve)
pve-manager: 3.2-1 (running version: 3.2-1/1933730b)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-24-pve: 2.6.32-111
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-4
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

Thank you for help.
Regards,
Erik

shartenauer · Mar 17, 2014

dietmar said:
How do you mount that FTP space? Does it work when you use local storage as backup target?

Hello Dietmar,

the FTP Space (Hetzner FTP Space) is mounted via CIFS in fstab with:
//uXXXXX.your-backup.de/backup /mnt/backup-server cifs noauto,iocharset=utf8,rw,credentials=/etc/backup-credentials.txt,file_mode=0660,dir_mode=0770 0 0

I can't backup to an local backup storage because, the drives are small SSD's with no space for a local backup.

Best regards Stephan

jens.kuespert · Mar 17, 2014

dietmar said:
Are there any hints in /var/log/syslog?

As noted above, the syslog tells nothing (or I do now know what to look for).

But I did some further testing:

I restored the VM from the last backup and tried a backup to a directory-storage. This time everything runs as expected. Weird.

Now I'll try backup the freshly restored machine to our nas. Stay tuned...

jens.kuespert · Mar 17, 2014

jens.kuespert said:
Now I'll try backup the freshly restored machine to our nas. Stay tuned...

Hm... This time the backup succeeded. To wrap it up:
* The old VM could not be backed up to NAS.
* A freshly restored copy of the VM could be backed up (local and NAS).

Tonight I'll try backup the old VM to the local storage... TBC

Kaya · Mar 17, 2014

I had the same problem on my proxmox 2.3 version. VM suddenly hang during snapshot.

gbr · Mar 17, 2014

Just got the same thing on a 3.1-43 install, using local 'directory' storage.

Syslog shows this:

Mar 17 07:35:37 proxmoxvm pvedaemon[219569]: <root@pam> successful auth for user 'root@pam'
Mar 17 07:36:30 proxmoxvm pvedaemon[235617]: <root@pam> starting task UPID

roxmoxvm:0005508C:061D24C8:5326EC4E:vzdump::root@pam:
Mar 17 07:36:30 proxmoxvm pvedaemon[348300]: INFO: starting new backup job: vzdump 100 --remove 0 --mode snapshot --compress gzip --storage vm-storage --node proxmoxvm
Mar 17 07:36:30 proxmoxvm pvedaemon[348300]: INFO: Starting Backup of VM 100 (qemu)
Mar 17 07:36:30 proxmoxvm qm[348305]: <root@pam> update VM 100: -lock backup
Mar 17 07:40:32 proxmoxvm kernel: vmbr0: port 2(tap100i0) entering disabled state
Mar 17 07:40:32 proxmoxvm kernel: vmbr0: port 2(tap100i0) entering disabled state
Mar 17 07:40:32 proxmoxvm kernel: vmbr1: port 2(tap100i1) entering disabled state
Mar 17 07:40:32 proxmoxvm kernel: vmbr1: port 2(tap100i1) entering disabled state
Mar 17 07:40:32 proxmoxvm kernel: vmbr2: port 2(tap100i2) entering disabled state
Mar 17 07:40:32 proxmoxvm kernel: vmbr2: port 2(tap100i2) entering disabled state
Mar 17 07:40:32 proxmoxvm pvedaemon[348300]: VM 100 qmp command failed - VM 100 not running
Mar 17 07:40:32 proxmoxvm pvedaemon[348300]: VM 100 qmp command failed - VM 100 not running
Mar 17 07:40:32 proxmoxvm pvedaemon[348300]: ERROR: Backup of VM 100 failed - VM 100 not running
Mar 17 07:40:32 proxmoxvm pvedaemon[348300]: INFO: Backup job finished with errors
Mar 17 07:40:32 proxmoxvm pvedaemon[348300]: job errors
Mar 17 07:40:32 proxmoxvm pvedaemon[235617]: <root@pam> end task UPID

roxmoxvm:0005508C:061D24C8:5326EC4E:vzdump::root@pam: job errors
Mar 17 07:40:34 proxmoxvm ntpd[2945]: Deleting interface #14 tap100i1, fe80::3406:95ff:fe59:3f64#123, interface stats: received=0, sent=0, dropped=0, active_time=1025798 secs
Mar 17 07:40:34 proxmoxvm ntpd[2945]: Deleting interface #13 tap100i0, fe80::b461:40ff:fea2:564f#123, interface stats: received=0, sent=0, dropped=0, active_time=1025798 secs
Mar 17 07:40:34 proxmoxvm ntpd[2945]: Deleting interface #12 tap100i2, fe80::8864:49ff:fe4a:df05#123, interface stats: received=0, sent=0, dropped=0, active_time=1025798 secs
Mar 17 07:40:34 proxmoxvm ntpd[2945]: peers refreshed
Mar 17 07:41:12 proxmoxvm rrdcached[2972]: flushing old values

Gerald

jens.kuespert · Mar 17, 2014

...answering myself again:

I did a backup of the VM which failed before (reproducibly) to a local storage. The backup ran as expected. Now I'm really confused.

For now I'll do my backups this way. On the next weekend I'll give it another try at backing up to our NAS.

johnplv · Mar 18, 2014

Today I have the same problem. Never until this moment this is not happening.
I have a cluster of three nodes with shared storage on iSCSI (NexentaStor via ZFS plugin).
Updated to version 3.2 immediately after release (a week ago). And everything was fine.
But today, two virtual machines (out of 12) are broken during backup.
I did migrate virtual machine to another node. And tried to make a backup there. The problem persisted.
Backup was performed on a local drive. In the logs there are no errors (except for crash of the virtual machine).

Code:

INFO: starting new backup job: vzdump 1070 --remove 0 --mode snapshot --compress gzip --storage local --node pve-node3
INFO: Starting Backup of VM 1070 (qemu)
INFO: status = running
INFO: update VM 1070: -lock backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-1070-2014_03_18-09_15_03.vma.gz'
INFO: started backup task '881c1248-25d1-43ac-bd5a-7216d2e797a8'
INFO: status: 0% (111017984/34359738368), sparse 0% (78438400), duration 3, 37/10 MB/s
INFO: status: 1% (396034048/34359738368), sparse 1% (353619968), duration 8, 57/1 MB/s
INFO: status: 2% (699072512/34359738368), sparse 1% (410693632), duration 35, 11/9 MB/s
ERROR: VM 1070 not running
INFO: aborting backup job
ERROR: VM 1070 not running
ERROR: Backup of VM 1070 failed - VM 1070 not running
INFO: Backup job finished with errors
TASK ERROR: job errors

Code:

proxmox-ve-2.6.32: 3.2-121 (running kernel: 2.6.32-27-pve)
pve-manager: 3.2-1 (running version: 3.2-1/1933730b)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-24-pve: 2.6.32-111
pve-kernel-2.6.32-25-pve: 2.6.32-113
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-4
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

Code:

balloon: 1024
boot: c
bootdisk: virtio0
cores: 1
ide2: none,media=cdrom
machine: pc-i440fx-1.4
memory: 2048
name: Zabbix
net0: virtio=42:C3:BA:0E:09:A3,bridge=vmbr0
net1: virtio=7A:70:F8:7F:04:C8,bridge=vmbr1
ostype: l26
sockets: 2
vga: qxl
virtio0: stor2:vm-1070-disk-1,size=32G

Code:

Mar 18 09:12:52 pve-node3 pvedaemon[3057]: starting 1 worker(s)
Mar 18 09:12:52 pve-node3 pvedaemon[3057]: worker 523042 started
Mar 18 09:15:03 pve-node3 pvedaemon[522248]: <root@pam> starting task UPID:pve-node3:0007FBB4:03580D62:5327D657:vzdump::root@pam:
Mar 18 09:15:03 pve-node3 pvedaemon[523188]: INFO: starting new backup job: vzdump 1070 --remove 0 --mode snapshot --compress gzip --storage local --node pve-node3
Mar 18 09:15:04 pve-node3 pvedaemon[523188]: INFO: Starting Backup of VM 1070 (qemu)
Mar 18 09:15:04 pve-node3 qm[523193]: <root@pam> update VM 1070: -lock backup
Mar 18 09:16:05 pve-node3 kernel: vmbr0: port 6(tap1070i0) entering disabled state
Mar 18 09:16:05 pve-node3 kernel: vmbr0: port 6(tap1070i0) entering disabled state
Mar 18 09:16:05 pve-node3 kernel: vmbr1: port 3(tap1070i1) entering disabled state
Mar 18 09:16:05 pve-node3 kernel: vmbr1: port 3(tap1070i1) entering disabled state
Mar 18 09:16:05 pve-node3 pvedaemon[523188]: VM 1070 qmp command failed - VM 1070 not running
Mar 18 09:16:05 pve-node3 pvedaemon[523188]: VM 1070 qmp command failed - VM 1070 not running
Mar 18 09:16:06 pve-node3 pvedaemon[523188]: ERROR: Backup of VM 1070 failed - VM 1070 not running
Mar 18 09:16:06 pve-node3 pvedaemon[523188]: INFO: Backup job finished with errors
Mar 18 09:16:06 pve-node3 pvedaemon[523188]: job errors
Mar 18 09:16:06 pve-node3 pvedaemon[522248]: <root@pam> end task UPID:pve-node3:0007FBB4:03580D62:5327D657:vzdump::root@pam: job errors
Mar 18 09:16:08 pve-node3 ntpd[2528]: Deleting interface #62 tap1070i1, fe80::40a:69ff:fe74:a1bf#123, interface stats: received=0, sent=0, dropped=0, active_time=406 secs
Mar 18 09:16:08 pve-node3 ntpd[2528]: Deleting interface #61 tap1070i0, fe80::10d8:b1ff:fe4b:64dc#123, interface stats: received=0, sent=0, dropped=0, active_time=406 secs
Mar 18 09:16:08 pve-node3 ntpd[2528]: peers refreshed
Mar 18 09:16:27 pve-node3 pvedaemon[522248]: <root@pam> starting task UPID:pve-node3:0007FC1E:03582E19:5327D6AB:qmigrate:1070:root@pam:
Mar 18 09:16:28 pve-node3 pvedaemon[522248]: <root@pam> end task UPID:pve-node3:0007FC1E:03582E19:5327D6AB:qmigrate:1070:root@pam: OK
Mar 18 09:16:34 pve-node3 pmxcfs[2623]: [status] notice: received log
Mar 18 09:16:36 pve-node3 pmxcfs[2623]: [status] notice: received log
Mar 18 09:17:01 pve-node3 /USR/SBIN/CRON[523327]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar 18 09:21:18 pve-node3 pmxcfs[2623]: [status] notice: received log
Mar 18 09:21:19 pve-node3 pmxcfs[2623]: [status] notice: received log
Mar 18 09:21:24 pve-node3 pvedaemon[3057]: worker 522248 finished

adauc · Mar 18, 2014

Hello,

Same problem here, it's failed since update to 3.2 and reboot to kernel 2.6.32-27-pve. Backup is on local disk.

Code:

# pveversion -v
proxmox-ve-2.6.32: 3.2-121 (running kernel: 2.6.32-27-pve)
pve-manager: 3.2-1 (running version: 3.2-1/1933730b)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-4
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

Code:

bootdisk: virtio0
cores: 2
ide2: none,media=cdrom
memory: 2048
name: monit
net0: virtio=A2:B6:45:66:C6:EC,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 1
virtio0: lvvh:vm-302-disk-1,cache=none,size=15364M

Code:

Mar 18 00:09:43 INFO: Starting Backup of VM 302 (qemu)
Mar 18 00:09:43 INFO: status = running
Mar 18 00:09:44 INFO: update VM 302: -lock backup
Mar 18 00:09:44 INFO: backup mode: snapshot
Mar 18 00:09:44 INFO: ionice priority: 7
Mar 18 00:09:44 INFO: creating archive '/srv/dump/vzdump-qemu-302-2014_03_18-00_09_43.vma.gz'
Mar 18 00:09:44 INFO: started backup task 'db81e999-2ca9-4a6a-bb42-8aec5a5c1fa2'
Mar 18 00:09:47 INFO: status: 1% (297795584/16110321664), sparse 1% (240988160), duration 3, 99/18 MB/s
Mar 18 00:09:50 INFO: status: 7% (1197342720/16110321664), sparse 6% (1043963904), duration 6, 299/32 MB/s
Mar 18 00:09:53 INFO: status: 8% (1318977536/16110321664), sparse 6% (1081192448), duration 9, 40/28 MB/s
Mar 18 00:09:59 INFO: status: 9% (1452015616/16110321664), sparse 6% (1082675200), duration 15, 22/21 MB/s
Mar 18 00:10:07 INFO: status: 10% (1626865664/16110321664), sparse 6% (1083707392), duration 23, 21/21 MB/s
Mar 18 00:10:15 INFO: status: 11% (1786511360/16110321664), sparse 6% (1086218240), duration 31, 19/19 MB/s
Mar 18 00:10:23 INFO: status: 12% (1949958144/16110321664), sparse 6% (1094348800), duration 39, 20/19 MB/s
Mar 18 00:10:31 INFO: status: 13% (2094399488/16110321664), sparse 6% (1097342976), duration 47, 18/17 MB/s
Mar 18 00:20:39 ERROR: VM 302 qmp command 'query-backup' failed - got timeout
Mar 18 00:20:39 INFO: aborting backup job
Mar 18 00:30:39 ERROR: VM 302 qmp command 'backup-cancel' failed - unable to connect to VM 302 socket - timeout after 5987 retries
Mar 18 00:30:39 ERROR: Backup of VM 302 failed - VM 302 qmp command 'query-backup' failed - got timeout

Code:

/dev/mapper/vg0-nfs on /srv/dump type ext4 (rw,relatime,barrier=1,data=ordered)

Thanks.

cesarpk · Mar 18, 2014

@gbr, @kaya, @oer2001, @jens.kuespert, @shartenauer:
@dietmar (if i am in the correct way):

I believe that the problem is a common scenery for all you, and please correct me if I'm wrong:

When the Backup are in process (Snapshot or other mode), and two or more PVE Hosts are doing backup to the same destination (HDD or RAID of disks), so the destination NAS/CIFS is saturated of many inputs, and PVE (vzdump) don't know how negotiate this situation.

Questions to the users:
- Is correct that when the error appear, many Backups are in progress to the same destination? (Please count each backup as simultaneous at the exact moment in that the error appear)

- If the question above is correct, how many Backups are in progress simultaneously?

- Please run "nfsstat -n -c" (or any tool useful) in each PVE Host while many Backups are in progress simultaneously and report the results (i think that this will show that we don't have to do several backups simultaneously to the same destination)

- Anybody had proved to do many Backups and in simultaneous, but with the difference of that for each PVE Host to have one disk (or RAID of disks) different as destination of each Backup?. or that is better: for each backup of each PVE Host to have one disk (or array) and physical network different?

To Dietmar:
How to reproduce the error?
Use a NFS shared with a link of 1 Gb/s with a single HDD SATA, and make Backup since several PVE Hosts (six or two for example) simultaneously.

Questions to Dietmar:
What do you think about of this? and/or do you have any suggestions?

Best regards
Cesar

dietmar · Mar 18, 2014

adauc said:
Same problem here, it's failed since update to 3.2 and reboot to kernel 2.6.32-27-pve. Backup is on local disk.

Just want to note that this is a totally different error (timeout instead of crash).

Kaya · Mar 18, 2014

cesarpk said:
@gbr, @kaya, @oer2001, @jens.kuespert, @shartenauer:
@dietmar (if i am in the correct way):

I believe that the problem is a common scenery for all you, and please correct me if I'm wrong:

When the Backup are in process (Snapshot or other mode), and two or more PVE Hosts are doing backup to the same destination (HDD or RAID of disks), so the destination NAS/CIFS is saturated of many inputs, and PVE (vzdump) don't know how negotiate this situation.

This is not my scenario.
My crash happened with only a single backup on NFS share (QNAP).
The VM simply power off (not a clean one, like a power loss).

For now, I workaround that problem by removing that VM from nightly snapshot and running other backup methods.

After update to 3.2: VM crashing during backup

Renowned Member

Proxmox Staff Member

Well-Known Member

Proxmox Staff Member

Well-Known Member

Renowned Member

Renowned Member

Proxmox Staff Member

Member

Renowned Member

Renowned Member

Renowned Member

Member

Well-Known Member

Renowned Member

Renowned Member

New Member

Well-Known Member

Proxmox Staff Member

Member

We value your privacy