VZDump backup failed

obrienmd

Member
Oct 14, 2009
109
0
16
For the last few weeks, my VZDump backups have been failing:

Feb 07 22:56:44 INFO: Starting Backup of VM 100 (qemu)
Feb 07 22:56:44 INFO: status = running
Feb 07 22:56:45 INFO: backup mode: snapshot
Feb 07 22:56:45 INFO: ionice priority: 7
Feb 07 22:56:45 INFO: creating archive '/backup/dump/vzdump-qemu-100-2013_02_07-22_56_44.vma.lzo'
Feb 07 22:56:45 INFO: started backup task '4c6705b2-8010-4cbd-819a-2c6c67e76f62'
Feb 07 22:56:48 INFO: status: 0% (369426432/107374182400), sparse 0% (40050688), duration 3, 123/109 MB/s
Feb 07 22:56:57 INFO: status: 1% (1186594816/107374182400), sparse 0% (85319680), duration 12, 90/85 MB/s
Feb 07 22:57:06 INFO: status: 2% (2167275520/107374182400), sparse 0% (301617152), duration 21, 108/84 MB/s
Feb 07 22:57:17 INFO: status: 3% (3300065280/107374182400), sparse 0% (377200640), duration 32, 102/96 MB/s
Feb 07 22:57:30 INFO: status: 3% (4280025088/107374182400), sparse 0% (410435584), duration 45, 75/72 MB/s
Feb 07 22:57:30 INFO: aborting backup job
Feb 07 22:57:31 ERROR: Backup of VM 100 failed - vma_co_write write error - Broken pipe

A quick search for vma_co_write write error - Broken pipe only found the source code that writes the error :)

dmesg does not show i/o error on either source lv/vg/pv or physical device, or backup target. No other strange system log errors either.
 
For the last few weeks, my VZDump backups have been failing:

Feb 07 22:56:44 INFO: Starting Backup of VM 100 (qemu)
Feb 07 22:56:44 INFO: status = running
Feb 07 22:56:45 INFO: backup mode: snapshot
Feb 07 22:56:45 INFO: ionice priority: 7
Feb 07 22:56:45 INFO: creating archive '/backup/dump/vzdump-qemu-100-2013_02_07-22_56_44.vma.lzo'
Feb 07 22:56:45 INFO: started backup task '4c6705b2-8010-4cbd-819a-2c6c67e76f62'
Feb 07 22:56:48 INFO: status: 0% (369426432/107374182400), sparse 0% (40050688), duration 3, 123/109 MB/s
Feb 07 22:56:57 INFO: status: 1% (1186594816/107374182400), sparse 0% (85319680), duration 12, 90/85 MB/s
Feb 07 22:57:06 INFO: status: 2% (2167275520/107374182400), sparse 0% (301617152), duration 21, 108/84 MB/s
Feb 07 22:57:17 INFO: status: 3% (3300065280/107374182400), sparse 0% (377200640), duration 32, 102/96 MB/s
Feb 07 22:57:30 INFO: status: 3% (4280025088/107374182400), sparse 0% (410435584), duration 45, 75/72 MB/s
Feb 07 22:57:30 INFO: aborting backup job
Feb 07 22:57:31 ERROR: Backup of VM 100 failed - vma_co_write write error - Broken pipe

A quick search for vma_co_write write error - Broken pipe only found the source code that writes the error :)

dmesg does not show i/o error on either source lv/vg/pv or physical device, or backup target. No other strange system log errors either.

Hi,
where is the /backup filesystem?
Can you post the output of
Code:
mount
vgs
lvs
df -k
cat /etc/pve/storage.cfg
Udo
 
Hi,
where is the /backup filesystem?
Can you post the output of
Code:
mount
vgs
lvs
df -k
cat /etc/pve/storage.cfg
Udo
Mount:
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/mapper/pve-data on /var/lib/vz type ext3 (rw)
/dev/sdc1 on /boot type ext3 (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
beancounter on /proc/vz/beancounter type cgroup (rw,name=beancounter)
container on /proc/vz/container type cgroup (rw,name=container)
fairsched on /proc/vz/fairsched type cgroup (rw,name=fairsched)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,default_permissions,allow_other)
/dev/sdd1 on /backup type ext4 (rw)



VGS / LVS:
root@vmserver:~# vgs
VG #PV #LV #SN Attr VSize VFree
array 1 8 0 wz--n- 10.91t 4.18t
pve 1 3 0 wz--n- 36.77g 4.00g
ssdarray 1 1 0 wz--n- 223.52g 23.52g
root@vmserver:~# lvs
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
vm-100-disk-1 array -wi-ao-- 100.00g
vm-102-disk-1 array -wi-a--- 200.00g
vm-103-disk-1 array -wi-ao-- 100.00g
vm-103-disk-2 array -wi-ao-- 5.86t
vm-104-disk-1 array -wi-ao-- 100.00g
vm-105-disk-1 array -wi-ao-- 100.00g
vm-106-disk-1 array -wi-ao-- 100.00g
vm-107-disk-1 array -wi-ao-- 200.00g
data pve -wi-ao-- 18.90g
root pve -wi-ao-- 9.25g
swap pve -wi-ao-- 4.62g
vm-102-disk-1 ssdarray -wi-ao-- 200.00g

DF:
root@vmserver:~# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/pve-root 9547112 7835432 1226716 87% /
tmpfs 16443360 0 16443360 0% /lib/init/rw
udev 16433532 256 16433276 1% /dev
tmpfs 16443360 15660 16427700 1% /dev/shm
/dev/mapper/pve-data 19505020 7676936 11828084 40% /var/lib/vz
/dev/sdc1 506724 222169 258392 47% /boot
/dev/fuse 30720 16 30704 1% /etc/pve
/dev/sdd1 1874416508 359728388 1419473280 21% /backup




storage.cfg:
dir: local
path /var/lib/vz
content images,iso,vztmpl,rootdir


lvm: array
vgname array
shared
content images


dir: backup
path /backup
shared
content images,backup
maxfiles 100


lvm: ssdarray
vgname ssdarray
shared
content images


dir: extra
path /extra
shared
content images,backup
maxfiles 100
 
Interesting... Just got past VM 100 to 102, but now 102 backup is hung (dat file is 38 bytes, no i/o or cpu to speak of but vzdump process is alive).

In syslog:
Feb 8 00:10:50 vmserver vzdump[849709]: VM 102 qmp command failed - VM 102 qmp command 'backup_cancel' failed - The command backup_cancel has not been found
 
Feb 8 00:10:50 vmserver vzdump[849709]: VM 102 qmp command failed - VM 102 qmp command 'backup_cancel' failed - The command backup_cancel has not been found

You use very new code for the pvetest repository. But the VM 102 was started using old kvm version - I guess you need to stop/start all VMs before using the new backup.
 
I've got the same issue, ironically on VMID 102 as well...

I've got daily backups scheduled and it started failing daily about a week and a half ago. The strange thing when start the backup manually and it does not fail.

The VM is a KVM living on a local SSD. It is being backed up to a local spinner. The other VM's on the box backup successful without error.

Any thoughts?

Detailed backup logs:

vzdump --mode snapshot --mailnotification always --compress lzo --quiet 1 --storage Local-Backups --mailto john@Kirker.com --node mox1 --all 1

102: May 05 21:00:02 INFO: Starting Backup of VM 102 (qemu)
102: May 05 21:00:02 INFO: status = running
102: May 05 21:00:03 INFO: update VM 102: -lock backup
102: May 05 21:00:03 INFO: backup mode: snapshot
102: May 05 21:00:03 INFO: ionice priority: 7
102: May 05 21:00:03 INFO: creating archive '/home/backups//dump/vzdump-qemu-102-2016_05_05-21_00_01.vma.lzo'
102: May 05 21:00:03 INFO: started backup task 'b41333b6-5d1a-4e4f-9309-3cf251552b99'
102: May 05 21:00:06 INFO: status: 0% (837812224/506806140928), sparse 0% (145879040), duration 3, 279/230 MB/s
102: May 05 21:00:27 INFO: status: 1% (5308153856/506806140928), sparse 0% (439717888), duration 24, 212/198 MB/s
102: May 05 21:00:53 INFO: status: 2% (10278141952/506806140928), sparse 0% (715440128), duration 50, 191/180 MB/s
102: May 05 21:01:44 INFO: status: 3% (15287975936/506806140928), sparse 0% (1111339008), duration 101, 98/90 MB/s
102: May 05 21:02:35 INFO: status: 4% (20339621888/506806140928), sparse 0% (1380786176), duration 152, 99/93 MB/s
102: May 05 21:03:25 INFO: status: 5% (25429278720/506806140928), sparse 0% (1656549376), duration 202, 101/96 MB/s
102: May 05 21:04:14 INFO: status: 6% (30427709440/506806140928), sparse 0% (2059886592), duration 251, 102/93 MB/s
102: May 05 21:05:03 INFO: status: 7% (35547774976/506806140928), sparse 0% (2337173504), duration 300, 104/98 MB/s
102: May 05 21:05:56 INFO: status: 8% (40610824192/506806140928), sparse 0% (2605821952), duration 353, 95/90 MB/s
102: May 05 21:06:47 INFO: status: 9% (45729054720/506806140928), sparse 0% (3008811008), duration 404, 100/92 MB/s
102: May 05 21:07:33 INFO: status: 10% (50736922624/506806140928), sparse 0% (3276771328), duration 450, 108/103 MB/s
102: May 05 21:08:18 INFO: status: 11% (55788568576/506806140928), sparse 0% (3550560256), duration 495, 112/106 MB/s
102: May 05 21:09:10 INFO: status: 12% (60906602496/506806140928), sparse 0% (3953504256), duration 547, 98/90 MB/s
102: May 05 21:10:02 INFO: status: 13% (65888059392/506806140928), sparse 0% (4222238720), duration 599, 95/90 MB/s
102: May 05 21:10:56 INFO: status: 14% (70984269824/506806140928), sparse 0% (4491096064), duration 653, 94/89 MB/s
102: May 05 21:11:44 INFO: status: 15% (76101582848/506806140928), sparse 0% (4896145408), duration 701, 106/98 MB/s
102: May 05 21:12:40 INFO: status: 16% (81111416832/506806140928), sparse 1% (5164466176), duration 757, 89/84 MB/s
102: May 05 21:13:31 INFO: status: 17% (86206840832/506806140928), sparse 1% (5536342016), duration 808, 99/92 MB/s
102: May 05 21:14:13 INFO: status: 18% (91302526976/506806140928), sparse 1% (5830938624), duration 850, 121/114 MB/s
102: May 05 21:15:04 INFO: status: 19% (96311967744/506806140928), sparse 1% (6093164544), duration 901, 98/93 MB/s
102: May 05 21:15:40 INFO: status: 20% (101472731136/506806140928), sparse 1% (6443229184), duration 937, 143/133 MB/s
102: May 05 21:16:22 INFO: status: 21% (106431905792/506806140928), sparse 1% (6695530496), duration 979, 118/112 MB/s
102: May 05 21:17:08 INFO: status: 22% (111516319744/506806140928), sparse 1% (6839517184), duration 1025, 110/107 MB/s
102: May 05 21:17:48 INFO: status: 23% (116670595072/506806140928), sparse 1% (6906003456), duration 1065, 128/127 MB/s
102: May 05 21:18:37 INFO: status: 24% (121672826880/506806140928), sparse 1% (6958305280), duration 1114, 102/101 MB/s
102: May 05 21:19:33 INFO: status: 25% (126716870656/506806140928), sparse 1% (7124127744), duration 1170, 90/87 MB/s
102: May 05 21:20:25 INFO: status: 26% (131814129664/506806140928), sparse 1% (7529201664), duration 1222, 98/90 MB/s
102: May 05 21:21:20 INFO: status: 27% (136839168000/506806140928), sparse 1% (7796559872), duration 1277, 91/86 MB/s
102: May 05 21:22:11 INFO: status: 28% (141906018304/506806140928), sparse 1% (8057335808), duration 1328, 99/94 MB/s
102: May 05 21:23:03 INFO: status: 29% (147010879488/506806140928), sparse 1% (8470269952), duration 1380, 98/90 MB/s
102: May 05 21:23:57 INFO: status: 30% (152047321088/506806140928), sparse 1% (8731222016), duration 1434, 93/88 MB/s
102: May 05 21:24:54 INFO: status: 31% (157125181440/506806140928), sparse 1% (9022672896), duration 1491, 89/83 MB/s
102: May 05 21:25:51 INFO: status: 32% (162196226048/506806140928), sparse 1% (9351266304), duration 1548, 88/83 MB/s
102: May 05 21:26:46 INFO: status: 33% (167263076352/506806140928), sparse 1% (9611403264), duration 1603, 92/87 MB/s
102: May 05 21:27:40 INFO: status: 34% (172364136448/506806140928), sparse 1% (10029056000), duration 1657, 94/86 MB/s
102: May 05 21:28:36 INFO: status: 35% (177434787840/506806140928), sparse 2% (10417410048), duration 1713, 90/83 MB/s
102: May 05 21:29:30 INFO: status: 36% (182452224000/506806140928), sparse 2% (10863243264), duration 1767, 92/84 MB/s
102: May 05 21:30:24 INFO: status: 37% (187595096064/506806140928), sparse 2% (11400609792), duration 1821, 95/85 MB/s
102: May 05 21:31:18 INFO: status: 38% (192705134592/506806140928), sparse 2% (11661250560), duration 1875, 94/89 MB/s
102: May 05 21:32:14 INFO: status: 39% (197675581440/506806140928), sparse 2% (11902722048), duration 1931, 88/84 MB/s
102: May 05 21:33:12 INFO: status: 40% (202780442624/506806140928), sparse 2% (12118155264), duration 1989, 88/84 MB/s
102: May 05 21:34:04 INFO: status: 41% (207862497280/506806140928), sparse 2% (12171874304), duration 2041, 97/96 MB/s
102: May 05 21:34:44 INFO: status: 42% (212944551936/506806140928), sparse 2% (12323217408), duration 2081, 127/123 MB/s
102: May 05 21:35:33 INFO: status: 43% (217965789184/506806140928), sparse 2% (12498477056), duration 2130, 102/98 MB/s
102: May 05 21:36:18 INFO: status: 44% (223120719872/506806140928), sparse 2% (12706131968), duration 2175, 114/109 MB/s
102: May 05 21:37:03 INFO: status: 45% (228096278528/506806140928), sparse 2% (12953321472), duration 2220, 110/105 MB/s
102: May 05 21:37:46 INFO: status: 46% (233146417152/506806140928), sparse 2% (13179830272), duration 2263, 117/112 MB/s
102: May 05 21:38:41 INFO: status: 47% (238214184960/506806140928), sparse 2% (13433839616), duration 2318, 92/87 MB/s
102: May 05 21:39:40 INFO: status: 48% (243421675520/506806140928), sparse 2% (13809823744), duration 2377, 88/81 MB/s
102: May 05 21:40:39 INFO: status: 49% (248403132416/506806140928), sparse 2% (14076190720), duration 2436, 84/79 MB/s
102: May 05 21:41:39 INFO: status: 50% (253464150016/506806140928), sparse 2% (14344683520), duration 2496, 84/79 MB/s
102: May 05 21:42:36 INFO: status: 51% (258504392704/506806140928), sparse 2% (14748102656), duration 2553, 88/81 MB/s
102: May 05 21:43:37 INFO: status: 52% (263546404864/506806140928), sparse 2% (15017201664), duration 2614, 82/78 MB/s
102: May 05 21:44:37 INFO: status: 53% (268664700928/506806140928), sparse 3% (15286808576), duration 2674, 85/80 MB/s
102: May 05 21:45:45 INFO: status: 54% (273780965376/506806140928), sparse 3% (15690694656), duration 2742, 75/69 MB/s
102: May 05 21:46:37 INFO: status: 55% (278876979200/506806140928), sparse 3% (16033222656), duration 2794, 98/91 MB/s
102: May 05 21:47:23 INFO: status: 56% (284036300800/506806140928), sparse 3% (16496467968), duration 2840, 112/102 MB/s
102: May 05 21:48:09 INFO: status: 57% (288901693440/506806140928), sparse 3% (16792109056), duration 2886, 105/99 MB/s
102: May 05 21:48:54 INFO: status: 58% (293976145920/506806140928), sparse 3% (17097629696), duration 2931, 112/105 MB/s
102: May 05 21:49:50 INFO: status: 59% (299027791872/506806140928), sparse 3% (17508388864), duration 2987, 90/82 MB/s
102: May 05 21:50:34 INFO: status: 60% (304121249792/506806140928), sparse 3% (17793187840), duration 3031, 115/109 MB/s
102: May 05 21:50:59 INFO: status: 61% (309351743488/506806140928), sparse 3% (18069798912), duration 3056, 209/198 MB/s
102: May 05 21:51:17 INFO: status: 62% (314508050432/506806140928), sparse 4% (21324787712), duration 3074, 286/105 MB/s
102: May 05 21:51:30 INFO: status: 63% (319455363072/506806140928), sparse 4% (25035390976), duration 3087, 380/95 MB/s
102: May 05 21:51:55 INFO: status: 64% (324381048832/506806140928), sparse 4% (25283371008), duration 3112, 197/187 MB/s
102: May 05 21:52:51 INFO: status: 65% (329434595328/506806140928), sparse 5% (25646051328), duration 3168, 90/83 MB/s
102: May 05 21:53:17 INFO: status: 66% (334586970112/506806140928), sparse 5% (25904549888), duration 3194, 198/188 MB/s
102: May 05 21:54:09 INFO: status: 67% (339780894720/506806140928), sparse 5% (26283520000), duration 3246, 99/92 MB/s
102: May 05 21:54:26 INFO: status: 67% (342101721088/506806140928), sparse 5% (26420785152), duration 3263, 136/128 MB/s
102: May 05 21:54:26 ERROR: vma_queue_write: write error - Broken pipe
102: May 05 21:54:26 INFO: aborting backup job
102: May 05 21:55:21 ERROR: Backup of VM 102 failed - vma_queue_write: write error - Broken pipe
 
I am having the same issue also, 2 VMs backup just fine 100% always, but one always fails with the same vma_queue write error at around 75%. Backing up to local 5TB sata drive. Running PVE 4.2-15/6669ad2c kernel: 4.4.13-1-pve
 
I was also experiencing this problem within PVE 4.4.
I found out, that this can also happen if your VM has pending changes that need a shutdown/restart of the VM. In my case the VMs memory settings were changed and shown in red in the GUI. Even though there was enough space on the backup volume I got the vma_queue_write error. After VM shutdown and reboot the backup ran smoothly again.