VZDump backup failed

obrienmd

Member
Oct 14, 2009
109
0
16
For the last few weeks, my VZDump backups have been failing:

Feb 07 22:56:44 INFO: Starting Backup of VM 100 (qemu)
Feb 07 22:56:44 INFO: status = running
Feb 07 22:56:45 INFO: backup mode: snapshot
Feb 07 22:56:45 INFO: ionice priority: 7
Feb 07 22:56:45 INFO: creating archive '/backup/dump/vzdump-qemu-100-2013_02_07-22_56_44.vma.lzo'
Feb 07 22:56:45 INFO: started backup task '4c6705b2-8010-4cbd-819a-2c6c67e76f62'
Feb 07 22:56:48 INFO: status: 0% (369426432/107374182400), sparse 0% (40050688), duration 3, 123/109 MB/s
Feb 07 22:56:57 INFO: status: 1% (1186594816/107374182400), sparse 0% (85319680), duration 12, 90/85 MB/s
Feb 07 22:57:06 INFO: status: 2% (2167275520/107374182400), sparse 0% (301617152), duration 21, 108/84 MB/s
Feb 07 22:57:17 INFO: status: 3% (3300065280/107374182400), sparse 0% (377200640), duration 32, 102/96 MB/s
Feb 07 22:57:30 INFO: status: 3% (4280025088/107374182400), sparse 0% (410435584), duration 45, 75/72 MB/s
Feb 07 22:57:30 INFO: aborting backup job
Feb 07 22:57:31 ERROR: Backup of VM 100 failed - vma_co_write write error - Broken pipe

A quick search for vma_co_write write error - Broken pipe only found the source code that writes the error :)

dmesg does not show i/o error on either source lv/vg/pv or physical device, or backup target. No other strange system log errors either.
 
For the last few weeks, my VZDump backups have been failing:

Feb 07 22:56:44 INFO: Starting Backup of VM 100 (qemu)
Feb 07 22:56:44 INFO: status = running
Feb 07 22:56:45 INFO: backup mode: snapshot
Feb 07 22:56:45 INFO: ionice priority: 7
Feb 07 22:56:45 INFO: creating archive '/backup/dump/vzdump-qemu-100-2013_02_07-22_56_44.vma.lzo'
Feb 07 22:56:45 INFO: started backup task '4c6705b2-8010-4cbd-819a-2c6c67e76f62'
Feb 07 22:56:48 INFO: status: 0% (369426432/107374182400), sparse 0% (40050688), duration 3, 123/109 MB/s
Feb 07 22:56:57 INFO: status: 1% (1186594816/107374182400), sparse 0% (85319680), duration 12, 90/85 MB/s
Feb 07 22:57:06 INFO: status: 2% (2167275520/107374182400), sparse 0% (301617152), duration 21, 108/84 MB/s
Feb 07 22:57:17 INFO: status: 3% (3300065280/107374182400), sparse 0% (377200640), duration 32, 102/96 MB/s
Feb 07 22:57:30 INFO: status: 3% (4280025088/107374182400), sparse 0% (410435584), duration 45, 75/72 MB/s
Feb 07 22:57:30 INFO: aborting backup job
Feb 07 22:57:31 ERROR: Backup of VM 100 failed - vma_co_write write error - Broken pipe

A quick search for vma_co_write write error - Broken pipe only found the source code that writes the error :)

dmesg does not show i/o error on either source lv/vg/pv or physical device, or backup target. No other strange system log errors either.

Hi,
where is the /backup filesystem?
Can you post the output of
Code:
mount
vgs
lvs
df -k
cat /etc/pve/storage.cfg
Udo
 
Hi,
where is the /backup filesystem?
Can you post the output of
Code:
mount
vgs
lvs
df -k
cat /etc/pve/storage.cfg
Udo
Mount:
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/mapper/pve-data on /var/lib/vz type ext3 (rw)
/dev/sdc1 on /boot type ext3 (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
beancounter on /proc/vz/beancounter type cgroup (rw,name=beancounter)
container on /proc/vz/container type cgroup (rw,name=container)
fairsched on /proc/vz/fairsched type cgroup (rw,name=fairsched)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,default_permissions,allow_other)
/dev/sdd1 on /backup type ext4 (rw)



VGS / LVS:
root@vmserver:~# vgs
VG #PV #LV #SN Attr VSize VFree
array 1 8 0 wz--n- 10.91t 4.18t
pve 1 3 0 wz--n- 36.77g 4.00g
ssdarray 1 1 0 wz--n- 223.52g 23.52g
root@vmserver:~# lvs
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
vm-100-disk-1 array -wi-ao-- 100.00g
vm-102-disk-1 array -wi-a--- 200.00g
vm-103-disk-1 array -wi-ao-- 100.00g
vm-103-disk-2 array -wi-ao-- 5.86t
vm-104-disk-1 array -wi-ao-- 100.00g
vm-105-disk-1 array -wi-ao-- 100.00g
vm-106-disk-1 array -wi-ao-- 100.00g
vm-107-disk-1 array -wi-ao-- 200.00g
data pve -wi-ao-- 18.90g
root pve -wi-ao-- 9.25g
swap pve -wi-ao-- 4.62g
vm-102-disk-1 ssdarray -wi-ao-- 200.00g

DF:
root@vmserver:~# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/pve-root 9547112 7835432 1226716 87% /
tmpfs 16443360 0 16443360 0% /lib/init/rw
udev 16433532 256 16433276 1% /dev
tmpfs 16443360 15660 16427700 1% /dev/shm
/dev/mapper/pve-data 19505020 7676936 11828084 40% /var/lib/vz
/dev/sdc1 506724 222169 258392 47% /boot
/dev/fuse 30720 16 30704 1% /etc/pve
/dev/sdd1 1874416508 359728388 1419473280 21% /backup




storage.cfg:
dir: local
path /var/lib/vz
content images,iso,vztmpl,rootdir


lvm: array
vgname array
shared
content images


dir: backup
path /backup
shared
content images,backup
maxfiles 100


lvm: ssdarray
vgname ssdarray
shared
content images


dir: extra
path /extra
shared
content images,backup
maxfiles 100
 
Interesting... Just got past VM 100 to 102, but now 102 backup is hung (dat file is 38 bytes, no i/o or cpu to speak of but vzdump process is alive).

In syslog:
Feb 8 00:10:50 vmserver vzdump[849709]: VM 102 qmp command failed - VM 102 qmp command 'backup_cancel' failed - The command backup_cancel has not been found
 
Feb 8 00:10:50 vmserver vzdump[849709]: VM 102 qmp command failed - VM 102 qmp command 'backup_cancel' failed - The command backup_cancel has not been found

You use very new code for the pvetest repository. But the VM 102 was started using old kvm version - I guess you need to stop/start all VMs before using the new backup.
 
I've got the same issue, ironically on VMID 102 as well...

I've got daily backups scheduled and it started failing daily about a week and a half ago. The strange thing when start the backup manually and it does not fail.

The VM is a KVM living on a local SSD. It is being backed up to a local spinner. The other VM's on the box backup successful without error.

Any thoughts?

Detailed backup logs:

vzdump --mode snapshot --mailnotification always --compress lzo --quiet 1 --storage Local-Backups --mailto john@Kirker.com --node mox1 --all 1

102: May 05 21:00:02 INFO: Starting Backup of VM 102 (qemu)
102: May 05 21:00:02 INFO: status = running
102: May 05 21:00:03 INFO: update VM 102: -lock backup
102: May 05 21:00:03 INFO: backup mode: snapshot
102: May 05 21:00:03 INFO: ionice priority: 7
102: May 05 21:00:03 INFO: creating archive '/home/backups//dump/vzdump-qemu-102-2016_05_05-21_00_01.vma.lzo'
102: May 05 21:00:03 INFO: started backup task 'b41333b6-5d1a-4e4f-9309-3cf251552b99'
102: May 05 21:00:06 INFO: status: 0% (837812224/506806140928), sparse 0% (145879040), duration 3, 279/230 MB/s
102: May 05 21:00:27 INFO: status: 1% (5308153856/506806140928), sparse 0% (439717888), duration 24, 212/198 MB/s
102: May 05 21:00:53 INFO: status: 2% (10278141952/506806140928), sparse 0% (715440128), duration 50, 191/180 MB/s
102: May 05 21:01:44 INFO: status: 3% (15287975936/506806140928), sparse 0% (1111339008), duration 101, 98/90 MB/s
102: May 05 21:02:35 INFO: status: 4% (20339621888/506806140928), sparse 0% (1380786176), duration 152, 99/93 MB/s
102: May 05 21:03:25 INFO: status: 5% (25429278720/506806140928), sparse 0% (1656549376), duration 202, 101/96 MB/s
102: May 05 21:04:14 INFO: status: 6% (30427709440/506806140928), sparse 0% (2059886592), duration 251, 102/93 MB/s
102: May 05 21:05:03 INFO: status: 7% (35547774976/506806140928), sparse 0% (2337173504), duration 300, 104/98 MB/s
102: May 05 21:05:56 INFO: status: 8% (40610824192/506806140928), sparse 0% (2605821952), duration 353, 95/90 MB/s
102: May 05 21:06:47 INFO: status: 9% (45729054720/506806140928), sparse 0% (3008811008), duration 404, 100/92 MB/s
102: May 05 21:07:33 INFO: status: 10% (50736922624/506806140928), sparse 0% (3276771328), duration 450, 108/103 MB/s
102: May 05 21:08:18 INFO: status: 11% (55788568576/506806140928), sparse 0% (3550560256), duration 495, 112/106 MB/s
102: May 05 21:09:10 INFO: status: 12% (60906602496/506806140928), sparse 0% (3953504256), duration 547, 98/90 MB/s
102: May 05 21:10:02 INFO: status: 13% (65888059392/506806140928), sparse 0% (4222238720), duration 599, 95/90 MB/s
102: May 05 21:10:56 INFO: status: 14% (70984269824/506806140928), sparse 0% (4491096064), duration 653, 94/89 MB/s
102: May 05 21:11:44 INFO: status: 15% (76101582848/506806140928), sparse 0% (4896145408), duration 701, 106/98 MB/s
102: May 05 21:12:40 INFO: status: 16% (81111416832/506806140928), sparse 1% (5164466176), duration 757, 89/84 MB/s
102: May 05 21:13:31 INFO: status: 17% (86206840832/506806140928), sparse 1% (5536342016), duration 808, 99/92 MB/s
102: May 05 21:14:13 INFO: status: 18% (91302526976/506806140928), sparse 1% (5830938624), duration 850, 121/114 MB/s
102: May 05 21:15:04 INFO: status: 19% (96311967744/506806140928), sparse 1% (6093164544), duration 901, 98/93 MB/s
102: May 05 21:15:40 INFO: status: 20% (101472731136/506806140928), sparse 1% (6443229184), duration 937, 143/133 MB/s
102: May 05 21:16:22 INFO: status: 21% (106431905792/506806140928), sparse 1% (6695530496), duration 979, 118/112 MB/s
102: May 05 21:17:08 INFO: status: 22% (111516319744/506806140928), sparse 1% (6839517184), duration 1025, 110/107 MB/s
102: May 05 21:17:48 INFO: status: 23% (116670595072/506806140928), sparse 1% (6906003456), duration 1065, 128/127 MB/s
102: May 05 21:18:37 INFO: status: 24% (121672826880/506806140928), sparse 1% (6958305280), duration 1114, 102/101 MB/s
102: May 05 21:19:33 INFO: status: 25% (126716870656/506806140928), sparse 1% (7124127744), duration 1170, 90/87 MB/s
102: May 05 21:20:25 INFO: status: 26% (131814129664/506806140928), sparse 1% (7529201664), duration 1222, 98/90 MB/s
102: May 05 21:21:20 INFO: status: 27% (136839168000/506806140928), sparse 1% (7796559872), duration 1277, 91/86 MB/s
102: May 05 21:22:11 INFO: status: 28% (141906018304/506806140928), sparse 1% (8057335808), duration 1328, 99/94 MB/s
102: May 05 21:23:03 INFO: status: 29% (147010879488/506806140928), sparse 1% (8470269952), duration 1380, 98/90 MB/s
102: May 05 21:23:57 INFO: status: 30% (152047321088/506806140928), sparse 1% (8731222016), duration 1434, 93/88 MB/s
102: May 05 21:24:54 INFO: status: 31% (157125181440/506806140928), sparse 1% (9022672896), duration 1491, 89/83 MB/s
102: May 05 21:25:51 INFO: status: 32% (162196226048/506806140928), sparse 1% (9351266304), duration 1548, 88/83 MB/s
102: May 05 21:26:46 INFO: status: 33% (167263076352/506806140928), sparse 1% (9611403264), duration 1603, 92/87 MB/s
102: May 05 21:27:40 INFO: status: 34% (172364136448/506806140928), sparse 1% (10029056000), duration 1657, 94/86 MB/s
102: May 05 21:28:36 INFO: status: 35% (177434787840/506806140928), sparse 2% (10417410048), duration 1713, 90/83 MB/s
102: May 05 21:29:30 INFO: status: 36% (182452224000/506806140928), sparse 2% (10863243264), duration 1767, 92/84 MB/s
102: May 05 21:30:24 INFO: status: 37% (187595096064/506806140928), sparse 2% (11400609792), duration 1821, 95/85 MB/s
102: May 05 21:31:18 INFO: status: 38% (192705134592/506806140928), sparse 2% (11661250560), duration 1875, 94/89 MB/s
102: May 05 21:32:14 INFO: status: 39% (197675581440/506806140928), sparse 2% (11902722048), duration 1931, 88/84 MB/s
102: May 05 21:33:12 INFO: status: 40% (202780442624/506806140928), sparse 2% (12118155264), duration 1989, 88/84 MB/s
102: May 05 21:34:04 INFO: status: 41% (207862497280/506806140928), sparse 2% (12171874304), duration 2041, 97/96 MB/s
102: May 05 21:34:44 INFO: status: 42% (212944551936/506806140928), sparse 2% (12323217408), duration 2081, 127/123 MB/s
102: May 05 21:35:33 INFO: status: 43% (217965789184/506806140928), sparse 2% (12498477056), duration 2130, 102/98 MB/s
102: May 05 21:36:18 INFO: status: 44% (223120719872/506806140928), sparse 2% (12706131968), duration 2175, 114/109 MB/s
102: May 05 21:37:03 INFO: status: 45% (228096278528/506806140928), sparse 2% (12953321472), duration 2220, 110/105 MB/s
102: May 05 21:37:46 INFO: status: 46% (233146417152/506806140928), sparse 2% (13179830272), duration 2263, 117/112 MB/s
102: May 05 21:38:41 INFO: status: 47% (238214184960/506806140928), sparse 2% (13433839616), duration 2318, 92/87 MB/s
102: May 05 21:39:40 INFO: status: 48% (243421675520/506806140928), sparse 2% (13809823744), duration 2377, 88/81 MB/s
102: May 05 21:40:39 INFO: status: 49% (248403132416/506806140928), sparse 2% (14076190720), duration 2436, 84/79 MB/s
102: May 05 21:41:39 INFO: status: 50% (253464150016/506806140928), sparse 2% (14344683520), duration 2496, 84/79 MB/s
102: May 05 21:42:36 INFO: status: 51% (258504392704/506806140928), sparse 2% (14748102656), duration 2553, 88/81 MB/s
102: May 05 21:43:37 INFO: status: 52% (263546404864/506806140928), sparse 2% (15017201664), duration 2614, 82/78 MB/s
102: May 05 21:44:37 INFO: status: 53% (268664700928/506806140928), sparse 3% (15286808576), duration 2674, 85/80 MB/s
102: May 05 21:45:45 INFO: status: 54% (273780965376/506806140928), sparse 3% (15690694656), duration 2742, 75/69 MB/s
102: May 05 21:46:37 INFO: status: 55% (278876979200/506806140928), sparse 3% (16033222656), duration 2794, 98/91 MB/s
102: May 05 21:47:23 INFO: status: 56% (284036300800/506806140928), sparse 3% (16496467968), duration 2840, 112/102 MB/s
102: May 05 21:48:09 INFO: status: 57% (288901693440/506806140928), sparse 3% (16792109056), duration 2886, 105/99 MB/s
102: May 05 21:48:54 INFO: status: 58% (293976145920/506806140928), sparse 3% (17097629696), duration 2931, 112/105 MB/s
102: May 05 21:49:50 INFO: status: 59% (299027791872/506806140928), sparse 3% (17508388864), duration 2987, 90/82 MB/s
102: May 05 21:50:34 INFO: status: 60% (304121249792/506806140928), sparse 3% (17793187840), duration 3031, 115/109 MB/s
102: May 05 21:50:59 INFO: status: 61% (309351743488/506806140928), sparse 3% (18069798912), duration 3056, 209/198 MB/s
102: May 05 21:51:17 INFO: status: 62% (314508050432/506806140928), sparse 4% (21324787712), duration 3074, 286/105 MB/s
102: May 05 21:51:30 INFO: status: 63% (319455363072/506806140928), sparse 4% (25035390976), duration 3087, 380/95 MB/s
102: May 05 21:51:55 INFO: status: 64% (324381048832/506806140928), sparse 4% (25283371008), duration 3112, 197/187 MB/s
102: May 05 21:52:51 INFO: status: 65% (329434595328/506806140928), sparse 5% (25646051328), duration 3168, 90/83 MB/s
102: May 05 21:53:17 INFO: status: 66% (334586970112/506806140928), sparse 5% (25904549888), duration 3194, 198/188 MB/s
102: May 05 21:54:09 INFO: status: 67% (339780894720/506806140928), sparse 5% (26283520000), duration 3246, 99/92 MB/s
102: May 05 21:54:26 INFO: status: 67% (342101721088/506806140928), sparse 5% (26420785152), duration 3263, 136/128 MB/s
102: May 05 21:54:26 ERROR: vma_queue_write: write error - Broken pipe
102: May 05 21:54:26 INFO: aborting backup job
102: May 05 21:55:21 ERROR: Backup of VM 102 failed - vma_queue_write: write error - Broken pipe
 
I am having the same issue also, 2 VMs backup just fine 100% always, but one always fails with the same vma_queue write error at around 75%. Backing up to local 5TB sata drive. Running PVE 4.2-15/6669ad2c kernel: 4.4.13-1-pve
 
I was also experiencing this problem within PVE 4.4.
I found out, that this can also happen if your VM has pending changes that need a shutdown/restart of the VM. In my case the VMs memory settings were changed and shown in red in the GUI. Even though there was enough space on the backup volume I got the vma_queue_write error. After VM shutdown and reboot the backup ran smoothly again.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!