I'm trying to configure scheduled backups over NFS for my VMs with Proxmox. My smaller VMs back up successfully, but my larger VMs fail during the backup process. Sample output of the backup job is below.
1) There is sufficient space: 1.3 TB free, while the VM in its entirety is less than 100 GB between the two virtual disks.
2) I am able to set the backup to local storage, then manually copy the backup file over via the command line.
3) The smaller VM always backs up successfully, while the larger one fails, but only over NFS. The time it takes for the larger backup to fail varies.
4) I've tried mounting the NFS share from my file server (a local bare-metal Ubuntu machine) with the noac option, but this did not resolve the problem.
I'm baffled.
Fileserver /etc/exports line: /mnt/pool *(rw,sync,no_root_squash,no_subtree_check,fsid=100)
PVE /etc/fstab line: 192.168.0.11:/mnt/pool /mnt/pool nfs auto,noac 0 0
I'd really like to get this figured out. Worst case, I can set the backup to local storage and then use a cron job to move the file over, but that seems needlessly clumsy and introduces extra failure points. I'd much rather understand where I'm going wrong here. Any ideas?
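For reference, the fallback I have in mind would look roughly like the sketch below. It is only a sketch under assumptions: /var/lib/vz/dump assumes the backups land on the default Proxmox "local" storage, /mnt/pool/dump is a hypothetical target directory on the share, and move_backups is just a name I made up for illustration.

```shell
#!/bin/sh
# Sketch: move finished vzdump archives from local storage to the NFS mount.
# Both directories are passed as arguments; the paths in the usage note
# are assumptions and need to match the actual storage layout.

move_backups() {
    src="$1"
    dest="$2"
    # Refuse to run if the destination is missing (e.g. the share is not mounted)
    if [ ! -d "$dest" ]; then
        echo "destination $dest not available" >&2
        return 1
    fi
    for f in "$src"/vzdump-qemu-*.vma.lzo; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=$(basename "$f")
        # Copy under a temporary name, then compare before deleting the local copy
        cp "$f" "$dest/$name.tmp" || return 1
        if cmp -s "$f" "$dest/$name.tmp"; then
            mv "$dest/$name.tmp" "$dest/$name" && rm "$f"
        else
            echo "verification failed for $name" >&2
            rm -f "$dest/$name.tmp"
            return 1
        fi
    done
}

# Only run when called with a source and a destination directory
if [ "$#" -eq 2 ]; then
    move_backups "$1" "$2"
fi
```

Saved as, say, /usr/local/bin/move-backups.sh (a hypothetical path), it could be run from cron after the backup window with the two directories as arguments, for example `/usr/local/bin/move-backups.sh /var/lib/vz/dump /mnt/pool/dump`. The local copy is only removed after the byte-for-byte comparison succeeds, which is exactly the kind of extra moving part I would rather avoid.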
Code:
INFO: starting new backup job: vzdump 100 101 --mailnotification always --mailto [myemailaddress] --mode stop --node pve --compress lzo --quiet 1 --storage NetworkStorage
INFO: Starting Backup of VM 100 (qemu)
INFO: status = running
INFO: update VM 100: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: plexServer
INFO: include disk 'scsi0' 'vmStorage:vm-100-disk-1' 32G
INFO: include disk 'scsi1' 'vmStorage:vm-100-disk-2' 320G
INFO: stopping vm
INFO: creating archive '/mnt/pve/NetworkStorage/dump/vzdump-qemu-100-2018_02_26-03_00_02.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task '7f727e8c-a8ea-4d44-9ea4-5bad497e1112'
INFO: resume VM
INFO: status: 0% (403243008/377957122048), sparse 0% (118853632), duration 3, read/write 134/94 MB/s
INFO: status: 1% (3834380288/377957122048), sparse 0% (1623560192), duration 67, read/write 53/30 MB/s
INFO: status: 2% (8410169344/377957122048), sparse 1% (6067716096), duration 79, read/write 381/10 MB/s
INFO: status: 3% (11443503104/377957122048), sparse 2% (8928538624), duration 83, read/write 758/43 MB/s
INFO: status: 4% (15196094464/377957122048), sparse 3% (12392062976), duration 89, read/write 625/48 MB/s
INFO: status: 5% (19370934272/377957122048), sparse 4% (15825666048), duration 112, read/write 181/32 MB/s
INFO: status: 6% (23114153984/377957122048), sparse 5% (19009376256), duration 120, read/write 467/69 MB/s
INFO: status: 7% (26858356736/377957122048), sparse 5% (22634692608), duration 125, read/write 748/23 MB/s
INFO: status: 8% (30482038784/377957122048), sparse 6% (26187677696), duration 143, read/write 201/3 MB/s
INFO: status: 9% (34367471616/377957122048), sparse 7% (30067789824), duration 147, read/write 971/1 MB/s
INFO: status: 10% (37867552768/377957122048), sparse 8% (30341197824), duration 207, read/write 58/53 MB/s
INFO: status: 11% (41694134272/377957122048), sparse 8% (30762512384), duration 361, read/write 24/22 MB/s
INFO: status: 12% (45429817344/377957122048), sparse 8% (31188344832), duration 457, read/write 38/34 MB/s
lzop: Stale file handle: <stdout>
INFO: status: 12% (48874520576/377957122048), sparse 8% (31438381056), duration 545, read/write 39/36 MB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: vm is online again after 552 seconds
ERROR: Backup of VM 100 failed - vma_queue_write: write error - Broken pipe
INFO: Starting Backup of VM 101 (qemu)
INFO: status = running
INFO: update VM 101: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: webCrawler
INFO: include disk 'scsi0' 'vmStorage:vm-101-disk-1' 32G
INFO: stopping vm
INFO: creating archive '/mnt/pve/NetworkStorage/dump/vzdump-qemu-101-2018_02_26-03_09_20.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task '1564a72d-948b-4f99-b000-0381474ccdc6'
INFO: resume VM
INFO: status: 1% (357302272/34359738368), sparse 0% (126500864), duration 3, read/write 119/76 MB/s
INFO: status: 2% (755040256/34359738368), sparse 0% (143122432), duration 8, read/write 79/76 MB/s
INFO: status: 3% (1070727168/34359738368), sparse 0% (151437312), duration 13, read/write 63/61 MB/s
INFO: status: 4% (1428357120/34359738368), sparse 0% (267358208), duration 20, read/write 51/34 MB/s
INFO: status: 5% (1743716352/34359738368), sparse 1% (407330816), duration 23, read/write 105/58 MB/s
INFO: status: 6% (2109014016/34359738368), sparse 1% (503988224), duration 27, read/write 91/67 MB/s
INFO: status: 7% (2417491968/34359738368), sparse 1% (622784512), duration 62, read/write 8/5 MB/s
INFO: status: 8% (2783969280/34359738368), sparse 1% (623415296), duration 68, read/write 61/60 MB/s
INFO: status: 9% (3122135040/34359738368), sparse 2% (702287872), duration 71, read/write 112/86 MB/s
INFO: status: 10% (3703701504/34359738368), sparse 3% (1185955840), duration 74, read/write 193/32 MB/s
INFO: status: 12% (4329373696/34359738368), sparse 5% (1811628032), duration 77, read/write 208/0 MB/s
INFO: status: 13% (4542103552/34359738368), sparse 5% (1932492800), duration 80, read/write 70/30 MB/s
INFO: status: 14% (4873650176/34359738368), sparse 5% (1933082624), duration 85, read/write 66/66 MB/s
INFO: status: 15% (5440929792/34359738368), sparse 7% (2415685632), duration 88, read/write 189/28 MB/s
INFO: status: 18% (6497173504/34359738368), sparse 10% (3468468224), duration 91, read/write 352/1 MB/s
INFO: status: 19% (6686113792/34359738368), sparse 10% (3586326528), duration 94, read/write 62/23 MB/s
INFO: status: 20% (6967001088/34359738368), sparse 10% (3586961408), duration 111, read/write 16/16 MB/s
INFO: status: 21% (7329808384/34359738368), sparse 10% (3587817472), duration 114, read/write 120/120 MB/s
INFO: status: 22% (7596802048/34359738368), sparse 10% (3606831104), duration 117, read/write 88/82 MB/s
INFO: status: 23% (7922581504/34359738368), sparse 10% (3645980672), duration 120, read/write 108/95 MB/s
INFO: status: 24% (8307474432/34359738368), sparse 10% (3697135616), duration 160, read/write 9/8 MB/s
INFO: status: 25% (8594259968/34359738368), sparse 10% (3698282496), duration 164, read/write 71/71 MB/s
INFO: status: 26% (8949202944/34359738368), sparse 11% (3816570880), duration 168, read/write 88/59 MB/s
INFO: status: 27% (9410772992/34359738368), sparse 11% (4094967808), duration 172, read/write 115/45 MB/s
INFO: status: 31% (10941825024/34359738368), sparse 16% (5560971264), duration 175, read/write 510/21 MB/s
INFO: status: 32% (11090067456/34359738368), sparse 16% (5563375616), duration 178, read/write 49/48 MB/s
INFO: status: 33% (11341594624/34359738368), sparse 16% (5573902336), duration 181, read/write 83/80 MB/s
INFO: status: 37% (12942966784/34359738368), sparse 20% (7016206336), duration 184, read/write 533/53 MB/s
INFO: status: 38% (13261996032/34359738368), sparse 20% (7105642496), duration 187, read/write 106/76 MB/s
INFO: status: 41% (14322302976/34359738368), sparse 23% (8118611968), duration 210, read/write 46/2 MB/s
INFO: status: 44% (15308292096/34359738368), sparse 26% (8955867136), duration 213, read/write 328/49 MB/s
INFO: status: 45% (15612444672/34359738368), sparse 26% (8986820608), duration 216, read/write 101/91 MB/s
INFO: status: 46% (15896150016/34359738368), sparse 26% (8998309888), duration 226, read/write 28/27 MB/s
INFO: status: 53% (18368954368/34359738368), sparse 33% (11384152064), duration 229, read/write 824/28 MB/s
INFO: status: 59% (20312752128/34359738368), sparse 38% (13231075328), duration 232, read/write 647/32 MB/s
INFO: status: 65% (22568435712/34359738368), sparse 44% (15419068416), duration 235, read/write 751/22 MB/s
INFO: status: 72% (24896405504/34359738368), sparse 51% (17695322112), duration 238, read/write 775/17 MB/s
INFO: status: 77% (26552369152/34359738368), sparse 55% (19198238720), duration 241, read/write 551/51 MB/s
INFO: status: 82% (28200271872/34359738368), sparse 60% (20686041088), duration 244, read/write 549/53 MB/s
INFO: status: 87% (30218649600/34359738368), sparse 65% (22538141696), duration 247, read/write 672/55 MB/s
INFO: status: 88% (30317477888/34359738368), sparse 65% (22541053952), duration 250, read/write 32/31 MB/s
INFO: status: 89% (30682382336/34359738368), sparse 65% (22543253504), duration 265, read/write 24/24 MB/s
INFO: status: 90% (31153717248/34359738368), sparse 65% (22554607616), duration 268, read/write 157/153 MB/s
INFO: status: 92% (31895257088/34359738368), sparse 66% (22949347328), duration 271, read/write 247/115 MB/s
INFO: status: 100% (34359738368/34359738368), sparse 73% (25413828608), duration 274, read/write 821/0 MB/s
INFO: transferred 34359 MB in 274 seconds (125 MB/s)
INFO: archive file size: 4.71GB
INFO: delete old backup '/mnt/pve/NetworkStorage/dump/vzdump-qemu-101-2018_02_19-11_02_21.vma.lzo'
INFO: vm is online again after 315 seconds
INFO: Finished Backup of VM 101 (00:05:16)
INFO: Backup job finished with errors
TASK ERROR: job errors