Twice in 3 weeks, at least one node has had failed backups.
This thread showed a similar situation: https://forum.proxmox.com/threads/proxmox-ve-vzdump-backup-of-vms.54291/post-250155
There are about 10 VMs on the node. The first KVM guest backed up OK:
Code:
INFO: status: 98% (21063794688/21474836480), sparse 68% (14780002304), duration 84, read/write 2364/1 MB/s
INFO: status: 100% (21474836480/21474836480), sparse 70% (15191044096), duration 85, read/write 411/0 MB/s
INFO: transferred 21474 MB in 85 seconds (252 MB/s)
INFO: stopping kvm after backup task
INFO: archive file size: 2.63GB
INFO: delete old backup '/mnt/pve/bkup-nfs/dump/vzdump-qemu-106-2019_08_03-00_00_02.vma.lzo'
INFO: Finished Backup of VM 106 (00:01:28)
INFO: Backup finished at 2019-08-24 00:01:30
INFO: Starting Backup of VM 10782 (qemu)
Then the remainder failed:
Code:
INFO: Backup finished at 2019-08-24 00:01:30
INFO: Starting Backup of VM 10782 (qemu)
INFO: Backup started at 2019-08-24 00:01:30
INFO: status = running
INFO: update VM 10782: -lock backup
INFO: VM Name: Kwiklok
INFO: include disk 'ide0' 'rbd_ssd:vm-10782-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 5
INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-qemu-10782-2019_08_24-00_01_30.vma.lzo'
INFO: started backup task 'b3f76c26-8da9-483d-ab3e-c62b3a9906ce'
INFO: status: 1% (509345792/34359738368), sparse 0% (186986496), duration 3, read/write 169/107 MB/s
INFO: status: 2% (851443712/34359738368), sparse 0% (206741504), duration 6, read/write 114/107 MB/s
INFO: status: 3% (1182793728/34359738368), sparse 0% (215281664), duration 9, read/write 110/107 MB/s
INFO: status: 4% (1606418432/34359738368), sparse 0% (230178816), duration 12, read/write 141/136 MB/s
INFO: status: 5% (1950351360/34359738368), sparse 0% (243765248), duration 15, read/write 114/110 MB/s
INFO: status: 6% (2326265856/34359738368), sparse 0% (275148800), duration 18, read/write 125/114 MB/s
INFO: status: 7% (2497314816/34359738368), sparse 0% (294576128), duration 21, read/write 57/50 MB/s
INFO: status: 8% (2862219264/34359738368), sparse 1% (357285888), duration 48, read/write 13/11 MB/s
INFO: status: 9% (3229614080/34359738368), sparse 1% (363343872), duration 51, read/write 122/120 MB/s
INFO: status: 10% (3607101440/34359738368), sparse 1% (377077760), duration 54, read/write 125/121 MB/s
INFO: status: 11% (3997171712/34359738368), sparse 1% (392585216), duration 57, read/write 130/124 MB/s
INFO: status: 12% (4356046848/34359738368), sparse 1% (408723456), duration 60, read/write 119/114 MB/s
INFO: status: 13% (4739956736/34359738368), sparse 1% (428412928), duration 63, read/write 127/121 MB/s
INFO: status: 14% (5051645952/34359738368), sparse 1% (456728576), duration 66, read/write 103/94 MB/s
INFO: status: 15% (5188354048/34359738368), sparse 1% (462778368), duration 74, read/write 17/16 MB/s
INFO: status: 16% (5570035712/34359738368), sparse 1% (528830464), duration 78, read/write 95/78 MB/s
INFO: status: 17% (5963907072/34359738368), sparse 1% (563154944), duration 81, read/write 131/119 MB/s
INFO: status: 18% (6302203904/34359738368), sparse 1% (607698944), duration 84, read/write 112/97 MB/s
INFO: status: 19% (6694109184/34359738368), sparse 1% (647401472), duration 87, read/write 130/117 MB/s
INFO: status: 20% (7032012800/34359738368), sparse 2% (688037888), duration 90, read/write 112/99 MB/s
INFO: status: 21% (7381975040/34359738368), sparse 2% (720273408), duration 93, read/write 116/105 MB/s
INFO: status: 22% (7742816256/34359738368), sparse 2% (764256256), duration 96, read/write 120/105 MB/s
INFO: status: 23% (8023703552/34359738368), sparse 2% (784474112), duration 99, read/write 93/86 MB/s
INFO: status: 24% (8430551040/34359738368), sparse 2% (835563520), duration 102, read/write 135/118 MB/s
INFO: status: 25% (8749318144/34359738368), sparse 2% (860991488), duration 105, read/write 106/97 MB/s
INFO: status: 26% (9092202496/34359738368), sparse 2% (900317184), duration 108, read/write 114/101 MB/s
INFO: status: 27% (9361686528/34359738368), sparse 2% (935059456), duration 111, read/write 89/78 MB/s
INFO: status: 28% (9639559168/34359738368), sparse 2% (1011642368), duration 114, read/write 92/67 MB/s
INFO: status: 29% (10078912512/34359738368), sparse 3% (1134518272), duration 117, read/write 146/105 MB/s
INFO: status: 30% (10433986560/34359738368), sparse 3% (1175810048), duration 121, read/write 88/78 MB/s
INFO: status: 31% (10715267072/34359738368), sparse 3% (1234259968), duration 124, read/write 93/74 MB/s
INFO: status: 32% (11072569344/34359738368), sparse 3% (1278296064), duration 127, read/write 119/104 MB/s
INFO: status: 33% (11492392960/34359738368), sparse 3% (1360261120), duration 132, read/write 83/67 MB/s
ERROR: unable to find configuration file for VM 10782 - no such machine
INFO: aborting backup job
ERROR: unable to find configuration file for VM 10782 - no such machine
INFO: ipcc_send_rec[1] failed: Connection refused
INFO: ipcc_send_rec[2] failed: Connection refused
INFO: ipcc_send_rec[3] failed: Connection refused
INFO: Unable to load access control list: Connection refused
command 'qm unlock 10782' failed: exit code 111
ERROR: Backup of VM 10782 failed - unable to find configuration file for VM 10782 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 108 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 108 failed - unable to find configuration file for VM 108 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 111 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 111 failed - unable to find configuration file for VM 111 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 114 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 114 failed - unable to find configuration file for VM 114 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 445 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 445 failed - unable to find configuration file for VM 445 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 900 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 900 failed - unable to find configuration file for VM 900 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 102 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 104 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 104 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 110 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 110 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 113 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 113 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 115 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 115 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 160 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 160 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 707 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 707 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 7101 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 7101 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 9136 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 9136 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Backup job finished with errors
ERROR: Connection refused
TASK ERROR: job errors
The next 3 nodes completed their backups with no errors.
I schedule backups one hour apart so the network isn't overloaded. I'll try to debug this later on.
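For reference, the staggered schedule lives in /etc/pve/vzdump.cron; the entries below are only an illustrative sketch (node names and times are placeholders, the storage name is taken from the log above):
Code:
# /etc/pve/vzdump.cron - one job per node, staggered an hour apart (illustrative)
0 0 * * 6 root vzdump --all --node pve1 --mode snapshot --compress lzo --storage bkup-nfs --quiet 1
0 1 * * 6 root vzdump --all --node pve2 --mode snapshot --compress lzo --storage bkup-nfs --quiet 1
0 2 * * 6 root vzdump --all --node pve3 --mode snapshot --compress lzo --storage bkup-nfs --quiet 1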
Next, I'll schedule the node to do another backup in a few minutes.
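If it fails again before the scheduled run, a single guest can also be re-run by hand with the same options as the failed job (VM ID and storage taken from the log above):
Code:
# manual re-run of one failed guest, matching the scheduled job's options
vzdump 10782 --mode snapshot --compress lzo --storage bkup-nfs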
Edit: the next backup worked.
Any clues on how to prevent or debug this?
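One pattern I notice in the failed log: the "unable to find configuration file" errors and the "ipcc_send_rec ... Connection refused" messages both point at /etc/pve (pmxcfs, provided by the pve-cluster service) being unreachable mid-job, since that's where the qemu and lxc configs live. Next time it happens I plan to check something like:
Code:
# was pve-cluster (pmxcfs) up during the backup window?
journalctl -u pve-cluster -u corosync --since "2019-08-24 00:00" --until "2019-08-24 00:10"
systemctl status pve-cluster corosync
# is /etc/pve mounted and are the guest configs visible right now?
ls /etc/pve/qemu-server/ /etc/pve/lxc/
pvecm status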