Backup: ERROR: unable to find configuration file for VM

RobFantini

Famous Member
May 24, 2012
2,015
102
133
Boston,Mass
twice in 3 weeks at least one node had failed backups.

this thread showed a similar situation: https://forum.proxmox.com/threads/proxmox-ve-vzdump-backup-of-vms.54291/post-250155

there are about 10 vm's on the node. the 1st kvm backed up ok:
Code:
INFO: status: 98% (21063794688/21474836480), sparse 68% (14780002304), duration 84, read/write 2364/1 MB/s
INFO: status: 100% (21474836480/21474836480), sparse 70% (15191044096), duration 85, read/write 411/0 MB/s
INFO: transferred 21474 MB in 85 seconds (252 MB/s)
INFO: stopping kvm after backup task
INFO: archive file size: 2.63GB
INFO: delete old backup '/mnt/pve/bkup-nfs/dump/vzdump-qemu-106-2019_08_03-00_00_02.vma.lzo'
INFO: Finished Backup of VM 106 (00:01:28)
INFO: Backup finished at 2019-08-24 00:01:30
INFO: Starting Backup of VM 10782 (qemu)

then the remainder failed :
Code:
INFO: Backup finished at 2019-08-24 00:01:30
INFO: Starting Backup of VM 10782 (qemu)
INFO: Backup started at 2019-08-24 00:01:30
INFO: status = running
INFO: update VM 10782: -lock backup
INFO: VM Name: Kwiklok
INFO: include disk 'ide0' 'rbd_ssd:vm-10782-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 5
INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-qemu-10782-2019_08_24-00_01_30.vma.lzo'
INFO: started backup task 'b3f76c26-8da9-483d-ab3e-c62b3a9906ce'
INFO: status: 1% (509345792/34359738368), sparse 0% (186986496), duration 3, read/write 169/107 MB/s
INFO: status: 2% (851443712/34359738368), sparse 0% (206741504), duration 6, read/write 114/107 MB/s
INFO: status: 3% (1182793728/34359738368), sparse 0% (215281664), duration 9, read/write 110/107 MB/s
INFO: status: 4% (1606418432/34359738368), sparse 0% (230178816), duration 12, read/write 141/136 MB/s
INFO: status: 5% (1950351360/34359738368), sparse 0% (243765248), duration 15, read/write 114/110 MB/s
INFO: status: 6% (2326265856/34359738368), sparse 0% (275148800), duration 18, read/write 125/114 MB/s
INFO: status: 7% (2497314816/34359738368), sparse 0% (294576128), duration 21, read/write 57/50 MB/s
INFO: status: 8% (2862219264/34359738368), sparse 1% (357285888), duration 48, read/write 13/11 MB/s
INFO: status: 9% (3229614080/34359738368), sparse 1% (363343872), duration 51, read/write 122/120 MB/s
INFO: status: 10% (3607101440/34359738368), sparse 1% (377077760), duration 54, read/write 125/121 MB/s
INFO: status: 11% (3997171712/34359738368), sparse 1% (392585216), duration 57, read/write 130/124 MB/s
INFO: status: 12% (4356046848/34359738368), sparse 1% (408723456), duration 60, read/write 119/114 MB/s
INFO: status: 13% (4739956736/34359738368), sparse 1% (428412928), duration 63, read/write 127/121 MB/s
INFO: status: 14% (5051645952/34359738368), sparse 1% (456728576), duration 66, read/write 103/94 MB/s
INFO: status: 15% (5188354048/34359738368), sparse 1% (462778368), duration 74, read/write 17/16 MB/s
INFO: status: 16% (5570035712/34359738368), sparse 1% (528830464), duration 78, read/write 95/78 MB/s
INFO: status: 17% (5963907072/34359738368), sparse 1% (563154944), duration 81, read/write 131/119 MB/s
INFO: status: 18% (6302203904/34359738368), sparse 1% (607698944), duration 84, read/write 112/97 MB/s
INFO: status: 19% (6694109184/34359738368), sparse 1% (647401472), duration 87, read/write 130/117 MB/s
INFO: status: 20% (7032012800/34359738368), sparse 2% (688037888), duration 90, read/write 112/99 MB/s
INFO: status: 21% (7381975040/34359738368), sparse 2% (720273408), duration 93, read/write 116/105 MB/s
INFO: status: 22% (7742816256/34359738368), sparse 2% (764256256), duration 96, read/write 120/105 MB/s
INFO: status: 23% (8023703552/34359738368), sparse 2% (784474112), duration 99, read/write 93/86 MB/s
INFO: status: 24% (8430551040/34359738368), sparse 2% (835563520), duration 102, read/write 135/118 MB/s
INFO: status: 25% (8749318144/34359738368), sparse 2% (860991488), duration 105, read/write 106/97 MB/s
INFO: status: 26% (9092202496/34359738368), sparse 2% (900317184), duration 108, read/write 114/101 MB/s
INFO: status: 27% (9361686528/34359738368), sparse 2% (935059456), duration 111, read/write 89/78 MB/s
INFO: status: 28% (9639559168/34359738368), sparse 2% (1011642368), duration 114, read/write 92/67 MB/s
INFO: status: 29% (10078912512/34359738368), sparse 3% (1134518272), duration 117, read/write 146/105 MB/s
INFO: status: 30% (10433986560/34359738368), sparse 3% (1175810048), duration 121, read/write 88/78 MB/s
INFO: status: 31% (10715267072/34359738368), sparse 3% (1234259968), duration 124, read/write 93/74 MB/s
INFO: status: 32% (11072569344/34359738368), sparse 3% (1278296064), duration 127, read/write 119/104 MB/s
INFO: status: 33% (11492392960/34359738368), sparse 3% (1360261120), duration 132, read/write 83/67 MB/s
ERROR: unable to find configuration file for VM 10782 - no such machine
INFO: aborting backup job
ERROR: unable to find configuration file for VM 10782 - no such machine
INFO: ipcc_send_rec[1] failed: Connection refused
INFO: ipcc_send_rec[2] failed: Connection refused
INFO: ipcc_send_rec[3] failed: Connection refused
INFO: Unable to load access control list: Connection refused
command 'qm unlock 10782' failed: exit code 111
ERROR: Backup of VM 10782 failed - unable to find configuration file for VM 10782 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 108 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 108 failed - unable to find configuration file for VM 108 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 111 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 111 failed - unable to find configuration file for VM 111 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 114 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 114 failed - unable to find configuration file for VM 114 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 445 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 445 failed - unable to find configuration file for VM 445 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 900 (qemu)
INFO: Backup started at 2019-08-24 00:03:48
ERROR: Backup of VM 900 failed - unable to find configuration file for VM 900 - no such machine
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 102 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 104 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 104 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 110 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 110 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 113 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 113 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 115 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 115 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 160 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 160 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 707 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = running
ERROR: Backup of VM 707 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 7101 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 7101 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Starting Backup of VM 9136 (lxc)
INFO: Backup started at 2019-08-24 00:03:48
INFO: status = stopped
ERROR: Backup of VM 9136 failed - Connection refused
INFO: Failed at 2019-08-24 00:03:48
INFO: Backup job finished with errors
ERROR: Connection refused

TASK ERROR: job errors

the next 3 nodes completed backups with no errors.


I schedule backups one hour apart so there is not a network overload. I'll try to debug this later on.

Next I'll schedule the node to do another backup in a few minutes.

Edit: the next backup worked.

Any clues to prevent this or debug?
 
Last edited:
/etc/pve was disconnecting during the backup. from syslog on that node:
Code:
Aug 24 00:01:30 pve14 vzdump[1989762]: INFO: Starting Backup of VM 10782 (qemu)
..

Aug 24 00:01:41 pve14 systemd[1]: pve-cluster.service: Failed with result 'signal'.
Aug 24 00:01:41 pve14 systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
Aug 24 00:01:41 pve14 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 29.
Aug 24 00:01:41 pve14 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Aug 24 00:01:41 pve14 systemd[1]: Starting The Proxmox VE cluster filesystem...
Aug 24 00:01:41 pve14 systemd[1]: etc-pve.mount: Succeeded.

we have a site wide rsyslog server. another node had /etc/pve issue, however the backup did not fail:
Code:
# grep etc-pve.mount zz-sitewide.log
Aug 24 00:01:41 pve14 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 00:03:40 pve14 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 00:03:50 pve14 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 00:05:12 pve14 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 01:03:19 pve10 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 02:18:00 pve10 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 05:06:01 pve10 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 05:12:26 pve10 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 06:44:28 pve10 systemd[1]: etc-pve.mount: Succeeded.
Aug 24 06:56:15 pve10 systemd[1]: etc-pve.mount: Succeeded.

for network we use a dual ring network for cluster , 10G for storage. and 1G for vmbr0 .

vzdump :
Code:
# 2019-03-11 using default pve , adjusted with https://github.com/extremeshok/xshok-proxmox/blob/master/install-post.sh
#            also pigz set up using same link.

#tmpdir: DIR
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
bwlimit: 0
ionice: 5
#lockwait: MINUTES
#stopwait: MINUTES
#size: MB
#stdexcludes: BOOLEAN
#mailto: ADDRESSLIST
#maxfiles: N
#script: FILENAME
#exclude-path: PATHLIST
pigz: 1


We backup to a nfs system connected to the 10G switch.

Is there anything we could try to further debug or prevent the backup issue?
 
We made the following changes:

1- changed backup storage to cephfs [ it was nfs]
2- tried stop mode instead of suspend after the 1st fail.
3- changed /etc/vzdump.conf to developers default after the 1st fail.

we still have backup fail.
last night 2 different nodes had a failed backup due to 'unable to find configuration file for VM <vm id> - no such machine' .

there has been improvement as there was only one vm to fail backup per node. last week all subsequent vm's on the node would fail.

So most backups are working - which is not good enough.

Does anyone have a suggestion on something else to try to fix this issue?
 
Next :
the system that had the majority of fails - had two vm's that had heavy network traffic. i moved one of the vm's to another node to balance network load.
in addition will set bandwidth limit on all the heavy network vm's at all nodes.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!