Proxmox VE vzdump backup of VMs

Jera92

May 14, 2019

Dear members of the Proxmox forum,

Since yesterday I have been experiencing a problem with the vzdump backup tool.
When the backup of my VMs starts, everything goes well, but after 5-10 minutes I receive the following error message: unable to find configuration file for VM 100 - no such machine.
After that, the backup is aborted.

I back up my VMs to an external hard drive connected to the PVE server.
The version I'm running: pve-manager/5.4-5/c6fdb264 (running kernel: 4.15.18-14-pve)

I thought a file or directory might be missing in the /var/lock directory, but I'm not sure about that.

Can somebody help me with this?
 
* Can you post the full output of such a backup command? (and the command itself)
* Is there anything weird/interesting in /var/log/syslog while this issue happens?
* Are you backing up a CT or a VM?
* Can you post the config for this guest? (`pct config CTID` or `qm config VMID`)
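To narrow down the syslog question, it helps to look only at what was logged while the backup was running. A minimal POSIX shell sketch, assuming the classic rsyslog line layout (`Mon DD HH:MM:SS host program[pid]: message`) in /var/log/syslog; the function name `syslog_window` is hypothetical:

```shell
# syslog_window DATE START END [FILE]
# Hypothetical helper: print the syslog lines of one day whose HH:MM:SS
# timestamp lies between START and END (inclusive).
# DATE is month and day as the fields appear, e.g. "May 18" or "May 8".
# Assumes the classic rsyslog format "Mon DD HH:MM:SS host prog[pid]: msg"
# and does not handle windows that cross midnight.
syslog_window() {
    awk -v day="$1" -v start="$2" -v end="$3" '
        # $1 $2 = month and day, $3 = time of day; zero-padded HH:MM:SS
        # strings sort correctly when compared as plain strings.
        ($1 " " $2) == day && $3 >= start && $3 <= end
    ' "${4:-/var/log/syslog}"
}
```

For a job started at 00:30, `syslog_window "May 18" 00:30:00 00:35:00` would cover the whole run.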
 
Thank you for your reply, oguz.

The backup command: vzdump 100 101 104 106 --mailto "email-address" --compress gzip --mailnotification failure --mode snapshot --quiet 1 --storage back-up --node proxmox-sam
The output of one VM (same for the others):
100: 2019-05-18 00:30:02 INFO: Starting Backup of VM 100 (qemu)
100: 2019-05-18 00:30:02 INFO: status = running
100: 2019-05-18 00:30:03 INFO: update VM 100: -lock backup
100: 2019-05-18 00:30:03 INFO: VM Name: Fedorasrv28
100: 2019-05-18 00:30:03 INFO: include disk 'scsi0' 'local-lvm:vm-100-disk-0' 30G
100: 2019-05-18 00:30:03 INFO: include disk 'scsi1' 'local-lvm:vm-100-disk-1' 60G
100: 2019-05-18 00:30:03 INFO: backup mode: snapshot
100: 2019-05-18 00:30:03 INFO: ionice priority: 7
100: 2019-05-18 00:30:03 INFO: skip unused drive 'DATA:vm-100-disk-1' (not included into backup)
100: 2019-05-18 00:30:03 INFO: snapshots found (not included into backup)
100: 2019-05-18 00:30:03 INFO: creating archive '/mnt/bkp/dump/vzdump-qemu-100-2019_05_18-00_30_02.vma.gz'
100: 2019-05-18 00:30:03 INFO: started backup task 'b81c5695-37ff-44ba-8f72-4f4e402b6e14'
100: 2019-05-18 00:30:06 INFO: status: 0% (197656576/96636764160), sparse 0% (135872512), duration 3, read/write 65/20 MB/s
100: 2019-05-18 00:30:21 INFO: status: 1% (1615462400/96636764160), sparse 1% (1279037440), duration 18, read/write 94/18 MB/s
100: 2019-05-18 00:30:33 INFO: status: 2% (1934753792/96636764160), sparse 1% (1285128192), duration 30, read/write 26/26 MB/s
100: 2019-05-18 00:31:11 INFO: status: 3% (2907832320/96636764160), sparse 1% (1305821184), duration 68, read/write 25/25 MB/s
100: 2019-05-18 00:31:58 INFO: status: 4% (3869048832/96636764160), sparse 1% (1314619392), duration 115, read/write 20/20 MB/s
100: 2019-05-18 00:32:42 INFO: status: 5% (4850188288/96636764160), sparse 1% (1391579136), duration 159, read/write 22/20 MB/s
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:43 INFO: aborting backup job
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:45 ERROR: Backup of VM 100 failed - unable to find configuration file for VM 100 - no such machine
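In a long log like this, the interesting part is where progress stops and the first error appears. A small sketch, assuming the `VMID: date time LEVEL: message` layout shown above; `summarize_vzdump_log` is a hypothetical helper that reads a saved copy of the task output on stdin:

```shell
# summarize_vzdump_log: hypothetical helper that reads vzdump task output on
# stdin and reports the last progress line and the first ERROR line.
# Assumes the "VMID: YYYY-MM-DD HH:MM:SS LEVEL: message" layout.
summarize_vzdump_log() {
    awk '
        # Progress lines look like "... INFO: status: 5% (...)".
        $4 == "INFO:" && $5 == "status:" { pct = $6; ptime = $3 }
        # Keep only the first error and the time it was logged.
        $4 == "ERROR:" && err == "" {
            err = $0
            sub(/^[^ ]+ [^ ]+ [^ ]+ ERROR: /, "", err)
            etime = $3
        }
        END {
            if (pct != "") printf "last progress: %s at %s\n", pct, ptime
            if (err != "") printf "first error at %s: %s\n", etime, err
        }
    '
}
```

For the run above, it would report the last progress as 5% at 00:32:42 and the first error one second later.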

I'm backing up a VM; I'm not using containers at the moment.
I checked the syslog, but I found nothing that can lead to the backup failure.
 
The output of qm config 100:

Code:
agent: 1
bootdisk: scsi0
cores: 2
keyboard: fr-be
memory: 2048
name: Fedorasrv28
net0: virtio=B6:9A:CC:FD:55:CC,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
parent: snapRestoreDisk
scsi0: local-lvm:vm-100-disk-0,size=30G
scsi1: local-lvm:vm-100-disk-1,size=60G
scsihw: virtio-scsi-pci
smbios1: uuid=359112dd-86d0-477e-82bf-eb038abd4b96
sockets: 2
unused0: DATA:vm-100-disk-1
vga: qxl
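The "include disk" and "skip unused drive" lines in the backup log can be cross-checked against this config. A rough sketch, assuming `qm config` output on stdin; `list_guest_disks` is hypothetical and only approximates which drives vzdump picks (it ignores entries such as EFI disks):

```shell
# list_guest_disks: hypothetical helper that reads `qm config VMID` output on
# stdin and prints which disk entries a backup would roughly include or skip.
# An approximation for cross-checking the vzdump log, not vzdump's own logic.
list_guest_disks() {
    awk -F': ' '
        /^(scsi|virtio|sata|ide)[0-9]+: / {
            if ($2 ~ /media=cdrom/) next          # CD-ROM drives are not disks
            split($2, opt, ",")                   # opt[1] = storage:volume
            if ($2 ~ /,backup=0(,|$)/)
                print "skip " $1 " " opt[1] " (backup=0)"
            else
                print "include " $1 " " opt[1]
            next
        }
        # unusedN entries are detached volumes that are not backed up.
        /^unused[0-9]+: / { print "skip " $1 " " $2 }
    '
}
```

Run against the config above, it lists scsi0 and scsi1 as included and unused0 (DATA:vm-100-disk-1) as skipped, which matches the log.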
 
hi.

a few questions:

* are you on the latest pve version? (output of `pveversion -v` is useful)
* do you have a cluster? (if yes, details)
* what kind of storage is on `/mnt/bkp`? sometimes there are network related issues on nfs/smb storages which might cause problems during backup
* did you try with another backup mode? ("suspend" or "stop")
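The filesystem type of the backup target can be read from /proc/mounts instead of guessed. A minimal sketch; `backup_fs_type` and `classify_fs` are hypothetical names, and the list of network filesystem types is illustrative rather than complete:

```shell
# backup_fs_type MOUNTPOINT [MOUNTS_FILE]: hypothetical helper printing the
# filesystem type mounted at MOUNTPOINT, read from /proc/mounts by default.
# Fails (exit status 1) if nothing is mounted there. Mount points containing
# spaces appear escaped as \040 in /proc/mounts and are not handled here.
backup_fs_type() {
    awk -v mp="$1" '
        $2 == mp { t = $3 }                 # last match wins (stacked mounts)
        END { if (t == "") exit 1; print t }
    ' "${2:-/proc/mounts}"
}

# classify_fs FSTYPE: label network filesystems (NFS/SMB), which can cause
# trouble during backups. The list of types is not exhaustive.
classify_fs() {
    case "$1" in
        nfs|nfs4|cifs|smb3) echo network ;;
        *) echo local ;;
    esac
}
```

`classify_fs "$(backup_fs_type /mnt/bkp)"` prints network or local; `backup_fs_type` on its own fails if nothing is mounted at /mnt/bkp.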
 
Hi

* are you on the latest pve version? (output of `pveversion -v` is useful)
the output:
Code:
proxmox-ve: 5.4-1 (running kernel: 4.15.18-14-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-2
pve-kernel-4.15.18-14-pve: 4.15.18-38
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-9
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-42
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-37
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-51
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

* do you have a cluster? (if yes, details)
No, I don't; it's a standalone server.

* what kind of storage is on `/mnt/bkp`? sometimes there are network related issues on nfs/smb storages which might cause problems during backup
The /mnt/bkp directory is a mounted external hard drive formatted with ext4. I'm not using NFS or Samba storage.

* did you try with another backup mode? ("suspend" or "stop")
I tried both, but they ended up with the same result.
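The mode comparison can be repeated for a single VM in one go. A sketch, assuming the storage ID `back-up` from the job above and a hypothetical helper `try_backup_modes`; with DRY_RUN=1 it only prints the vzdump commands. Keep in mind that stop mode shuts the guest down while it is being backed up:

```shell
# try_backup_modes VMID STORAGE: hypothetical helper running one vzdump
# backup of VMID to STORAGE per mode, so the results can be compared.
# With DRY_RUN=1 it only prints the commands it would run.
# Warning: "stop" mode shuts the guest down for the duration of the backup.
try_backup_modes() (
    vmid=$1
    storage=$2
    for mode in snapshot suspend stop; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "vzdump $vmid --mode $mode --compress gzip --storage $storage"
        elif ! vzdump "$vmid" --mode "$mode" --compress gzip --storage "$storage"; then
            echo "mode $mode failed for VM $vmid" >&2
        fi
    done
)
```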

Thank you for your answer!
 
hi again.

* are you on the latest pve version? (output of `pveversion -v` is useful)
the output:

looks like your packages aren't the latest ones.

* try an upgrade with `apt update` and then `apt full-upgrade`, followed by rebooting the server.
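To see what an upgrade actually changed, the `pveversion -v` output can be saved before and after and compared. A sketch with a hypothetical helper `changed_packages`; it compares only the first word of each version field:

```shell
# changed_packages BEFORE AFTER: hypothetical helper comparing two saved
# `pveversion -v` outputs and listing packages whose version changed.
# Only the first word after "name: " is compared, so suffixes such as
# "(running kernel: ...)" are ignored. Packages present in only one file
# are not reported.
changed_packages() {
    awk -F': ' '
        { split($2, v, " ") }              # v[1] = bare version string
        NR == FNR { old[$1] = v[1]; next } # first file: remember versions
        ($1 in old) && old[$1] != v[1] { print $1 ": " old[$1] " -> " v[1] }
    ' "$1" "$2"
}
```

For example: `pveversion -v > /root/before.txt`, then `apt update` and `apt full-upgrade`, then `pveversion -v > /root/after.txt` and `changed_packages /root/before.txt /root/after.txt`. The file paths are arbitrary.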

The /mnt/bkp directory is a mounted External hard drive formatted in ext4. I'm not using a nfs or samba storage space.

alright. this rules out any network related issues. maybe there's something wrong with your storage.

* try running a smartctl test on your device.

Code:
smartctl --test=short /your/device

wait a few minutes and run:

Code:
smartctl -a /your/device
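The health verdict in that output can also be checked mechanically. A sketch, assuming the usual smartctl health lines (`... self-assessment test result: PASSED` for ATA drives, `SMART Health Status: OK` for SCSI/SAS); `smart_health_ok` is a hypothetical name:

```shell
# smart_health_ok: hypothetical helper that reads `smartctl -H` or
# `smartctl -a` output on stdin and succeeds only if the overall health
# verdict is good (ATA: "...self-assessment test result: PASSED",
# SCSI/SAS: "SMART Health Status: OK").
smart_health_ok() {
    grep -Eq 'self-assessment test result: PASSED|SMART Health Status: OK'
}
```

Usage: `smartctl -a /your/device | smart_health_ok && echo healthy`. Drives in USB enclosures may need `-d sat` before smartctl can reach the disk, and `smartctl -l selftest /your/device` shows the result of the short test itself.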

that's all i can think of for now.
 
* try an upgrade with `apt update` and then `apt full-upgrade`, followed by rebooting the server.
I did a complete upgrade of my pve server and then rebooted.
After the reboot I triggered a manual backup process, but I still receive the error message for all my VMs.
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:43 INFO: aborting backup job
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:45 ERROR: Backup of VM 100 failed - unable to find configuration file for VM 100 - no such machine

* try running a smartctl test on your device.
I ran the smartctl test on all my hard drives and didn't find any problems that could cause the backup failure.

I checked the system logs, but I still can't find anything that could explain the issue.
 
hi.

can you check and send the outputs of:

Code:
ls -arilh /etc/pve
ls -arilh /etc/pve/qemu-server/
systemctl status pve-cluster
ps aux | grep pmxcfs

there should be VM config files under /etc/pve/qemu-server/VMID.conf and you should see a running pmxcfs process
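Both checks can be scripted for the VMs in the backup job. A sketch; `check_vm_confs` and `pmxcfs_running` are hypothetical helpers, and PVE_DIR only exists so the check can be pointed at a test directory (it defaults to /etc/pve). `pgrep -x` matches the exact process name, so unlike `ps aux | grep pmxcfs` it never counts the grep itself:

```shell
# check_vm_confs VMID...: hypothetical helper reporting, for each VMID,
# whether a non-empty config file exists under $PVE_DIR/qemu-server
# (PVE_DIR defaults to /etc/pve). Returns non-zero if any file is missing.
check_vm_confs() (
    dir="${PVE_DIR:-/etc/pve}/qemu-server"
    rc=0
    for id in "$@"; do
        if [ -s "$dir/$id.conf" ]; then
            echo "ok $id"
        else
            echo "missing $id"
            rc=1
        fi
    done
    exit "$rc"
)

# pmxcfs_running: succeeds if a process named exactly "pmxcfs" exists.
pmxcfs_running() {
    pgrep -x pmxcfs >/dev/null
}
```

With the VMIDs from the vzdump command earlier in the thread: `check_vm_confs 100 101 104 106 && pmxcfs_running && echo ok`.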
 
Hi Oguz, here you can find the output:

ls -arilh /etc/pve
total 8.0K
3 -rw-r----- 1 root www-data 880 May 14 11:59 vzdump.cron
104717 -r--r----- 1 root www-data 483 Jan 1 1970 .vmlist
104715 -r--r----- 1 root www-data 445 Jan 1 1970 .version
104723 -rw-r----- 1 root www-data 60 Jul 30 2018 user.cfg
104726 -rw-r----- 1 root www-data 412 Apr 20 13:30 storage.cfg
104716 -r--r----- 1 root www-data 1.5K Jan 1 1970 .rrd
104721 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 qemu-server -> nodes/proxmox-sam/qemu-server
13 -rw-r----- 1 root www-data 1.7K Jul 31 2018 pve-www.key
16 -rw-r----- 1 root www-data 2.1K Jul 31 2018 pve-root-ca.pem
4 drwx------ 2 root www-data 0 Jul 31 2018 priv
104718 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 openvz -> nodes/proxmox-sam/openvz
6 drwxr-xr-x 2 root www-data 0 Jul 31 2018 nodes
104722 -r--r----- 1 root www-data 206 Jan 1 1970 .members
104719 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 lxc -> nodes/proxmox-sam/lxc
638 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 local -> nodes/proxmox-sam
104727 drwxr-xr-x 2 root www-data 0 Apr 10 10:09 firewall
104714 -rw-r----- 1 root www-data 2 Jan 1 1970 .debug
104725 -rw-r----- 1 root www-data 56 May 26 09:44 datacenter.cfg
104724 -rw-r----- 1 root www-data 374 Apr 12 20:24 corosync.conf
104720 -r--r----- 1 root www-data 8.8K Jan 1 1970 .clusterlog
28 -rw-r----- 1 root www-data 451 Jul 31 2018 authkey.pub
3276801 drwxr-xr-x 93 root root 4.0K May 23 16:28 ..
1 drwxr-xr-x 2 root www-data 0 Jan 1 1970 .

ls -arilh /etc/pve/qemu-server/
104721 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 /etc/pve/qemu-server -> nodes/proxmox-sam/qemu-server

systemctl status pve-cluster
pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2019-05-28 13:29:31 CEST; 4 days ago
Process: 18787 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 18765 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 18774 (pmxcfs)
Tasks: 7 (limit: 4915)
Memory: 36.9M
CPU: 5min 31.306s
CGroup: /system.slice/pve-cluster.service
└─18774 /usr/bin/pmxcfs

Jun 01 08:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 09:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 10:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 11:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 12:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 13:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 14:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 15:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 16:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 17:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful

ps aux | grep pmxcfs
root 3391 0.0 0.0 12784 940 pts/0 S+ 17:46 0:00 grep mxcfs
root 18774 0.0 0.4 675840 38536 ? Ssl May28 5:30 /usr/bin/pmxcfs
 
