Proxmox VE vzdump backup of VMs

Jera92

Dear members of the Proxmox forum,

Since yesterday I've been experiencing a problem with the vzdump backup tool.
When the backup of my VMs starts, everything goes well at first, but after about 5-10 minutes I receive the following error message: unable to find configuration file for VM 100 - no such machine.
After that, the backup is aborted.

I back up my VMs to an external hard drive connected to the PVE server.
The version I'm running: pve-manager/5.4-5/c6fdb264 (running kernel: 4.15.18-14-pve)

I thought maybe a file or directory is missing in the /var/lock directory, but I'm not sure about that.
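For reference, this is roughly how I looked (assuming the default PVE lock locations; I haven't verified the exact file names beyond what I see on my box):

Code:
# per-VM lock files kept by qemu-server (default location)
ls -l /var/lock/qemu-server/
# generic lock directory
ls -l /var/lock/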

Can somebody help me with this?
 
* Can you post the full output of such a backup command? (and the command itself)
* Is there anything weird/interesting in /var/log/syslog while this issue happens?
* Are you backing up a CT or a VM?
* Can you post the config for this guest? (`pct config CTID` or `qm config CTID`)
 
Thank you for your reply, oguz.

The backup command: vzdump 100 101 104 106 --mailto "email-address" --compress gzip --mailnotification failure --mode snapshot --quiet 1 --storage back-up --node proxmox-sam
The output of one VM (same for the others):
Code:
100: 2019-05-18 00:30:02 INFO: Starting Backup of VM 100 (qemu)
100: 2019-05-18 00:30:02 INFO: status = running
100: 2019-05-18 00:30:03 INFO: update VM 100: -lock backup
100: 2019-05-18 00:30:03 INFO: VM Name: Fedorasrv28
100: 2019-05-18 00:30:03 INFO: include disk 'scsi0' 'local-lvm:vm-100-disk-0' 30G
100: 2019-05-18 00:30:03 INFO: include disk 'scsi1' 'local-lvm:vm-100-disk-1' 60G
100: 2019-05-18 00:30:03 INFO: backup mode: snapshot
100: 2019-05-18 00:30:03 INFO: ionice priority: 7
100: 2019-05-18 00:30:03 INFO: skip unused drive 'DATA:vm-100-disk-1' (not included into backup)
100: 2019-05-18 00:30:03 INFO: snapshots found (not included into backup)
100: 2019-05-18 00:30:03 INFO: creating archive '/mnt/bkp/dump/vzdump-qemu-100-2019_05_18-00_30_02.vma.gz'
100: 2019-05-18 00:30:03 INFO: started backup task 'b81c5695-37ff-44ba-8f72-4f4e402b6e14'
100: 2019-05-18 00:30:06 INFO: status: 0% (197656576/96636764160), sparse 0% (135872512), duration 3, read/write 65/20 MB/s
100: 2019-05-18 00:30:21 INFO: status: 1% (1615462400/96636764160), sparse 1% (1279037440), duration 18, read/write 94/18 MB/s
100: 2019-05-18 00:30:33 INFO: status: 2% (1934753792/96636764160), sparse 1% (1285128192), duration 30, read/write 26/26 MB/s
100: 2019-05-18 00:31:11 INFO: status: 3% (2907832320/96636764160), sparse 1% (1305821184), duration 68, read/write 25/25 MB/s
100: 2019-05-18 00:31:58 INFO: status: 4% (3869048832/96636764160), sparse 1% (1314619392), duration 115, read/write 20/20 MB/s
100: 2019-05-18 00:32:42 INFO: status: 5% (4850188288/96636764160), sparse 1% (1391579136), duration 159, read/write 22/20 MB/s
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:43 INFO: aborting backup job
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:45 ERROR: Backup of VM 100 failed - unable to find configuration file for VM 100 - no such machine

I'm backing up a VM; I'm not using containers at the moment.
I checked the syslog, but I found nothing that points to the backup failure.
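For completeness, this is roughly how I searched (scoping the journal to the backup window from the log above; the timestamps are just that window):

Code:
journalctl --since "2019-05-18 00:30:00" --until "2019-05-18 00:35:00"
grep -i vzdump /var/log/syslog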
 
The output of `qm config 100`:

Code:
agent: 1
bootdisk: scsi0
cores: 2
keyboard: fr-be
memory: 2048
name: Fedorasrv28
net0: virtio=B6:9A:CC:FD:55:CC,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
parent: snapRestoreDisk
scsi0: local-lvm:vm-100-disk-0,size=30G
scsi1: local-lvm:vm-100-disk-1,size=60G
scsihw: virtio-scsi-pci
smbios1: uuid=359112dd-86d0-477e-82bf-eb038abd4b96
sockets: 2
unused0: DATA:vm-100-disk-1
vga: qxl
 
hi.

a few questions:

* are you on the latest pve version? (output of `pveversion -v` is useful)
* do you have a cluster? (if yes, details)
* what kind of storage is on `/mnt/bkp`? sometimes there are network related issues on nfs/smb storages which might cause problems during backup
* did you try with another backup mode? ("suspend" or "stop") - see the example right after this list
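for example, something like this (same VM and storage as in your command; adjust as needed):

Code:
vzdump 100 --mode suspend --storage back-up
vzdump 100 --mode stop --storage back-up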
 
Hi

* are you on the latest pve version? (output of `pveversion -v` is useful)
the output:
Code:
proxmox-ve: 5.4-1 (running kernel: 4.15.18-14-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-2
pve-kernel-4.15.18-14-pve: 4.15.18-38
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-9
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-42
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-37
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-51
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

* do you have a cluster? (if yes, details)
No, I don't; it's a standalone server.

* what kind of storage is on `/mnt/bkp`? sometimes there are network related issues on nfs/smb storages which might cause problems during backup
The /mnt/bkp directory is a mounted external hard drive formatted as ext4. I'm not using NFS or Samba storage.
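In case it helps, this is how I verify the drive is still mounted (standard util-linux tools):

Code:
findmnt /mnt/bkp
# or
mount | grep /mnt/bkp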

* did you try with another backup mode? ("suspend" or "stop")
I tried both, but they ended up with the same result.

Thank you for your answer!
 
hi again.

* are you on the latest pve version? (output of `pveversion -v` is useful)
the output:

looks like your packages aren't the latest ones.

* try an upgrade with `apt update` and then `apt full-upgrade`, followed by rebooting the server.
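i.e. something like:

Code:
apt update
apt full-upgrade
reboot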

The /mnt/bkp directory is a mounted External hard drive formatted in ext4. I'm not using a nfs or samba storage space.

alright. this rules out any network related issues. maybe there's something wrong with your storage.

* try running a smartctl test on your device.

Code:
smartctl --test=short /your/device

wait a few minutes and run:

Code:
smartctl -a /your/device

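if you're not sure which device node the external drive is, you can list the candidates first:

Code:
# list block devices with size and mountpoint
lsblk -o NAME,SIZE,MOUNTPOINT
# or let smartctl enumerate what it can see
smartctl --scan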
that's all i can think of for now.
 
* try an upgrade with `apt update` and then `apt full-upgrade`, followed by rebooting the server.
I did a complete upgrade of my PVE server and then rebooted.
After the reboot I triggered a manual backup process, but I still receive the error message for all my VMs.
Code:
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:43 INFO: aborting backup job
100: 2019-05-18 00:32:43 ERROR: unable to find configuration file for VM 100 - no such machine
100: 2019-05-18 00:32:45 ERROR: Backup of VM 100 failed - unable to find configuration file for VM 100 - no such machine

* try running a smartctl test on your device.
I ran the smartctl test on all my hard drives, but I didn't find any problems that could cause the backup failure.

I checked the system logs, but I still can't find anything that points to the issue.
 
hi.

can you check and send the outputs of:

Code:
ls -arilh /etc/pve
ls -arilh /etc/pve/qemu-server/
systemctl status pve-cluster
ps aux | grep pmxcfs

there should be VM config files under /etc/pve/qemu-server/VMID.conf, and you should see a running pmxcfs process.
 
Hi Oguz, here you can find the output:

Code:
ls -arilh /etc/pve
total 8.0K
3 -rw-r----- 1 root www-data 880 May 14 11:59 vzdump.cron
104717 -r--r----- 1 root www-data 483 Jan 1 1970 .vmlist
104715 -r--r----- 1 root www-data 445 Jan 1 1970 .version
104723 -rw-r----- 1 root www-data 60 Jul 30 2018 user.cfg
104726 -rw-r----- 1 root www-data 412 Apr 20 13:30 storage.cfg
104716 -r--r----- 1 root www-data 1.5K Jan 1 1970 .rrd
104721 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 qemu-server -> nodes/proxmox-sam/qemu-server
13 -rw-r----- 1 root www-data 1.7K Jul 31 2018 pve-www.key
16 -rw-r----- 1 root www-data 2.1K Jul 31 2018 pve-root-ca.pem
4 drwx------ 2 root www-data 0 Jul 31 2018 priv
104718 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 openvz -> nodes/proxmox-sam/openvz
6 drwxr-xr-x 2 root www-data 0 Jul 31 2018 nodes
104722 -r--r----- 1 root www-data 206 Jan 1 1970 .members
104719 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 lxc -> nodes/proxmox-sam/lxc
638 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 local -> nodes/proxmox-sam
104727 drwxr-xr-x 2 root www-data 0 Apr 10 10:09 firewall
104714 -rw-r----- 1 root www-data 2 Jan 1 1970 .debug
104725 -rw-r----- 1 root www-data 56 May 26 09:44 datacenter.cfg
104724 -rw-r----- 1 root www-data 374 Apr 12 20:24 corosync.conf
104720 -r--r----- 1 root www-data 8.8K Jan 1 1970 .clusterlog
28 -rw-r----- 1 root www-data 451 Jul 31 2018 authkey.pub
3276801 drwxr-xr-x 93 root root 4.0K May 23 16:28 ..
1 drwxr-xr-x 2 root www-data 0 Jan 1 1970 .

Code:
ls -arilh /etc/pve/qemu-server/
104721 lrwxr-xr-x 1 root www-data 0 Jan 1 1970 /etc/pve/qemu-server -> nodes/proxmox-sam/qemu-server

Code:
systemctl status pve-cluster
pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2019-05-28 13:29:31 CEST; 4 days ago
Process: 18787 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 18765 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 18774 (pmxcfs)
Tasks: 7 (limit: 4915)
Memory: 36.9M
CPU: 5min 31.306s
CGroup: /system.slice/pve-cluster.service
└─18774 /usr/bin/pmxcfs

Jun 01 08:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 09:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 10:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 11:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 12:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 13:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 14:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 15:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 16:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful
Jun 01 17:29:05 proxmox-sam.jdr-mdr.be pmxcfs[18774]: [dcdb] notice: data verification successful

Code:
ps aux | grep pmxcfs
root 3391 0.0 0.0 12784 940 pts/0 S+ 17:46 0:00 grep pmxcfs
root 18774 0.0 0.4 675840 38536 ? Ssl May28 5:30 /usr/bin/pmxcfs