Backup fails for a VM

nil.tosar

Nov 23, 2022
Hello, I have a PBS set up to do backups of my VMs. It was going fine until a week ago, when a couple of my VM backups started failing. Here is my configuration and the error messages:

Config of the VM that is failing:
Code:
balloon: 0
boot: 
cores: 8
cpu: SandyBridge
memory: 1024
name: ROU-PUNTACCES-KVM-02
net0: virtio=6A:A2:C0:D2:07:F3,bridge=vmbr0,queues=8
net1: virtio=7E:7D:1F:48:C4:D0,bridge=vmbr0,link_down=1,tag=11
numa: 0
ostype: l26
sata0: vmssdstore:95017/vm-95017-disk-0.qcow2,size=64M
smbios1: uuid=d220eee0-b45a-4def-a485-0f28ea0dc53c
sockets: 1
vmgenid: aceff035-00d2-4d01-9ea6-942f12769d9f

Error msg from journalctl:
Code:
Nov 23 01:56:49 pve03 pvescheduler[48983]: INFO: Starting Backup of VM 95017 (qemu)
Nov 23 01:56:49 pve03 pvescheduler[48983]: Use of uninitialized value $used in pattern match (m//) at /usr/share/perl5/PVE/Storage/Plugin.pm line 906.
Nov 23 01:56:49 pve03 pvescheduler[48983]: Use of uninitialized value $used in concatenation (.) or string at /usr/share/perl5/PVE/Storage/Plugin.pm line 906.
Nov 23 01:56:49 pve03 pvescheduler[48983]: ERROR: Backup of VM 95017 failed - no such volume 'vmssdstore:95017/vm-95017-disk-0.qcow2'
Nov 23 01:56:49 pve03 pvescheduler[48983]: INFO: Backup job finished with errors

pvesm status:
Code:
Name                  Type     Status           Total            Used       Available        %
PBSBCNZF-guifi         pbs     active      9629664768      8811636224       818028544   91.51%
PBSGURB-guifi          pbs   disabled               0               0               0      N/A
hddpool            zfspool     active      2828533760       901908664      1926625096   31.89%
local                  dir     active        52428672         5478528        46950144   10.45%
vmhddstore             dir     active      3770548224      1281105792      2489442432   33.98%
vmssdstore             dir     active      2375680000      1973579136       402100864   83.07%

The VM disk does exist at vmssdstore:95017/vm-95017-disk-0.qcow2, so I don't really understand where the error comes from.
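For what it's worth, one way to double-check which filesystem path PVE resolves that volume ID to (just a sanity-check sketch using the standard pvesm tool):
Code:
# print the path PVE maps the volume ID to, then confirm the file is there
pvesm path vmssdstore:95017/vm-95017-disk-0.qcow2
ls -lh "$(pvesm path vmssdstore:95017/vm-95017-disk-0.qcow2)"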

Thank you so much
 
Please provide the output of pveversion -v from your PVE host and the output of proxmox-backup-manager versions --verbose from your PBS host.
 
pveversion -v:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.39-2-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-7
pve-kernel-helper: 7.2-7
pve-kernel-5.4: 6.4-19
pve-kernel-5.15.39-2-pve: 5.15.39-2
pve-kernel-5.4.195-1-pve: 5.4.195-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 10.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-7
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-11
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1

proxmox-backup-manager versions --verbose:
Code:
proxmox-backup             2.1-1        running kernel: 5.13.19-1-pve
proxmox-backup-server      2.1.1-1      running version: 2.1.1       
pve-kernel-5.13            7.1-4                                     
pve-kernel-helper          7.1-4                                     
pve-kernel-5.13.19-1-pve   5.13.19-2                                 
ifupdown2                  3.1.0-1+pmx3                             
libjs-extjs                7.0.0-1                                   
proxmox-backup-docs        2.1.1-2                                   
proxmox-backup-client      2.1.1-1                                   
proxmox-mini-journalreader 1.2-1                                     
proxmox-widget-toolkit     3.4-3                                     
pve-xtermjs                4.12.0-1                                 
smartmontools              7.2-1                                     
zfsutils-linux             2.1.1-pve3
 
Hi,
I think this error is also displayed when determining the size of the volume fails. Please also post the output of
Code:
qemu-img info --output=json /path/to/vm-95017-disk-0.qcow2

What kind of filesystem is vmssdstore?
 
The output of the command is:
Code:
{
    "virtual-size": 67108864,
    "filename": "95017/vm-95017-disk-0.qcow2",
    "cluster-size": 65536,
    "format": "qcow2",
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "compression-type": "zlib",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false,
            "extended-l2": false
        }
    },
    "dirty-flag": false
}

The vmssdstore storage is a GlusterFS volume backed by ZFS.
 
It was going fine until a week ago, when a couple of my VM backups started failing.
Did you maybe upgrade qemu or the kernel at that time? Any other changes you can remember?

The issue is that there is no actual-size entry in the qemu-img info output, which is why our code trips up.
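In case it helps narrow things down, here is a rough way to list the images on that storage whose qemu-img info output is missing the entry (purely a sketch; the path is illustrative):
Code:
for img in /path/to/vmssdstore/images/*/vm-*.qcow2; do
    # add -U (--force-share) if the image is in use by a running VM
    qemu-img info --output=json "$img" | grep -q '"actual-size"' \
        || echo "missing actual-size: $img"
done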
 
I don't recall making any changes. Weirdly, it only happens to 2 of our 50 machines. I checked and I see that the other ones do have the
Code:
actual-size
entry. Can I add it manually, or is there a workaround?
 
Don't know about a workaround. We'd need to find out why this happens first. Are the other volumes also on the same storage? Are they also qcow2?
 
Yes, almost all of our production VMs are on the same storage,
Code:
vmssdstore
, and all of them use qcow2 disks. One more thing I noticed: both VM backups started failing on the same day (a week ago), and both run MikroTik RouterOS inside.
 
Does qemu-img check /path/to/image show anything or report an error code? While the command is read-only (if you don't pass -r), I still think it's better to run it when the VM is not running.

Does qemu-img info show the same when the VM is running and when it's not running?

I noticed that you are still using pve-qemu-kvm=6.2.0-11. If you are lucky, the issue might be gone with a newer version. Updates for the package are available on all our repositories.
 
The output of the qemu-img info command is exactly the same whether the VM is running or not.

Also, qemu-img check /path/to/image seems fine:
Code:
No errors were found on the image.
262144/262144 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 17182752768

To update the QEMU version, do I just need to run apt update and it will pull in all the new versions from the repo?

One last thing: I noticed that on these machines the QEMU guest agent is disabled... But we have others where it is disabled as well and their backups run fine, so I don't know if it really matters for this issue.
 
To update the QEMU version, do I just need to run apt update and it will pull in all the new versions from the repo?
apt update will refresh the package lists. You can use apt install pve-qemu-kvm to pull in the latest available version afterwards. Or you can use apt dist-upgrade to upgrade the whole system. Make sure you have a Proxmox VE repository configured (use the no-subscription one if you don't have a subscription).
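For reference, a minimal sketch of what that could look like on a Proxmox VE 7 (Debian Bullseye) host using the no-subscription repository (the repository file name below is just an example):
Code:
echo 'deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription' \
    > /etc/apt/sources.list.d/pve-no-subscription.list
apt update
apt install pve-qemu-kvm    # just the QEMU package
# or: apt dist-upgrade      # upgrade the whole node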

One last thing: I noticed that on these machines the QEMU guest agent is disabled... But we have others where it is disabled as well and their backups run fine, so I don't know if it really matters for this issue.
I'd guess that the agent doesn't matter, because the information is missing from qemu-img info, which doesn't interact with the agent AFAIK.
 
Since qemu-img doesn't strictly guarantee that actual-size is present (but your issue seems to be the first reported instance where it's not), we're currently considering using a fall-back. Can you post the output of stat /path/to/image for the problematic images?
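For illustration only (this is not the actual PVE code), such a fall-back would derive the used size from the block count that stat reports, roughly like this:
Code:
# st_blocks is usually counted in 512-byte units
blocks=$(stat --format=%b /path/to/vm-95017-disk-0.qcow2)
echo $(( blocks * 512 ))    # approximate used bytes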
 
Sorry, I didn't see this last message.

Here is the output of the stat command:
Code:
  File: 95017/vm-95017-disk-0.qcow2
  Size: 67436544      Blocks: 18446744073709551594 IO Block: 131072 regular file
Device: 3fh/63d    Inode: 10343603782766364064  Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-08-30 15:47:15.185770094 +0200
Modify: 2022-12-08 11:49:32.025890385 +0100
Change: 2022-12-08 11:49:32.025890385 +0100
 Birth: -

We are going to upgrade anyway and see if that helps as well. Weirdly, one of the two machines started doing its backups just fine 4 days ago.
 
Here is the output of the stat command:
Code:
Blocks: 18446744073709551594
Well, I guess that might explain it. This is supposed to be the number of 512-byte blocks the file actually uses, but the value is way too big. When QEMU tries to multiply it by 512, it overflows and the negative result is treated as an error ;)

So the issue is rather with GlusterFS reporting a wrong value here.
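To spell out the arithmetic (any big-integer calculator works; python3 is just what's at hand here):
Code:
# The reported block count is 2^64 - 22, i.e. -22 as a signed 64-bit integer;
# multiplied by 512 that gives a negative "used" size.
python3 -c 'v = 18446744073709551594; print(v - 2**64, (v - 2**64) * 512)'
# output: -22 -11264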
 
Alright, thank you so much. We'll try to figure out what's wrong with the GlusterFS setup and why it is giving us this error.
 