Backup of VM fails

vb87

Hello experts,

I'm running a Proxmox server with several LXC containers and one VM.
I configured my backup with PBS and it ran perfectly until May 27.
Now it fails every day when backing up the one VM, and I don't understand why. I tried suggestions from several forum threads, but nothing helped.
A manual backup to local storage worked, though.
Hopefully someone can help me.

Here is some important additional info:

proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.4: 6.4-5
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;net0
cores: 2
memory: 2048
meta: creation-qemu=6.1.0,ctime=1643746981
name: proxy
net0: virtio=CE:80:9A:D2:A7:A1,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: data:vm-106-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=ff365e0b-0310-4c58-bb13-5ab9fd1adda8
sockets: 1
vmgenid: cd2bcdee-45b8-475a-a568-04581f131305
#qmdump#map:scsi0:drive-scsi0:data::

Static hostname: proxy
Icon name: computer-vm
Chassis: vm
Machine ID: ff365e0b03104c58bb135ab9fd1adda8
Boot ID: 1ddb95abe48846b998071d172d70d8dc
Virtualization: kvm
Operating System: Ubuntu 22.04 LTS
Kernel: Linux 5.15.0-39-generic
Architecture: x86-64
Hardware Vendor: QEMU
Hardware Model: Standard PC (i440FX + PIIX, 1996)
qemu-guest-agent/jammy-updates,now 1:6.2+dfsg-2ubuntu6.1

INFO: starting new backup job: vzdump 106 --remove 0 --storage backup --notes-template 'Manual: {{guestname}} ({{vmid}})' --node pve --mode snapshot
INFO: Starting Backup of VM 106 (qemu)
INFO: Backup started at 2022-06-20 20:41:24
INFO: status = running
INFO: VM Name: proxy
INFO: include disk 'scsi0' 'data:vm-106-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/106/2022-06-20T18:41:24Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 106 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 106 failed - VM 106 qmp command 'backup' failed - got timeout
INFO: Failed at 2022-06-20 20:43:30
INFO: Backup job finished with errors
TASK ERROR: job errors

Jun 20 20:41:24 pve pvedaemon[1666636]: INFO: starting new backup job: vzdump 106 --remove 0 --storage backup --notes-template 'Manual: {{guestname}} ({{vmid}})' --node pve --mode snapshot
Jun 20 20:41:24 pve pvedaemon[1666636]: INFO: Starting Backup of VM 106 (qemu)
Jun 20 20:41:58 pve pvedaemon[1541405]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 20:42:03 pve pvestatd[3243]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
Jun 20 20:42:04 pve pvestatd[3243]: status update time (33.574 seconds)
Jun 20 20:42:37 pve pvestatd[3243]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
Jun 20 20:42:37 pve pvestatd[3243]: status update time (33.595 seconds)
Jun 20 20:43:10 pve pvestatd[3243]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
Jun 20 20:43:11 pve pvestatd[3243]: status update time (33.612 seconds)
Jun 20 20:43:29 pve pvedaemon[1512066]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
 
Hi,

Can you ping your PBS host? It looks like it can't establish a connection.
 
It looks to me like I have the exact same issue, although my setup is a bit different:

My error messages look the same. For me this issue occurred after an update on June 21. Since I am using the enterprise repositories I get the updates later, but I am on the same kernel as @vb87.

The differences: I am not using PBS, because the built-in backups are good enough for me. I have a Windows VM on the same host that has no backup problems. The Linux VM is stuck after the backup and can only be restarted/shut down with "Stop", which ultimately kills the service.

Start-Date: 2022-06-21 13:23:43
Commandline: apt dist-upgrade
Install: pve-kernel-5.15.35-2-pve:amd64 (5.15.35-5, automatic)
Upgrade: dpkg:amd64 (1.20.9, 1.20.10)
cifs-utils:amd64 (2:6.11-3.1, 2:6.11-3.1+deb11u1)
libcups2:amd64 (2.3.3op2-3+deb11u1, 2.3.3op2-3+deb11u2)
tzdata:amd64 (2021a-1+deb11u3, 2021a-1+deb11u4)
pve-qemu-kvm:amd64 (6.2.0-7, 6.2.0-10)
pve-lxc-syscalld:amd64 (1.1.0-1, 1.1.1-1)
proxmox-backup-file-restore:amd64 (2.2.1-1, 2.2.3-1)
libpve-access-control:amd64 (7.1-8, 7.2-2)
rsyslog:amd64 (8.2102.0-2, 8.2102.0-2+deb11u1)
proxmox-backup-client:amd64 (2.2.1-1, 2.2.3-1)
libpve-common-perl:amd64 (7.2-1, 7.2-2)
pve-kernel-5.15:amd64 (7.2-3, 7.2-4)
libnozzle1:amd64 (1.22-pve2, 1.24-pve1)
libknet1:amd64 (1.22-pve2, 1.24-pve1)
pve-kernel-helper:amd64 (7.2-3, 7.2-4)
End-Date: 2022-06-21 13:24:34

INFO: starting new backup job: vzdump --mailnotification failure --mode stop --notes-template '{{guestname}} on {{cluster}}' --compress 0 --quiet 1 --all 1 --storage smb-rndnas02 --prune-backups 'keep-daily=7,keep-monthly=12,keep-weekly=4,keep-yearly=1'
INFO: skip external VMs: 100, 104, 106, 107, 108, 109
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2022-06-24 03:00:02
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: buildbotworker2
INFO: include disk 'virtio0' 'ceph:vm-101-disk-0' 16G
INFO: exclude disk 'virtio1' 'iscsi-rndnas02:0.0.4.scsi-36589cfc000000465f2cc65c0d7d7f874' (backup=no)
INFO: stopping virtual guest
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/mnt/pve/smb-rndnas02/dump/vzdump-qemu-101-2022_06_24-03_00_02.vma'
INFO: starting kvm to execute backup task
no efidisk configured! Using temporary efivars disk.
INFO: started backup task 'd1dc631e-5ab2-428e-ba19-8bbc78d0e6d6'
INFO: resuming VM again after 12 seconds
...
INFO: 92% (14.8 GiB of 16.0 GiB) in 19s, read: 712.7 MiB/s, write: 682.0 MiB/s
ERROR: VM 101 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 101 qmp command 'backup-cancel' failed - unable to connect to VM 101 qmp socket - timeout after 5988 retries
INFO: resuming VM again
ERROR: Backup of VM 101 failed - VM 101 qmp command 'cont' failed - unable to connect to VM 101 qmp socket - timeout after 450 retries
INFO: Failed at 2022-06-24 03:21:20
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2022-06-24 03:21:20
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: buildvm01
INFO: include disk 'scsi0' 'ceph:vm-105-disk-0' 1T
INFO: stopping virtual guest
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/mnt/pve/smb-rndnas02/dump/vzdump-qemu-105-2022_06_24-03_21_20.vma'
INFO: starting kvm to execute backup task
iothread is only valid with virtio disk or virtio-scsi-single controller, ignoring
INFO: started backup task 'ff78e3f0-fb5b-4460-ac3e-dbff5de0bf58'
INFO: resuming VM again after 49 seconds
INFO: 0% (3.3 GiB of 1.0 TiB) in 3s, read: 1.1 GiB/s, write: 1.1 GiB/s
...
INFO: 100% (1.0 TiB of 1.0 TiB) in 11m 27s, read: 2.5 GiB/s, write: 68.5 MiB/s
INFO: backup is sparse: 633.89 GiB (61%) total zero data
INFO: transferred 1.00 TiB in 687 seconds (1.5 GiB/s)
INFO: archive file size: 390.25GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-daily=7, keep-monthly=12, keep-weekly=4, keep-yearly=1
INFO: removing backup 'smb-rndnas02:backup/vzdump-qemu-105-2022_06_17-03_01_46.vma'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 105 (00:12:28)
INFO: Backup finished at 2022-06-24 03:33:48
INFO: Backup job finished with errors
TASK ERROR: job errors

Jun 28 09:09:44 lmsman-hive02 pvedaemon[2327875]: stop VM 101: UPID:lmsman-hive02:00238543:020A500E:62BAA938:qmstop:101:jmlms@pve:
Jun 28 09:09:44 lmsman-hive02 pvedaemon[1913]: <jmlms@pve> starting task UPID:lmsman-hive02:00238543:020A500E:62BAA938:qmstop:101:jmlms@pve:
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopping User Manager for UID 0...
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Main User Target.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Basic System.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Paths.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Sockets.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Timers.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: dirmngr.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG network certificate management daemon.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent-browser.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent-extra.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent-ssh.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent and passphrase cache.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Removed slice User Application Slice.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Reached target Shutdown.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: systemd-exit.service: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Finished Exit the Session.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Reached target Exit the Session.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: user@0.service: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopped User Manager for UID 0.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopping User Runtime Directory /run/user/0...
Jun 28 09:09:47 lmsman-hive02 systemd[1]: run-user-0.mount: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: user-runtime-dir@0.service: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopped User Runtime Directory /run/user/0.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Removed slice User Slice of UID 0.
Jun 28 09:09:47 lmsman-hive02 pvedaemon[1912]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qm>
Jun 28 09:09:47 lmsman-hive02 pvedaemon[2327875]: VM 101 qmp command failed - VM 101 qmp command 'quit' failed - unable to connect to VM 101 qmp socket - tim>
Jun 28 09:09:47 lmsman-hive02 pvedaemon[2327875]: VM quit/powerdown failed - terminating now with SIGTERM
Jun 28 09:09:48 lmsman-hive02 pvestatd[1857]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp>
Jun 28 09:09:48 lmsman-hive02 pvestatd[1857]: status update time (6.160 seconds)
Jun 28 09:09:57 lmsman-hive02 pvedaemon[2327875]: VM still running - terminating now with SIGKILL
Jun 28 09:09:57 lmsman-hive02 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jun 28 09:09:57 lmsman-hive02 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jun 28 09:09:58 lmsman-hive02 qmeventd[997]: read: Connection reset by peer
Jun 28 09:09:58 lmsman-hive02 pvestatd[1857]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp>
Jun 28 09:09:58 lmsman-hive02 systemd[1]: 101.scope: Succeeded.
Jun 28 09:09:58 lmsman-hive02 systemd[1]: 101.scope: Consumed 38.411s CPU time.
Jun 28 09:09:58 lmsman-hive02 pvestatd[1857]: status update time (5.744 seconds)
Jun 28 09:09:58 lmsman-hive02 qmeventd[2327988]: Starting cleanup for 101
Jun 28 09:09:58 lmsman-hive02 qmeventd[2327988]: trying to acquire lock...
Jun 28 09:09:58 lmsman-hive02 qmeventd[2327988]: OK
 
Hi,
could you please share the output of qm config 101 and tell us what type of storages you are using? If the issue can be reproduced reliably, could you please
  1. Install pve-qemu-kvm-dbg and gdb.
  2. Start the VM 101.
  3. Run gdb --ex 'handle SIGUSR1 nostop noprint' --ex 'handle SIGPIPE nostop noprint' --ex 'set pagination off' --ex 'c' -p $(qm status 101 --verbose | grep pid: | cut -d: -f2)
  4. Start the backup and wait for the VM to hang/crash.
  5. Enter t a a bt in gdb and share the output here. If you don't get a prompt in gdb (i.e. you are not able to input commands), you might need to press Ctrl+C first.
EDIT: Fix typo in command
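The pid lookup in step 3 can also be wrapped in a small helper, which is easier to reuse across VMs. This is only a sketch of the same extraction; the helper name and the sample output below are illustrative, not taken from the thread:

```shell
#!/bin/sh
# Pull the QEMU pid out of `qm status <vmid> --verbose` output,
# which contains a line of the form "pid: 12345".
extract_pid() {
    awk -F': *' '/^pid:/ { print $2; exit }'
}

# Demonstration on captured sample output; on a real PVE host you
# would pipe the live command instead:
#   PID=$(qm status 101 --verbose | extract_pid)
printf 'status: running\npid: 12345\nname: buildbotworker2\n' | extract_pid   # prints 12345
```

With the pid in hand, the attach from step 3 becomes gdb ... -p "$PID".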
 
balloon: 1024
bios: ovmf
boot: order=virtio0;net0
cores: 30
cpu: host
description: virtio1%3A iscsi-rndnas02%3A0.0.2.scsi-36589cfc000000719a1ec4b624b03d049,backup=0,discard=on,iothread=1,size=1073741840K
memory: 65536
meta: creation-qemu=6.1.1,ctime=1651842250
name: buildbotworker2
net0: virtio=36:B5:E3:DF:1E:AD,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
parent: VorUpdate_280622
scsihw: virtio-scsi-pci
smbios1: uuid=009f82ba-fd0d-4795-912e-c3cbde2e55d2
sockets: 1
vga: virtio
virtio0: ceph:vm-101-disk-0,discard=on,iothread=1,size=16G
virtio1: iscsi-rndnas02:0.0.4.scsi-36589cfc000000465f2cc65c0d7d7f874,backup=0,discard=on,iothread=1,size=1073741840K
vmgenid: 0774a5a0-13e0-4ffd-a144-d8e8754a2844

Sorry, I forgot to mention that this does not happen every time, so my issue is not that easy to reproduce. I also opened support ticket 1134922, since my problem seems a bit different. Sorry for hijacking this thread...
 
The PBS host is reachable. All backups of LXC containers, which run before and after the VM, finish successfully.
I configured a second VM yesterday and its backup also fails.
 
I've created a second VM, and its backup fails with the same error.
In the syslog I can see the following:
Code:
Jul 07 18:45:40 pve pveproxy[3385]: starting 1 worker(s)
Jul 07 18:45:40 pve pveproxy[3385]: worker 4127053 started
Jul 07 18:46:12 pve pvestatd[3322]: VM 301 qmp command failed - VM 301 qmp command 'query-proxmox-support' failed - unable to connect to VM 301 qmp socket - timeout after 301 retries
Jul 07 18:46:12 pve pvestatd[3322]: status update time (33.575 seconds)
Jul 07 18:46:27 pve pvedaemon[4123492]: VM 301 qmp command failed - VM 301 qmp command 'backup' failed - got timeout
Jul 07 18:46:27 pve pvedaemon[4123492]: ERROR: Backup of VM 301 failed - VM 301 qmp command 'backup' failed - got timeout
Jul 07 18:46:27 pve pvedaemon[4123492]: INFO: Backup job finished with errors
Jul 07 18:46:27 pve pvedaemon[4123492]: job errors

My nightly e-mail overview looks like this:
VMID   NAME          STATUS  TIME      SIZE      FILENAME
100    nextcloud     OK      00:34:32  126.78GB  ct/100/2022-07-04T21:00:02Z
101    nas           OK      00:22:05  180.24GB  ct/101/2022-07-04T21:34:34Z
102    svn           OK      00:00:15  3.34GB    ct/102/2022-07-04T21:56:39Z
103    gitlab        OK      00:02:48  9.68GB    ct/103/2022-07-04T21:56:54Z
104    kuraiko       OK      00:00:13  1.81GB    ct/104/2022-07-04T21:59:42Z
105    wunschlisten  OK      00:00:12  1.73GB    ct/105/2022-07-04T21:59:55Z
106    proxy         FAILED  00:02:06            VM 106 qmp command 'backup' failed - got timeout
107    owncast       OK      00:00:06  802MB     ct/107/2022-07-04T22:02:13Z
110    carsten       OK      00:00:13  1.92GB    ct/110/2022-07-04T22:02:19Z
300    openhab       OK      00:01:27  2.49GB    ct/300/2022-07-04T22:02:32Z
301    debmatic      FAILED  00:02:06            VM 301 qmp command 'backup' failed - got timeout
TOTAL                        01:06:03  328.77GB
 
Today's update to
Code:
qemu-guest-agent/jammy-updates,now 1:6.2+dfsg-2ubuntu6.3 amd64
and to
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.39-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-6
pve-kernel-helper: 7.2-6
pve-kernel-5.13: 7.1-9
pve-kernel-5.4: 6.4-5
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-7
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.4-1
proxmox-backup-file-restore: 2.2.4-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-11
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
did not solve the problem.
 
The problem is still unresolved.

I found a workaround to back up the VMs:
I installed PBS in an LXC container, back up the VMs locally to it, and then back up that LXC container with the local PBS to the remote PBS.
 
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 106 qmp command 'backup' failed - got timeout

Is the guest agent on this machine working?
 
Yes, I think so.
At least I can see the IP address in the Proxmox web GUI, which is not possible if the guest agent is missing or not running.
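An IP in the GUI only proves the agent answered at some point; a more direct check from the PVE host is qm agent 106 ping, which fails if the agent is dead right now. As a sketch, here is one way to pull the IPv4 address out of the JSON that qm agent <vmid> network-get-interfaces returns. The reply below is a made-up sample in the guest agent's reply shape, not output from this thread:

```shell
#!/bin/sh
# Sample reply in the qemu-guest-agent network-get-interfaces shape;
# on a real PVE host: reply=$(qm agent 106 network-get-interfaces)
reply='[{"name":"eth0","ip-addresses":[{"ip-address":"192.0.2.10","ip-address-type":"ipv4"}]}]'

# Extract the reported addresses without extra tooling:
printf '%s\n' "$reply" | grep -o '"ip-address": *"[0-9.]*"' | cut -d'"' -f4   # prints 192.0.2.10
```

If the ping subcommand times out while the GUI still shows an address, the agent has died since the last status poll.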
 
I have the same vzdump output and backup failure with 1 VM in a cluster of ~200 VMs. The VM doesn't hang/crash, only the backup fails.
The only difference from the other VMs is that it is a very big VM with a 50TB disk on Ceph storage. It gets backed up to a PBS server. The daily backup fails most of the time, although it sometimes succeeds. If I run it manually, however, it works. No other backup is running at that time. Any clue how to debug this further?

INFO: starting new backup job: vzdump 152 --mailto ***** --mailnotification failure --notes-template '{{guestname}}' --quiet 1 --mode snapshot --storage pbs0-week
INFO: Starting Backup of VM 152 (qemu)
INFO: Backup started at 2023-11-22 05:00:01
INFO: status = running
INFO: VM Name: ******
INFO: include disk 'scsi0' 'rbd:vm-152-disk-0' 100G
INFO: include disk 'scsi1' 'rbd:vm-152-disk-1' 50000G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/152/2023-11-22T04:00:01Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 152 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 152 failed - VM 152 qmp command 'backup' failed - got timeout
INFO: Failed at 2023-11-22 05:04:09
INFO: Backup job finished with errors
INFO: notified via target `*******`
INFO: notified via target `mail-to-root`
TASK ERROR: job errors

2023-11-22T05:00:01+01:00: starting new backup on datastore '*****' from ::ffff:****: "ns/week/vm/152/2023-11-22T04:00:01Z"
2023-11-22T05:00:01+01:00: download 'index.json.blob' from previous backup.
2023-11-22T05:00:01+01:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2023-11-22T05:00:01+01:00: download 'drive-scsi0.img.fidx' from previous backup.
2023-11-22T05:00:01+01:00: created new fixed index 1 ("ns/week/vm/152/2023-11-22T04:00:01Z/drive-scsi0.img.fidx")
2023-11-22T05:00:01+01:00: register chunks in 'drive-scsi1.img.fidx' from previous backup.
2023-11-22T05:03:33+01:00: download 'drive-scsi1.img.fidx' from previous backup.
2023-11-22T05:04:08+01:00: created new fixed index 2 ("ns/week/vm/152/2023-11-22T04:00:01Z/drive-scsi1.img.fidx")
2023-11-22T05:04:09+01:00: add blob "/mnt/datastore/zfs/****/ns/week/vm/152/2023-11-22T04:00:01Z/qemu-server.conf.blob" (428 bytes, comp: 428)
2023-11-22T05:04:09+01:00: backup ended and finish failed: backup ended but finished flag is not set.
2023-11-22T05:04:09+01:00: removing unfinished backup
2023-11-22T05:04:09+01:00: TASK ERROR: backup ended but finished flag is not set.

proxmox-ve: 8.0.2 (running kernel: 6.2.16-10-pve)
pve-manager: 8.0.9 (running version: 8.0.9/fd1a0ae1b385cdcd)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.5
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.7-pve1
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx6
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.6
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.10
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.5
proxmox-mail-forward: 0.2.1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.1
pve-cluster: 8.0.5
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.0.7
pve-qemu-kvm: 8.1.2-2
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.8
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3
 
No, I already thought of that and scheduled this backup after all other backups on that cluster are finished. The backups on the cluster otherwise run without any issues. Unless it's an issue on the PBS side: backups from other clusters can run at that time, but we don't have any issues with those VMs either.
 
Since it works manually, I did suspect the timing.
Is anything else running on the PBS at backup time, such as verification or pruning?
What are the specs of the PBS server? What's the connection speed between PVE and PBS?
 
I have the same issue with an Ubuntu guest. PVE 8.0.9

ERROR: VM 108 qmp command 'cont' failed - got timeout

Nothing else running on host or PBS. 1Gb switch
 
Hi,
I have the same vzdump output and backup failure with 1 VM in a cluster of ~200 VMs. The VM doesn't hang/crash, only the backup fails.
The only difference from the other VMs is that it is a very big VM with a 50TB disk on Ceph storage. It gets backed up to a PBS server. The daily backup fails most of the time, although it sometimes succeeds. If I run it manually, however, it works. No other backup is running at that time. Any clue how to debug this further?
I feel like the huge disk could indeed be the cause of the issue here. Can you share the VM configuration qm config 152?
2023-11-22T05:00:01+01:00: register chunks in 'drive-scsi1.img.fidx' from previous backup.
2023-11-22T05:03:33+01:00: download 'drive-scsi1.img.fidx' from previous backup.
This step is taking over three minutes. The timeout for the backup command for the VM is just over 2 minutes.
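A rough back-of-the-envelope calculation shows why this preparation step grows with disk size. It assumes PBS's default 4 MiB fixed chunk size for VM images and one 32-byte SHA-256 digest per chunk in the .fidx; this is a sketch of the orders of magnitude, not the exact on-disk index format:

```shell
#!/bin/sh
disk_gib=50000   # size of scsi1 from the task log above
chunk_mib=4      # PBS default fixed chunk size for VM images
digest_bytes=32  # one SHA-256 digest per chunk in the .fidx

chunks=$(( disk_gib * 1024 / chunk_mib ))
index_mib=$(( chunks * digest_bytes / 1024 / 1024 ))
echo "$chunks chunks, ~$index_mib MiB of digests to download and register"
```

Roughly 12.8 million chunks and ~390 MiB of index data have to be fetched and registered before the backup proper can start, which is how the preparation alone can eat up a qmp 'backup' timeout of just over two minutes.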

Unfortunately, I don't have time right now to look into the issue. I'll add it to my work pile, but could you please open a bug report referencing back to this thread, so we don't lose track of it and maybe somebody else has time to look at it: https://bugzilla.proxmox.com/
 
Hi,
I have the same issue with an Ubuntu guest. PVE 8.0.9

ERROR: VM 108 qmp command 'cont' failed - got timeout

Nothing else running on host or PBS. 1Gb switch
that's a different error message, so it might be a different issue ;) Or do you also have a huge disk? Please share the output of pveversion -v, qm config 108 and the full backup task log. Is the guest still responsive after such a failure?
 
