Backup of VM fails

vb87

Hello experts,

I'm running a Proxmox server with several LXC containers and one VM.
I configured my backup with PBS and it ran perfectly until May 27.
Now it fails every day when backing up the one VM, and I don't understand why. I tried suggestions from several forum threads, but nothing helped.
A manual backup to local storage worked, though.
Hopefully someone can help me.

Here is some important additional info:

proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.4: 6.4-5
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;net0
cores: 2
memory: 2048
meta: creation-qemu=6.1.0,ctime=1643746981
name: proxy
net0: virtio=CE:80:9A:D2:A7:A1,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: data:vm-106-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=ff365e0b-0310-4c58-bb13-5ab9fd1adda8
sockets: 1
vmgenid: cd2bcdee-45b8-475a-a568-04581f131305
#qmdump#map:scsi0:drive-scsi0:data::

Static hostname: proxy
Icon name: computer-vm
Chassis: vm
Machine ID: ff365e0b03104c58bb135ab9fd1adda8
Boot ID: 1ddb95abe48846b998071d172d70d8dc
Virtualization: kvm
Operating System: Ubuntu 22.04 LTS
Kernel: Linux 5.15.0-39-generic
Architecture: x86-64
Hardware Vendor: QEMU
Hardware Model: Standard PC (i440FX + PIIX, 1996)
qemu-guest-agent/jammy-updates,now 1:6.2+dfsg-2ubuntu6.1

INFO: starting new backup job: vzdump 106 --remove 0 --storage backup --notes-template 'Manual: {{guestname}} ({{vmid}})' --node pve --mode snapshot
INFO: Starting Backup of VM 106 (qemu)
INFO: Backup started at 2022-06-20 20:41:24
INFO: status = running
INFO: VM Name: proxy
INFO: include disk 'scsi0' 'data:vm-106-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/106/2022-06-20T18:41:24Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 106 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 106 failed - VM 106 qmp command 'backup' failed - got timeout
INFO: Failed at 2022-06-20 20:43:30
INFO: Backup job finished with errors
TASK ERROR: job errors

Jun 20 20:41:24 pve pvedaemon[1666636]: INFO: starting new backup job: vzdump 106 --remove 0 --storage backup --notes-template 'Manual: {{guestname}} ({{vmid}})' --node pve --mode snapshot
Jun 20 20:41:24 pve pvedaemon[1666636]: INFO: Starting Backup of VM 106 (qemu)
Jun 20 20:41:58 pve pvedaemon[1541405]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - got timeout
Jun 20 20:42:03 pve pvestatd[3243]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
Jun 20 20:42:04 pve pvestatd[3243]: status update time (33.574 seconds)
Jun 20 20:42:37 pve pvestatd[3243]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
Jun 20 20:42:37 pve pvestatd[3243]: status update time (33.595 seconds)
Jun 20 20:43:10 pve pvestatd[3243]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
Jun 20 20:43:11 pve pvestatd[3243]: status update time (33.612 seconds)
Jun 20 20:43:29 pve pvedaemon[1512066]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 301 retries
 
Hi,

Can you ping your PBS host? It looks like it can't establish a connection.
 
It looks to me like I have the exact same issue, although my setup is a bit different:

My error messages look the same. For me this issue occurred after an update on June 21. Since I am using the enterprise repositories I get the updates later, but I am on the same kernel as @vb87.

The differences: I am not using PBS, because the built-in backups are good enough for me. I have a Windows VM on the same host that has no backup problems. The Linux VM is stuck after the backup and can only be restarted/shut down with "Stop", which ultimately kills the service.

Start-Date: 2022-06-21 13:23:43
Commandline: apt dist-upgrade
Install: pve-kernel-5.15.35-2-pve:amd64 (5.15.35-5, automatic)
Upgrade: dpkg:amd64 (1.20.9, 1.20.10)
cifs-utils:amd64 (2:6.11-3.1, 2:6.11-3.1+deb11u1)
libcups2:amd64 (2.3.3op2-3+deb11u1, 2.3.3op2-3+deb11u2)
tzdata:amd64 (2021a-1+deb11u3, 2021a-1+deb11u4)
pve-qemu-kvm:amd64 (6.2.0-7, 6.2.0-10)
pve-lxc-syscalld:amd64 (1.1.0-1, 1.1.1-1)
proxmox-backup-file-restore:amd64 (2.2.1-1, 2.2.3-1)
libpve-access-control:amd64 (7.1-8, 7.2-2)
rsyslog:amd64 (8.2102.0-2, 8.2102.0-2+deb11u1)
proxmox-backup-client:amd64 (2.2.1-1, 2.2.3-1)
libpve-common-perl:amd64 (7.2-1, 7.2-2)
pve-kernel-5.15:amd64 (7.2-3, 7.2-4)
libnozzle1:amd64 (1.22-pve2, 1.24-pve1)
libknet1:amd64 (1.22-pve2, 1.24-pve1)
pve-kernel-helper:amd64 (7.2-3, 7.2-4)
End-Date: 2022-06-21 13:24:34

INFO: starting new backup job: vzdump --mailnotification failure --mode stop --notes-template '{{guestname}} on {{cluster}}' --compress 0 --quiet 1 --all 1 --storage smb-rndnas02 --prune-backups 'keep-daily=7,keep-monthly=12,keep-weekly=4,keep-yearly=1'
INFO: skip external VMs: 100, 104, 106, 107, 108, 109
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2022-06-24 03:00:02
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: buildbotworker2
INFO: include disk 'virtio0' 'ceph:vm-101-disk-0' 16G
INFO: exclude disk 'virtio1' 'iscsi-rndnas02:0.0.4.scsi-36589cfc000000465f2cc65c0d7d7f874' (backup=no)
INFO: stopping virtual guest
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/mnt/pve/smb-rndnas02/dump/vzdump-qemu-101-2022_06_24-03_00_02.vma'
INFO: starting kvm to execute backup task
no efidisk configured! Using temporary efivars disk.
INFO: started backup task 'd1dc631e-5ab2-428e-ba19-8bbc78d0e6d6'
INFO: resuming VM again after 12 seconds
...
INFO: 92% (14.8 GiB of 16.0 GiB) in 19s, read: 712.7 MiB/s, write: 682.0 MiB/s
ERROR: VM 101 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 101 qmp command 'backup-cancel' failed - unable to connect to VM 101 qmp socket - timeout after 5988 retries
INFO: resuming VM again
ERROR: Backup of VM 101 failed - VM 101 qmp command 'cont' failed - unable to connect to VM 101 qmp socket - timeout after 450 retries
INFO: Failed at 2022-06-24 03:21:20
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2022-06-24 03:21:20
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: buildvm01
INFO: include disk 'scsi0' 'ceph:vm-105-disk-0' 1T
INFO: stopping virtual guest
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/mnt/pve/smb-rndnas02/dump/vzdump-qemu-105-2022_06_24-03_21_20.vma'
INFO: starting kvm to execute backup task
iothread is only valid with virtio disk or virtio-scsi-single controller, ignoring
INFO: started backup task 'ff78e3f0-fb5b-4460-ac3e-dbff5de0bf58'
INFO: resuming VM again after 49 seconds
INFO: 0% (3.3 GiB of 1.0 TiB) in 3s, read: 1.1 GiB/s, write: 1.1 GiB/s
...
INFO: 100% (1.0 TiB of 1.0 TiB) in 11m 27s, read: 2.5 GiB/s, write: 68.5 MiB/s
INFO: backup is sparse: 633.89 GiB (61%) total zero data
INFO: transferred 1.00 TiB in 687 seconds (1.5 GiB/s)
INFO: archive file size: 390.25GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-daily=7, keep-monthly=12, keep-weekly=4, keep-yearly=1
INFO: removing backup 'smb-rndnas02:backup/vzdump-qemu-105-2022_06_17-03_01_46.vma'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 105 (00:12:28)
INFO: Backup finished at 2022-06-24 03:33:48
INFO: Backup job finished with errors
TASK ERROR: job errors

Jun 28 09:09:44 lmsman-hive02 pvedaemon[2327875]: stop VM 101: UPID:lmsman-hive02:00238543:020A500E:62BAA938:qmstop:101:jmlms@pve:
Jun 28 09:09:44 lmsman-hive02 pvedaemon[1913]: <jmlms@pve> starting task UPID:lmsman-hive02:00238543:020A500E:62BAA938:qmstop:101:jmlms@pve:
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopping User Manager for UID 0...
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Main User Target.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Basic System.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Paths.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Sockets.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Stopped target Timers.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: dirmngr.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG network certificate management daemon.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent-browser.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent-extra.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent-ssh.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: gpg-agent.socket: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Closed GnuPG cryptographic agent and passphrase cache.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Removed slice User Application Slice.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Reached target Shutdown.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: systemd-exit.service: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Finished Exit the Session.
Jun 28 09:09:47 lmsman-hive02 systemd[2327775]: Reached target Exit the Session.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: user@0.service: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopped User Manager for UID 0.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopping User Runtime Directory /run/user/0...
Jun 28 09:09:47 lmsman-hive02 systemd[1]: run-user-0.mount: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: user-runtime-dir@0.service: Succeeded.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Stopped User Runtime Directory /run/user/0.
Jun 28 09:09:47 lmsman-hive02 systemd[1]: Removed slice User Slice of UID 0.
Jun 28 09:09:47 lmsman-hive02 pvedaemon[1912]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qm>
Jun 28 09:09:47 lmsman-hive02 pvedaemon[2327875]: VM 101 qmp command failed - VM 101 qmp command 'quit' failed - unable to connect to VM 101 qmp socket - tim>
Jun 28 09:09:47 lmsman-hive02 pvedaemon[2327875]: VM quit/powerdown failed - terminating now with SIGTERM
Jun 28 09:09:48 lmsman-hive02 pvestatd[1857]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp>
Jun 28 09:09:48 lmsman-hive02 pvestatd[1857]: status update time (6.160 seconds)
Jun 28 09:09:57 lmsman-hive02 pvedaemon[2327875]: VM still running - terminating now with SIGKILL
Jun 28 09:09:57 lmsman-hive02 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jun 28 09:09:57 lmsman-hive02 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Jun 28 09:09:58 lmsman-hive02 qmeventd[997]: read: Connection reset by peer
Jun 28 09:09:58 lmsman-hive02 pvestatd[1857]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - unable to connect to VM 101 qmp>
Jun 28 09:09:58 lmsman-hive02 systemd[1]: 101.scope: Succeeded.
Jun 28 09:09:58 lmsman-hive02 systemd[1]: 101.scope: Consumed 38.411s CPU time.
Jun 28 09:09:58 lmsman-hive02 pvestatd[1857]: status update time (5.744 seconds)
Jun 28 09:09:58 lmsman-hive02 qmeventd[2327988]: Starting cleanup for 101
Jun 28 09:09:58 lmsman-hive02 qmeventd[2327988]: trying to acquire lock...
Jun 28 09:09:58 lmsman-hive02 qmeventd[2327988]: OK
 
Hi,
could you please share the output of qm config 101 and tell us what type of storages you are using? If the issue can be reproduced reliably, could you please
  1. Install pve-qemu-kvm-dbg and gdb.
  2. Start the VM 101.
  3. Run gdb --ex 'handle SIGUSR1 nostop noprint' --ex 'handle SIGPIPE nostop noprint' --ex 'set pagination off' --ex 'c' -p $(qm status 101 --verbose | grep pid: | cut -d: -f2)
  4. Start the backup and wait for the VM to hang/crash.
  5. Enter t a a bt in gdb and share the output here. If you don't get a prompt in gdb (i.e. you are not able to input commands), you might need to press Ctrl+C first.
EDIT: Fix typo in command
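The pid lookup in step 3 can also be wrapped in a small helper, which is easier to reuse across VMs. This is only a sketch of the same extraction; the helper name and the sample output below are illustrative, not taken from the thread:

```shell
#!/bin/sh
# Pull the QEMU pid out of `qm status <vmid> --verbose` output,
# which contains a line of the form "pid: 12345".
extract_pid() {
    awk -F': *' '/^pid:/ { print $2; exit }'
}

# Demonstration on captured sample output; on a real PVE host you
# would pipe the live command instead:
#   PID=$(qm status 101 --verbose | extract_pid)
printf 'status: running\npid: 12345\nname: buildbotworker2\n' | extract_pid   # prints 12345
```

With the pid in hand, the attach from step 3 becomes gdb ... -p "$PID".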
 
balloon: 1024
bios: ovmf
boot: order=virtio0;net0
cores: 30
cpu: host
description: virtio1%3A iscsi-rndnas02%3A0.0.2.scsi-36589cfc000000719a1ec4b624b03d049,backup=0,discard=on,iothread=1,size=1073741840K
memory: 65536
meta: creation-qemu=6.1.1,ctime=1651842250
name: buildbotworker2
net0: virtio=36:B5:E3:DF:1E:AD,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
parent: VorUpdate_280622
scsihw: virtio-scsi-pci
smbios1: uuid=009f82ba-fd0d-4795-912e-c3cbde2e55d2
sockets: 1
vga: virtio
virtio0: ceph:vm-101-disk-0,discard=on,iothread=1,size=16G
virtio1: iscsi-rndnas02:0.0.4.scsi-36589cfc000000465f2cc65c0d7d7f874,backup=0,discard=on,iothread=1,size=1073741840K
vmgenid: 0774a5a0-13e0-4ffd-a144-d8e8754a2844

Sorry, I forgot to mention that this does not happen every time, so my issue is not that easy to reproduce. I also opened support ticket 1134922, since my problem seems a bit different. Sorry for hijacking this thread...
 
The PBS host is reachable. All backups of LXC containers, which run before and after the VM, finish successfully.
I configured a second VM yesterday and its backup also fails.
 
I've created a second VM, and its backup fails with the same error.
In the syslog I can see the following:
Code:
Jul 07 18:45:40 pve pveproxy[3385]: starting 1 worker(s)
Jul 07 18:45:40 pve pveproxy[3385]: worker 4127053 started
Jul 07 18:46:12 pve pvestatd[3322]: VM 301 qmp command failed - VM 301 qmp command 'query-proxmox-support' failed - unable to connect to VM 301 qmp socket - timeout after 301 retries
Jul 07 18:46:12 pve pvestatd[3322]: status update time (33.575 seconds)
Jul 07 18:46:27 pve pvedaemon[4123492]: VM 301 qmp command failed - VM 301 qmp command 'backup' failed - got timeout
Jul 07 18:46:27 pve pvedaemon[4123492]: ERROR: Backup of VM 301 failed - VM 301 qmp command 'backup' failed - got timeout
Jul 07 18:46:27 pve pvedaemon[4123492]: INFO: Backup job finished with errors
Jul 07 18:46:27 pve pvedaemon[4123492]: job errors

My nightly e-mail overview looks like this:
VMID   NAME          STATUS  TIME      SIZE      FILENAME
100    nextcloud     OK      00:34:32  126.78GB  ct/100/2022-07-04T21:00:02Z
101    nas           OK      00:22:05  180.24GB  ct/101/2022-07-04T21:34:34Z
102    svn           OK      00:00:15  3.34GB    ct/102/2022-07-04T21:56:39Z
103    gitlab        OK      00:02:48  9.68GB    ct/103/2022-07-04T21:56:54Z
104    kuraiko       OK      00:00:13  1.81GB    ct/104/2022-07-04T21:59:42Z
105    wunschlisten  OK      00:00:12  1.73GB    ct/105/2022-07-04T21:59:55Z
106    proxy         FAILED  00:02:06            VM 106 qmp command 'backup' failed - got timeout
107    owncast       OK      00:00:06  802MB     ct/107/2022-07-04T22:02:13Z
110    carsten       OK      00:00:13  1.92GB    ct/110/2022-07-04T22:02:19Z
300    openhab       OK      00:01:27  2.49GB    ct/300/2022-07-04T22:02:32Z
301    debmatic      FAILED  00:02:06            VM 301 qmp command 'backup' failed - got timeout
TOTAL                        01:06:03  328.77GB
 
Today's update to
Code:
qemu-guest-agent/jammy-updates,now 1:6.2+dfsg-2ubuntu6.3 amd64
and to
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.39-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-6
pve-kernel-helper: 7.2-6
pve-kernel-5.13: 7.1-9
pve-kernel-5.4: 6.4-5
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-7
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.4-1
proxmox-backup-file-restore: 2.2.4-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-11
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
did not solve the problem.
 
The problem is still unresolved.

I found a workaround to back up the VMs:
I installed PBS in an LXC container, back up the VMs locally to it, and then back up that LXC container with the local PBS to the remote PBS.
 
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 106 qmp command 'backup' failed - got timeout

Is the guest agent on this machine working?
 
Yes, I think so.
At least I can see the IP address in the Proxmox web GUI, which is not possible if the guest agent is missing or not running.
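An IP in the GUI only proves the agent answered at some point; a more direct check from the PVE host is qm agent 106 ping, which fails if the agent is dead right now. As a sketch, here is one way to pull the IPv4 address out of the JSON that qm agent <vmid> network-get-interfaces returns. The reply below is a made-up sample in the guest agent's reply shape, not output from this thread:

```shell
#!/bin/sh
# Sample reply in the qemu-guest-agent network-get-interfaces shape;
# on a real PVE host: reply=$(qm agent 106 network-get-interfaces)
reply='[{"name":"eth0","ip-addresses":[{"ip-address":"192.0.2.10","ip-address-type":"ipv4"}]}]'

# Extract the reported addresses without extra tooling:
printf '%s\n' "$reply" | grep -o '"ip-address": *"[0-9.]*"' | cut -d'"' -f4   # prints 192.0.2.10
```

If the ping subcommand times out while the GUI still shows an address, the agent has died since the last status poll.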
 
I have the same vzdump output and backup failure with 1 VM in a cluster of ~200 VMs. The VM doesn't hang/crash, only the backup fails.
The only difference from the other VMs is that it is a very big VM with a 50TB disk on Ceph storage. It gets backed up to a PBS server. The daily backup fails most of the time, although it sometimes succeeds. If I run it manually, however, it works. No other backup is running at that time. Any clue how to debug this further?

INFO: starting new backup job: vzdump 152 --mailto ***** --mailnotification failure --notes-template '{{guestname}}' --quiet 1 --mode snapshot --storage pbs0-week
INFO: Starting Backup of VM 152 (qemu)
INFO: Backup started at 2023-11-22 05:00:01
INFO: status = running
INFO: VM Name: ******
INFO: include disk 'scsi0' 'rbd:vm-152-disk-0' 100G
INFO: include disk 'scsi1' 'rbd:vm-152-disk-1' 50000G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/152/2023-11-22T04:00:01Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 152 qmp command 'backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 152 failed - VM 152 qmp command 'backup' failed - got timeout
INFO: Failed at 2023-11-22 05:04:09
INFO: Backup job finished with errors
INFO: notified via target `*******`
INFO: notified via target `mail-to-root`
TASK ERROR: job errors

2023-11-22T05:00:01+01:00: starting new backup on datastore '*****' from ::ffff:****: "ns/week/vm/152/2023-11-22T04:00:01Z"
2023-11-22T05:00:01+01:00: download 'index.json.blob' from previous backup.
2023-11-22T05:00:01+01:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2023-11-22T05:00:01+01:00: download 'drive-scsi0.img.fidx' from previous backup.
2023-11-22T05:00:01+01:00: created new fixed index 1 ("ns/week/vm/152/2023-11-22T04:00:01Z/drive-scsi0.img.fidx")
2023-11-22T05:00:01+01:00: register chunks in 'drive-scsi1.img.fidx' from previous backup.
2023-11-22T05:03:33+01:00: download 'drive-scsi1.img.fidx' from previous backup.
2023-11-22T05:04:08+01:00: created new fixed index 2 ("ns/week/vm/152/2023-11-22T04:00:01Z/drive-scsi1.img.fidx")
2023-11-22T05:04:09+01:00: add blob "/mnt/datastore/zfs/****/ns/week/vm/152/2023-11-22T04:00:01Z/qemu-server.conf.blob" (428 bytes, comp: 428)
2023-11-22T05:04:09+01:00: backup ended and finish failed: backup ended but finished flag is not set.
2023-11-22T05:04:09+01:00: removing unfinished backup
2023-11-22T05:04:09+01:00: TASK ERROR: backup ended but finished flag is not set.

proxmox-ve: 8.0.2 (running kernel: 6.2.16-10-pve)
pve-manager: 8.0.9 (running version: 8.0.9/fd1a0ae1b385cdcd)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.5
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.7-pve1
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx6
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.6
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.10
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.5
proxmox-mail-forward: 0.2.1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.1
pve-cluster: 8.0.5
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.0.7
pve-qemu-kvm: 8.1.2-2
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.8
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3
 
No, I already thought of that and scheduled this backup after all other backups on that cluster are finished. The backups on the cluster otherwise run without any issues. Unless it's an issue on the PBS side: backups from other clusters can run at that time, but we don't have any issues with those VMs either.
 
Since it works manually, I did suspect the timing.
Is anything else running on the PBS at backup time, such as verification or pruning?
What are the specs of the PBS server? What's the connection speed between PVE and PBS?
 
I have the same issue with an Ubuntu guest. PVE 8.0.9

ERROR: VM 108 qmp command 'cont' failed - got timeout

Nothing else running on host or PBS. 1Gb switch
 
Hi,
I have the same vzdump output and backup failure with 1 VM in a cluster of ~200 VMs. The VM doesn't hang/crash, only the backup fails.
The only difference from the other VMs is that it is a very big VM with a 50TB disk on Ceph storage. It gets backed up to a PBS server. The daily backup fails most of the time, although it sometimes succeeds. If I run it manually, however, it works. No other backup is running at that time. Any clue how to debug this further?
I feel like the huge disk could indeed be the cause of the issue here. Can you share the VM configuration qm config 152?
2023-11-22T05:00:01+01:00: register chunks in 'drive-scsi1.img.fidx' from previous backup.
2023-11-22T05:03:33+01:00: download 'drive-scsi1.img.fidx' from previous backup.
This step is taking over three minutes. The timeout for the backup command for the VM is just over 2 minutes.
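A rough back-of-the-envelope calculation shows why this preparation step grows with disk size. It assumes PBS's default 4 MiB fixed chunk size for VM images and one 32-byte SHA-256 digest per chunk in the .fidx; this is a sketch of the orders of magnitude, not the exact on-disk index format:

```shell
#!/bin/sh
disk_gib=50000   # size of scsi1 from the task log above
chunk_mib=4      # PBS default fixed chunk size for VM images
digest_bytes=32  # one SHA-256 digest per chunk in the .fidx

chunks=$(( disk_gib * 1024 / chunk_mib ))
index_mib=$(( chunks * digest_bytes / 1024 / 1024 ))
echo "$chunks chunks, ~$index_mib MiB of digests to download and register"
```

Roughly 12.8 million chunks and ~390 MiB of index data have to be fetched and registered before the backup proper can start, which is how the preparation alone can eat up a qmp 'backup' timeout of just over two minutes.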

Unfortunately, I don't have time right now to look into the issue. I'll add it to my work pile, but could you please open a bug report referencing back to this thread, so we don't lose track of it and maybe somebody else has time to look at it: https://bugzilla.proxmox.com/
 
Hi,
I have the same issue with an Ubuntu guest. PVE 8.0.9

ERROR: VM 108 qmp command 'cont' failed - got timeout

Nothing else running on host or PBS. 1Gb switch
that's a different error message, so it might be a different issue ;) Or do you also have a huge disk? Please share the output of pveversion -v, qm config 108 and the full backup task log. Is the guest still responsive after such a failure?
 
