Proxmox backup job errors / qmp command 'query-backup' failed

AreUSirius
New Member
Hey there!

For a few days now I have had problems backing up one of my VMs. Every day at 10 p.m. the backup job starts automatically, but most of the time it ends with an error. I have tried snapshot and stop mode as well as lzo and zstd compression; the backup is stored via NFS on a NAS (TrueNAS). Do you know how to fix this?

I found on another forum that the open files limit can cause this error. I raised the limit, but I still get the same error.
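
For reference, this is roughly how such a limit can be checked and raised on the node (the values below are only examples; also note the KVM process is started by systemd, so entries in /etc/security/limits.conf may not even apply to it):

Code:
# current limits of the running pvedaemon (the VM's kvm process can be checked the same way)
cat /proc/$(pidof -s pvedaemon)/limits | grep 'Max open files'

# system-wide ceiling
sysctl fs.file-max

# example: raise the per-session limit via PAM (placeholder values, adjust to your setup)
echo '*  soft  nofile  1048576' >> /etc/security/limits.conf
echo '*  hard  nofile  1048576' >> /etc/security/limits.conf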

backup job errors:

Code:
INFO: starting new backup job: vzdump 102 --compress zstd --mode stop --mailnotification always --storage st1-proxmox-backup --quiet 1
INFO: Starting Backup of VM 102 (qemu)
INFO: Backup started at 2021-12-14 22:00:02
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: xx
INFO: include disk 'virtio0' 'st1-proxmox-vms:102/vm-102-disk-0.qcow2' 50G
INFO: stopping virtual guest
INFO: creating vzdump archive '/mnt/pve/st1-proxmox-backup/dump/vzdump-qemu-102-2021_12_14-22_00_02.vma.zst'
INFO: starting kvm to execute backup task
INFO: started backup task 'e5d1d2d9-18f3-428e-aedc-776b25a2b55b'
INFO: resuming VM again after 16 seconds
INFO:   1% (781.3 MiB of 50.0 GiB) in 3s, read: 260.4 MiB/s, write: 107.9 MiB/s
INFO:   3% (1.9 GiB of 50.0 GiB) in 6s, read: 404.1 MiB/s, write: 27.5 MiB/s
INFO:   4% (2.3 GiB of 50.0 GiB) in 9s, read: 119.6 MiB/s, write: 113.5 MiB/s
INFO:   5% (2.7 GiB of 50.0 GiB) in 12s, read: 120.8 MiB/s, write: 117.6 MiB/s
INFO:   6% (3.5 GiB of 50.0 GiB) in 15s, read: 276.7 MiB/s, write: 74.6 MiB/s
INFO:   9% (4.6 GiB of 50.0 GiB) in 18s, read: 381.8 MiB/s, write: 26.5 MiB/s
INFO:  11% (5.7 GiB of 50.0 GiB) in 21s, read: 397.5 MiB/s, write: 29.5 MiB/s
INFO:  12% (6.2 GiB of 50.0 GiB) in 24s, read: 160.7 MiB/s, write: 151.7 MiB/s
INFO:  13% (6.6 GiB of 50.0 GiB) in 27s, read: 132.9 MiB/s, write: 129.1 MiB/s
INFO:  14% (7.0 GiB of 50.0 GiB) in 31s, read: 108.4 MiB/s, write: 103.0 MiB/s
INFO:  15% (7.6 GiB of 50.0 GiB) in 37s, read: 91.5 MiB/s, write: 87.4 MiB/s
INFO:  16% (8.1 GiB of 50.0 GiB) in 42s, read: 107.3 MiB/s, write: 89.0 MiB/s
INFO:  17% (8.5 GiB of 50.0 GiB) in 47s, read: 89.9 MiB/s, write: 84.4 MiB/s
INFO:  18% (9.0 GiB of 50.0 GiB) in 53s, read: 88.2 MiB/s, write: 84.8 MiB/s
INFO:  19% (9.6 GiB of 50.0 GiB) in 57s, read: 130.5 MiB/s, write: 96.8 MiB/s
INFO:  20% (10.1 GiB of 50.0 GiB) in 1m 1s, read: 132.3 MiB/s, write: 128.4 MiB/s
INFO:  21% (10.5 GiB of 50.0 GiB) in 1m 4s, read: 149.8 MiB/s, write: 136.8 MiB/s
INFO:  22% (11.1 GiB of 50.0 GiB) in 1m 8s, read: 145.0 MiB/s, write: 144.6 MiB/s
INFO:  23% (11.6 GiB of 50.0 GiB) in 1m 11s, read: 175.2 MiB/s, write: 150.6 MiB/s
INFO:  24% (12.1 GiB of 50.0 GiB) in 1m 14s, read: 174.0 MiB/s, write: 157.7 MiB/s
INFO:  25% (12.6 GiB of 50.0 GiB) in 1m 17s, read: 157.1 MiB/s, write: 154.9 MiB/s
INFO:  26% (13.1 GiB of 50.0 GiB) in 1m 21s, read: 141.2 MiB/s, write: 139.7 MiB/s
INFO:  27% (13.6 GiB of 50.0 GiB) in 1m 24s, read: 152.4 MiB/s, write: 111.9 MiB/s
INFO:  28% (14.1 GiB of 50.0 GiB) in 1m 36s, read: 46.5 MiB/s, write: 46.3 MiB/s
INFO:  29% (14.6 GiB of 50.0 GiB) in 1m 40s, read: 118.7 MiB/s, write: 118.5 MiB/s
INFO:  30% (15.0 GiB of 50.0 GiB) in 1m 44s, read: 120.5 MiB/s, write: 117.1 MiB/s
INFO:  31% (15.5 GiB of 50.0 GiB) in 1m 48s, read: 129.6 MiB/s, write: 105.1 MiB/s
INFO:  32% (16.4 GiB of 50.0 GiB) in 1m 51s, read: 297.2 MiB/s, write: 46.0 MiB/s
INFO:  35% (17.5 GiB of 50.0 GiB) in 1m 54s, read: 383.0 MiB/s, write: 27.0 MiB/s
INFO:  37% (18.8 GiB of 50.0 GiB) in 1m 57s, read: 437.4 MiB/s, write: 8.9 MiB/s
INFO:  40% (20.2 GiB of 50.0 GiB) in 2m, read: 466.4 MiB/s, write: 45.3 KiB/s
INFO:  43% (21.7 GiB of 50.0 GiB) in 2m 3s, read: 500.2 MiB/s, write: 184.0 KiB/s
INFO:  46% (23.1 GiB of 50.0 GiB) in 2m 6s, read: 502.7 MiB/s, write: 0 B/s
INFO:  48% (24.4 GiB of 50.0 GiB) in 2m 9s, read: 439.8 MiB/s, write: 16.2 MiB/s
INFO:  49% (24.8 GiB of 50.0 GiB) in 2m 12s, read: 117.2 MiB/s, write: 107.5 MiB/s
INFO:  50% (25.1 GiB of 50.0 GiB) in 2m 15s, read: 108.8 MiB/s, write: 100.6 MiB/s
INFO:  51% (25.5 GiB of 50.0 GiB) in 2m 19s, read: 113.3 MiB/s, write: 105.9 MiB/s
INFO:  52% (26.1 GiB of 50.0 GiB) in 2m 24s, read: 124.0 MiB/s, write: 104.7 MiB/s
INFO:  53% (26.5 GiB of 50.0 GiB) in 2m 27s, read: 140.2 MiB/s, write: 100.8 MiB/s
INFO:  54% (27.1 GiB of 50.0 GiB) in 2m 32s, read: 110.9 MiB/s, write: 107.7 MiB/s
INFO:  55% (27.6 GiB of 50.0 GiB) in 2m 37s, read: 103.7 MiB/s, write: 92.6 MiB/s
INFO:  56% (28.0 GiB of 50.0 GiB) in 2m 41s, read: 110.6 MiB/s, write: 104.1 MiB/s
INFO:  57% (28.6 GiB of 50.0 GiB) in 2m 45s, read: 138.7 MiB/s, write: 93.5 MiB/s
ERROR: VM 102 qmp command 'query-backup' failed - client closed connection
INFO: aborting backup job
ERROR: VM 102 not running
INFO: resuming VM again
trying to acquire lock...
 OK
no lock found trying to remove 'backup'  lock
INFO: restarting vm
ERROR: Backup of VM 102 failed - VM 102 not running
INFO: Failed at 2021-12-14 22:28:33
INFO: Backup job finished with errors
TASK ERROR: job errors
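
As an aside: if the VM ever stays locked after a failed run like the one above, the 'backup' lock can also be removed by hand with the standard qm command (use with care):

Code:
qm unlock 102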

logs from another test:

Code:
Dec 15 10:20:21 hwdeka3 pvedaemon[1222123]: INFO: starting new backup job: vzdump 102 --compress zstd --remove 0 --mode snapshot --node hwdeka3 --storage st1-proxmox-backup
Dec 15 10:20:21 hwdeka3 pvedaemon[1222123]: INFO: Starting Backup of VM 102 (qemu)
Dec 15 10:20:21 hwdeka3 pvedaemon[3996390]: worker exit
Dec 15 10:20:21 hwdeka3 pvedaemon[13710]: worker 3996390 finished
Dec 15 10:20:21 hwdeka3 pvedaemon[13710]: starting 1 worker(s)
Dec 15 10:20:21 hwdeka3 pvedaemon[13710]: worker 1222139 started
Dec 15 10:20:50 hwdeka3 filebeat[12794]: 2021-12-15T10:20:50.484+0100#011INFO#011[monitoring]#011log/log.go:184#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":106905>
Dec 15 10:21:20 hwdeka3 filebeat[12794]: 2021-12-15T10:21:20.483+0100#011INFO#011[monitoring]#011log/log.go:184#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":61440}>
Dec 15 10:21:47 hwdeka3 postfix/qmgr[851724]: DCFF1608B4: from=<root@>, size=10706, nrcpt=1 (queue active)
Dec 15 10:21:47 hwdeka3 nslcd[12897]: [b860a5] <group/member="postfix"> ldap_result() failed: Can't contact LDAP server
Dec 15 10:21:47 hwdeka3 postfix/error[1222451]: DCFF1608B4: to=<>, relay=none, delay=393263, delays=393263/0.02/0/0.01, dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to [10.10.>
Dec 15 10:21:50 hwdeka3 filebeat[12794]: 2021-12-15T10:21:50.484+0100#011INFO#011[monitoring]#011log/log.go:184#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":-98304>
Dec 15 10:22:20 hwdeka3 filebeat[12794]: 2021-12-15T10:22:20.484+0100#011INFO#011[monitoring]#011log/log.go:184#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":139264>
Dec 15 10:22:45 hwdeka3 icinga2[1213134]: [2021-12-15 10:22:45 +0100] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min);
Dec 15 10:22:50 hwdeka3 filebeat[12794]: 2021-12-15T10:22:50.484+0100#011INFO#011[monitoring]#011log/log.go:184#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":-66764>
Dec 15 10:23:20 hwdeka3 filebeat[12794]: 2021-12-15T10:23:20.484+0100#011INFO#011[monitoring]#011log/log.go:184#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":-12206>
Dec 15 10:23:25 hwdeka3 icinga2[1213134]: [2021-12-15 10:23:25 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
Dec 15 10:23:39 hwdeka3 kernel: [2831711.135848] kvm[1221898]: segfault at 10 ip 0000557cd7339d10 sp 00007f2f25be16a8 error 4 in qemu-system-x86_64[557cd6e31000+545000]
Dec 15 10:23:39 hwdeka3 kernel: [2831711.135883] Code: ff ff ff e8 e2 7c af ff 48 8b 0d 8b a1 56 00 ba 21 00 00 00 be 01 00 00 00 48 8d 3d 7a 1c 1d 00 e8 a5 92 af ff e8 d0 a8 af ff <48> 83 7f 10 00 0f 95 c0 c3 0f 1f 80 00 00 00 00 48 8b >
Dec 15 10:23:39 hwdeka3 pvedaemon[1222123]: VM 102 qmp command failed - VM 102 qmp command 'query-backup' failed - client closed connection
Dec 15 10:23:39 hwdeka3 kernel: [2831711.184245] fwbr102i0: port 2(tap102i0) entered disabled state
Dec 15 10:23:39 hwdeka3 kernel: [2831711.184397] fwbr102i0: port 2(tap102i0) entered disabled state
Dec 15 10:23:39 hwdeka3 systemd[1]: 102.scope: Succeeded.
Dec 15 10:23:39 hwdeka3 systemd[1]: 102.scope: Consumed 2min 169ms CPU time.
Dec 15 10:23:39 hwdeka3 pvedaemon[1222123]: VM 102 qmp command failed - VM 102 not running
Dec 15 10:23:39 hwdeka3 pvedaemon[1222123]: VM 102 qmp command failed - VM 102 not running
Dec 15 10:23:40 hwdeka3 pvedaemon[1222123]: ERROR: Backup of VM 102 failed - VM 102 not running
Dec 15 10:23:40 hwdeka3 pvedaemon[1222123]: INFO: Backup job finished with errors
Dec 15 10:23:40 hwdeka3 pvedaemon[1222123]: job errors
Dec 15 10:23:40 hwdeka3 qmeventd[1222835]: Starting cleanup for 102
Dec 15 10:23:40 hwdeka3 kernel: [2831711.860561] fwbr102i0: port 1(fwln102i0) entered disabled state
Dec 15 10:23:40 hwdeka3 kernel: [2831711.860669] vmbr0: port 2(fwpr102p0) entered disabled state
Dec 15 10:23:40 hwdeka3 kernel: [2831711.862125] device fwln102i0 left promiscuous mode
Dec 15 10:23:40 hwdeka3 kernel: [2831711.862128] fwbr102i0: port 1(fwln102i0) entered disabled state
Dec 15 10:23:40 hwdeka3 kernel: [2831711.884456] device fwpr102p0 left promiscuous mode
Dec 15 10:23:40 hwdeka3 kernel: [2831711.884459] vmbr0: port 2(fwpr102p0) entered disabled state
Dec 15 10:23:40 hwdeka3 qmeventd[1222835]: Finished cleanup for 102

pveversion:

Code:
proxmox-ve: 7.1-1 (running kernel: 5.11.22-4-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

VM (runs Debian Buster):

Code:
qm status 102 --verbose

balloon: 12886999040
ballooninfo:
        actual: 12886999040
        free_mem: 10927874048
        last_update: 1639561440
        major_page_faults: 3117
        max_mem: 12886999040
        mem_swapped_in: 0
        mem_swapped_out: 0
        minor_page_faults: 3014327
        total_mem: 12602740736
blockstat:
        ide2:
                account_failed: 0
                account_invalid: 0
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 0
                flush_total_time_ns: 0
                idle_time_ns: 917614785791
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 152
                rd_merged: 0
                rd_operations: 4
                rd_total_time_ns: 80527
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 0
                wr_highest_offset: 0
                wr_merged: 0
                wr_operations: 0
                wr_total_time_ns: 0
        virtio0:
                account_failed: 1
                account_invalid: 1
                failed_flush_operations: 0
                failed_rd_operations: 0
                failed_unmap_operations: 0
                failed_wr_operations: 0
                flush_operations: 6666
                flush_total_time_ns: 181085740
                idle_time_ns: 1081979725
                invalid_flush_operations: 0
                invalid_rd_operations: 0
                invalid_unmap_operations: 0
                invalid_wr_operations: 0
                rd_bytes: 635087872
                rd_merged: 5393
                rd_operations: 24446
                rd_total_time_ns: 13961023686
                timed_stats:
                unmap_bytes: 0
                unmap_merged: 0
                unmap_operations: 0
                unmap_total_time_ns: 0
                wr_bytes: 293543936
                wr_highest_offset: 44884332544
                wr_merged: 17
                wr_operations: 21619
                wr_total_time_ns: 99755817049
cpus: 8
disk: 0
diskread: 635088024
diskwrite: 293543936
freemem: 10927874048
maxdisk: 53687091200
maxmem: 12886999040
mem: 1674866688
name: xx
netin: 7243210
netout: 163066439
nics:
        tap102i0:
                netin: 7243210
                netout: 163066439
pid: 1223965
proxmox-support:
        pbs-dirty-bitmap: 1
        pbs-dirty-bitmap-migration: 1
        pbs-dirty-bitmap-savevm: 1
        pbs-library-version: 1.2.0 (6e555bc73a7dcfb4d0b47355b958afd101ad27b5)
        pbs-masterkey: 1
        query-bitmap-info: 1
qmpstatus: running
running-machine: pc-i440fx-6.1+pve0
running-qemu: 6.1.0
status: running
uptime: 931
vmid: 102
 
It seems your VM crashes:
Code:
Dec 15 10:23:39 hwdeka3 kernel: [2831711.135848] kvm[1221898]: segfault at 10 ip 0000557cd7339d10 sp 00007f2f25be16a8 error 4 in qemu-system-x86_64[557cd6e31000+545000]

Is it always the same VM?
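
If it helps, a rough way to look for further crashes and for newer QEMU/kernel packages (plain journalctl/apt usage, nothing Proxmox-specific):

Code:
# look for further qemu/kvm segfaults in the kernel log
journalctl -k | grep -i segfault

# check whether updated packages are available
apt update
apt list --upgradable 2>/dev/null | grep -E 'pve-qemu-kvm|pve-kernel'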
 

Yes, it's always the same VM. I read on another forum that moving the virtual disk to another storage and back can fix the problem. Since doing that, the last backup has been successful.
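
For reference, the move can be done with qm move_disk, roughly like this (the intermediate storage name is a placeholder for whatever second storage is available):

Code:
# move the disk to another storage (placeholder name), dropping the old copy
qm move_disk 102 virtio0 some-other-storage --delete 1 --format qcow2

# move it back to the original storage
qm move_disk 102 virtio0 st1-proxmox-vms --delete 1 --format qcow2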