Backup failed - timeout waiting on systemd

slot

Hello,

I have a problem backing up a VM. I get the error: "Backup of VM 100 failed - timeout waiting on systemd". Backups run every day at 23:30. The error appears randomly: some weeks there is no problem at all, and sometimes I get this error 5 days in a row.

Server: Supermicro, 128 GB RAM, 2 Samsung PM883 1.92 TB SSDs (MZ7LH1T9HMLT-00005) in ZFS RAID 1 (rpool) and 2 10 TB HDDs, also in ZFS RAID 1 (dane). The server is about 10 months old. I checked all disks with short and extended SMART tests.

The VM runs Windows 10 with 64 GB RAM, a 1 TB disk on the SSD pool, and a 1 TB disk on the HDD pool.

Backups are written to the pool dane (the HDDs).

I upgraded Proxmox from 7.0 to 7.1, but I still get this error.

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-13
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

Code:
root@proxmox:~# qm config 100
agent: 1
boot: order=scsi0;net0
cores: 16
machine: pc-i440fx-5.2
memory: 65536
name: Win10
net0: virtio=46:8C:D4:C2:61:BB,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win10
scsi0: local-zfs:vm-100-disk-0,discard=on,size=1000G,ssd=1
scsi1: dane-zfs:vm-100-disk-0,discard=on,size=1000G
scsihw: virtio-scsi-pci
smbios1: uuid=a72d3f04-760f-4bef-ace0-2f02098a27b5
sockets: 1

Code:
INFO: starting new backup job: vzdump 100 --node proxmox --compress zstd --prune-backups 'keep-last=5' --quiet 1 --mode stop --mailnotification always --storage dane
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2022-03-23 23:30:03
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: Win10
INFO: include disk 'scsi0' 'local-zfs:vm-100-disk-0' 1000G
INFO: include disk 'scsi1' 'dane-zfs:vm-100-disk-0' 1000G
INFO: stopping virtual guest
INFO: creating vzdump archive '/dane/dump/vzdump-qemu-100-2022_03_23-23_30_03.vma.zst'
INFO: starting kvm to execute backup task
INFO: restarting vm
INFO: guest is online again after 32 seconds
ERROR: Backup of VM 100 failed - timeout waiting on systemd
INFO: Failed at 2022-03-23 23:30:35
INFO: Backup job finished with errors
TASK ERROR: job errors
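
Because the failure is intermittent, it can help to tally which nightly runs failed and for what reason. The following is a minimal Python sketch, not part of Proxmox: the function name `summarize_vzdump_log` is hypothetical, and the two patterns are simply copied from the log lines quoted above. It assumes the task log has been saved as plain text.

```python
import re

# Patterns copied from the vzdump log lines quoted in this thread.
ERROR_RE = re.compile(r"ERROR: Backup of VM (\d+) failed - (.+)$")
ONLINE_RE = re.compile(r"INFO: guest is online again after (\d+) seconds")


def summarize_vzdump_log(text):
    """Return (failures, downtimes) found in a vzdump task log.

    failures  -- list of (vmid, reason) tuples, one per ERROR line
    downtimes -- list of reported guest downtimes in seconds
    """
    failures = []
    downtimes = []
    for line in text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            failures.append((int(m.group(1)), m.group(2).strip()))
            continue
        m = ONLINE_RE.search(line)
        if m:
            downtimes.append(int(m.group(1)))
    return failures, downtimes


# Excerpt of the log posted above.
sample = """\
INFO: stopping virtual guest
INFO: starting kvm to execute backup task
INFO: restarting vm
INFO: guest is online again after 32 seconds
ERROR: Backup of VM 100 failed - timeout waiting on systemd
INFO: Failed at 2022-03-23 23:30:35
"""

failures, downtimes = summarize_vzdump_log(sample)
print(failures)
print(downtimes)
```

Running this over each night's log gives a simple record of how often the timeout occurs, which is easier to compare across weeks than reading the notification mails one by one.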
 
Hello! I have the same problem:

Code:
Log

303: 2022-04-19 01:00:03 INFO: Starting Backup of VM 303 (qemu)
303: 2022-04-19 01:00:03 INFO: status = running
303: 2022-04-19 01:00:03 INFO: backup mode: stop
303: 2022-04-19 01:00:03 INFO: ionice priority: 7
303: 2022-04-19 01:00:03 INFO: VM Name: Scorpius
303: 2022-04-19 01:00:03 INFO: include disk 'virtio1' 'local-zfs:vm-303-disk-0' 500G
303: 2022-04-19 01:00:03 INFO: stopping virtual guest
303: 2022-04-19 01:00:22 INFO: creating vzdump archive '/mnt/pve/Ev_backup/dump/vzdump-qemu-303-2022_04_19-01_00_03.vma.zst'
303: 2022-04-19 01:00:22 INFO: starting kvm to execute backup task
303: 2022-04-19 01:00:27 INFO: restarting vm
303: 2022-04-19 01:00:33 INFO: guest is online again after 30 seconds
303: 2022-04-19 01:00:33 ERROR: Backup of VM 303 failed - timeout waiting on systemd
Code:
Syslog

Apr 19 01:00:03 rohan pvescheduler[3623153]: INFO: starting new backup job: vzdump 301 302 303 --compress zstd  --quiet 1 --storage Ev_backup --mode stop --mailnotification always
Apr 19 01:00:03 rohan pvescheduler[3623153]: INFO: Starting Backup of VM 303 (qemu)
Apr 19 01:00:04 rohan qm[3623219]: <root@pam> starting task UPID:rohan:0037533B:04D8D659:625DDF64:qmshutdown:303:root@pam:
Apr 19 01:00:04 rohan qm[3625787]: shutdown VM 303: UPID:rohan:0037533B:04D8D659:625DDF64:qmshutdown:303:root@pam:
Apr 19 01:00:11 rohan pmxcfs[2387]: [status] notice: received log
Apr 19 01:00:12 rohan pvedaemon[2759608]: VM 303 qmp command failed - VM 303 qmp command 'guest-ping' failed - got timeout
Apr 19 01:00:13 rohan QEMU[999895]: kvm: terminating on signal 15 from pid 1988 (/usr/sbin/qmeventd)
Apr 19 01:00:18 rohan qmeventd[1988]: cleanup failed, terminating pid '999895' with SIGKILL
Apr 19 01:00:20 rohan kernel: [813226.645296]  zd0: p1 p2
Apr 19 01:00:20 rohan qm[3623219]: <root@pam> end task UPID:rohan:0037533B:04D8D659:625DDF64:qmshutdown:303:root@pam: OK
Apr 19 01:00:20 rohan pvestatd[2665]: VM 303 qmp command failed - VM 303 qmp command 'query-proxmox-support' failed - unable to connect to VM 303 qmp socket - No such file or directory
Apr 19 01:00:20 rohan pvedaemon[2608753]: VM 303 qmp command failed - VM 303 qmp command 'query-proxmox-support' failed - unable to connect to VM 303 qmp socket - No such file or directory
Apr 19 01:00:20 rohan pmxcfs[2387]: [status] notice: received log
Apr 19 01:00:22 rohan systemd[1]: 303.scope: Succeeded.
Apr 19 01:00:22 rohan systemd[1]: Stopped 303.scope.
Apr 19 01:00:22 rohan systemd[1]: 303.scope: Consumed 1d 9h 53min 17.874s CPU time.
Apr 19 01:00:28 rohan qm[3655408]: start VM 303: UPID:rohan:0037C6F0:04D8DFC6:625DDF7C:qmstart:303:root@pam:
Apr 19 01:00:28 rohan qm[3655407]: <root@pam> starting task UPID:rohan:0037C6F0:04D8DFC6:625DDF7C:qmstart:303:root@pam:
Apr 19 01:00:28 rohan pmxcfs[2387]: [status] notice: received log
Apr 19 01:00:31 rohan pmxcfs[2387]: [status] notice: received log
Apr 19 01:00:32 rohan systemd[1]: Started 303.scope.
Apr 19 01:00:32 rohan systemd-udevd[3655520]: Using default interface naming scheme 'v247'.
Apr 19 01:00:32 rohan systemd-udevd[3655520]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 19 01:00:32 rohan qmeventd[3655525]: Starting cleanup for 303
Apr 19 01:00:32 rohan qmeventd[3655525]: trying to acquire lock...
Apr 19 01:00:33 rohan systemd-udevd[3655520]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 19 01:00:33 rohan qmeventd[3655525]:  OK
Apr 19 01:00:33 rohan qmeventd[3655525]: vm still running
Apr 19 01:00:33 rohan qm[3655407]: <root@pam> end task UPID:rohan:0037C6F0:04D8DFC6:625DDF7C:qmstart:303:root@pam: OK
Apr 19 01:00:33 rohan pvescheduler[3623153]: ERROR: Backup of VM 303 failed - timeout waiting on systemd
Apr 19 01:00:33 rohan pvescheduler[3623153]: INFO: Backup job finished with errors
Code:
pveversion -v

proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
Code:
qm config 303

agent: 1
balloon: 0
boot: cdn
bootdisk: virtio1
cores: 18
memory: 200000
name: Scorpius
net0: e1000=BE:47:84:38:E9:82,bridge=vmbr0
net1: virtio=DE:75:15:0E:70:38,bridge=vmbr0
numa: 0
onboot: 1
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=e97b1fa4-e953-493d-82a1-e72a673f28b6
sockets: 2
virtio1: local-zfs:vm-303-disk-0,cache=writethrough,size=500G
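
In the syslog above, systemd reports "Stopped 303.scope" at 01:00:22, the same second in which the task log reports "starting kvm to execute backup task". That does not prove a cause, but it makes the scope and qmeventd messages worth reading next to the backup messages. A minimal Python sketch for pulling those lines out of a syslog excerpt follows; the function name `filter_syslog` and the choice of programs to keep are assumptions made for illustration.

```python
import re

# Classic syslog layout as quoted above:
# "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^\[:]+)(?:\[\d+\])?: (.*)$"
)


def filter_syslog(text, programs):
    """Return (time, program, message) for lines logged by the given programs."""
    events = []
    for line in text.splitlines():
        m = SYSLOG_RE.match(line)
        if m and m.group(3) in programs:
            events.append((m.group(1), m.group(3), m.group(4)))
    return events


# Lines copied from the syslog posted above.
sample = """\
Apr 19 01:00:13 rohan QEMU[999895]: kvm: terminating on signal 15 from pid 1988 (/usr/sbin/qmeventd)
Apr 19 01:00:18 rohan qmeventd[1988]: cleanup failed, terminating pid '999895' with SIGKILL
Apr 19 01:00:20 rohan pmxcfs[2387]: [status] notice: received log
Apr 19 01:00:22 rohan systemd[1]: Stopped 303.scope.
Apr 19 01:00:32 rohan systemd[1]: Started 303.scope.
Apr 19 01:00:33 rohan pvescheduler[3623153]: ERROR: Backup of VM 303 failed - timeout waiting on systemd
"""

events = filter_syslog(sample, {"systemd", "qmeventd", "pvescheduler"})
for when, program, message in events:
    print(when, program, message)
```

Filtering by program name rather than by VM ID is deliberate here: some relevant lines, such as the qmeventd SIGKILL message, do not mention the VM ID at all.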
 
Hello,

I have a workaround for this. Before the backup task, I rebooted the whole Proxmox server (via a cron job). It is not a very elegant solution, but it works. After 1 week with this workaround I stopped rebooting the machine. I now have 16 days of uptime with 0 failed backups. I don't know what changed, but it now works without any problems...
 
On Sunday we tried to make a backup twice, both times through the scheduler. The attempt at 01:45 failed; the attempt at 19:10 succeeded. Between the two attempts the node was not rebooted and no action was taken on it.

A similar situation was observed on all three nodes of the cluster.

Code:
vzdump 301 303 302 --mode stop --storage Ev_backup --quiet 1 --compress zstd --mailnotification always 

301: 2022-04-24 01:45:02 INFO: Starting Backup of VM 301 (qemu)
301: 2022-04-24 01:45:02 INFO: status = running
301: 2022-04-24 01:45:02 INFO: backup mode: stop
301: 2022-04-24 01:45:02 INFO: ionice priority: 7
301: 2022-04-24 01:45:02 INFO: VM Name: Sirius
301: 2022-04-24 01:45:02 INFO: include disk 'ide0' 'local-zfs:vm-301-disk-0' 250G
301: 2022-04-24 01:45:02 INFO: exclude disk 'scsi1' 'local-zfs:vm-301-disk-1' (backup=no)
301: 2022-04-24 01:45:02 INFO: stopping virtual guest
301: 2022-04-24 01:45:25 INFO: creating vzdump archive '/mnt/pve/Ev_backup/dump/vzdump-qemu-301-2022_04_24-01_45_02.vma.zst'
301: 2022-04-24 01:45:25 INFO: starting kvm to execute backup task
301: 2022-04-24 01:45:30 INFO: restarting vm
301: 2022-04-24 01:45:36 INFO: guest is online again after 34 seconds
301: 2022-04-24 01:45:36 ERROR: Backup of VM 301 failed - timeout waiting on systemd


vzdump 301 303 302 --storage Ev_backup --mode stop --quiet 1 --compress zstd --mailnotification always 

301: 2022-04-24 19:10:10 INFO: Starting Backup of VM 301 (qemu)
301: 2022-04-24 19:10:10 INFO: status = running
301: 2022-04-24 19:10:10 INFO: backup mode: stop
301: 2022-04-24 19:10:10 INFO: ionice priority: 7
301: 2022-04-24 19:10:10 INFO: VM Name: Sirius
301: 2022-04-24 19:10:10 INFO: include disk 'ide0' 'local-zfs:vm-301-disk-0' 250G
301: 2022-04-24 19:10:10 INFO: exclude disk 'scsi1' 'local-zfs:vm-301-disk-1' (backup=no)
301: 2022-04-24 19:10:10 INFO: stopping virtual guest
301: 2022-04-24 19:10:23 INFO: creating vzdump archive '/mnt/pve/Ev_backup/dump/vzdump-qemu-301-2022_04_24-19_10_04.vma.zst'
301: 2022-04-24 19:10:23 INFO: starting kvm to execute backup task
301: 2022-04-24 19:10:26 INFO: started backup task '4dccba3e-047b-4405-be78-abd77f2ce26b'
301: 2022-04-24 19:10:26 INFO: resuming VM again after 16 seconds
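
Comparing the two runs step by step shows where they diverge: in the failed run, "restarting vm" follows "starting kvm to execute backup task" after 5 seconds and no "started backup task" line appears, while the successful run reaches "started backup task" after 3 seconds. The VM 303 log earlier in the thread shows the same 5-second gap (01:00:22 to 01:00:27). The Python sketch below computes these gaps from the timestamped log lines; the function name `step_gaps` is hypothetical, and the numbers are only an observation about these logs, not an explanation of the cause.

```python
import re
from datetime import datetime

# Timestamped task-log layout as quoted above:
# "<vmid>: YYYY-MM-DD HH:MM:SS LEVEL: message"
LINE_RE = re.compile(
    r"^(\d+): (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (INFO|ERROR): (.*)$"
)


def step_gaps(log_text):
    """Return (seconds_since_previous_line, message) for each timestamped line."""
    gaps = []
    previous = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S")
        delta = 0 if previous is None else int((ts - previous).total_seconds())
        gaps.append((delta, m.group(4)))
        previous = ts
    return gaps


# Lines copied from the two runs posted above.
failed = """\
301: 2022-04-24 01:45:02 INFO: stopping virtual guest
301: 2022-04-24 01:45:25 INFO: starting kvm to execute backup task
301: 2022-04-24 01:45:30 INFO: restarting vm
301: 2022-04-24 01:45:36 INFO: guest is online again after 34 seconds
301: 2022-04-24 01:45:36 ERROR: Backup of VM 301 failed - timeout waiting on systemd
"""

succeeded = """\
301: 2022-04-24 19:10:10 INFO: stopping virtual guest
301: 2022-04-24 19:10:23 INFO: starting kvm to execute backup task
301: 2022-04-24 19:10:26 INFO: started backup task '4dccba3e-047b-4405-be78-abd77f2ce26b'
301: 2022-04-24 19:10:26 INFO: resuming VM again after 16 seconds
"""

for name, text in (("failed", failed), ("succeeded", succeeded)):
    print(name)
    for delta, message in step_gaps(text):
        print(f"  +{delta:>3}s  {message}")
```

The same function also shows that stopping the guest took 23 seconds in the failed run and 13 seconds in the successful one, which may be worth tracking across more runs before drawing any conclusion.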
 
