VMs freeze during backup to PBS

plewandowskiopc

Mar 2, 2022
Good morning.
We have a problem after deploying Proxmox Backup Server (PBS).
We have several Ceph nodes, plus 7 machines (pve1-7) running virtual machines.
Backup on the Proxmox cluster is configured so that many machines from all of pve1-7 are backed up at the same hour.
The backup to PBS starts at midnight, with 7 backup processes running at once, one from each PVE node.

While a virtual machine is being backed up, it is suspended for a long time: it freezes.
The services hosted on it are unavailable for some time.

At that moment, Zabbix monitoring reports a connection problem with the Zabbix agent on the machine being backed up.
Monitoring also reports that other services on that machine are unavailable.

An example message from Zabbix:

Code:
2022.03.01  00:08:14
Name: Zabbix agent on hostname is unreachable for 5 minutes
Host: hostname
Severity: Average
Item value: Up (1)

During the backup I see this in the logs:

Code:
INFO: Starting Backup of VM 136 (qemu)
INFO: Backup started at 2022-03-02 23:59:04
INFO: status = running
INFO: VM Name: testmachine
INFO: include disk 'scsi0' 'rbddata-v2:vm-1366-disk-1' 32772M
INFO: include disk 'scsi2' 'rbddata-v2:vm-1366-disk-0' 500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1366/2022-03-02T22:59:04Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'cedc2164-16d4-463c-b9ca-0a121b4b8a26'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (1.1 GiB of 32.0 GiB dirty)
INFO: scsi2: dirty-bitmap status: OK (7.0 GiB of 500.0 GiB dirty)
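
To put a number on how much data a run like the one above actually has to read, the dirty-bitmap lines can be summed up. This is only a rough sketch: the function name `dirty_summary` is made up for illustration, and the regular expression assumes the log lines look exactly like the two shown above (units given as KiB, MiB, GiB or TiB).

```python
import re

# Matches lines such as:
#   INFO: scsi0: dirty-bitmap status: OK (1.1 GiB of 32.0 GiB dirty)
# Assumption: the line format is exactly as in the log excerpt above.
DIRTY_RE = re.compile(
    r"INFO: (?P<disk>\w+): dirty-bitmap status: OK "
    r"\((?P<dirty>[\d.]+) (?P<dirty_unit>[KMGT]iB) of "
    r"(?P<total>[\d.]+) (?P<total_unit>[KMGT]iB) dirty\)"
)

# Conversion factors from each unit to GiB.
UNIT_TO_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}


def dirty_summary(log_text):
    """Return {disk: (dirty_gib, total_gib)} for every dirty-bitmap line found."""
    result = {}
    for match in DIRTY_RE.finditer(log_text):
        dirty = float(match["dirty"]) * UNIT_TO_GIB[match["dirty_unit"]]
        total = float(match["total"]) * UNIT_TO_GIB[match["total_unit"]]
        result[match["disk"]] = (dirty, total)
    return result


if __name__ == "__main__":
    sample = (
        "INFO: scsi0: dirty-bitmap status: OK (1.1 GiB of 32.0 GiB dirty)\n"
        "INFO: scsi2: dirty-bitmap status: OK (7.0 GiB of 500.0 GiB dirty)\n"
    )
    summary = dirty_summary(sample)
    for disk, (dirty, total) in summary.items():
        print(f"{disk}: {dirty:.1f} GiB of {total:.1f} GiB dirty ({dirty / total:.1%})")
    print(f"total to read: {sum(d for d, _ in summary.values()):.1f} GiB")
```

Running this over the task logs of all VMs that start at midnight would give a rough idea of how much data the 7 parallel jobs push to the PBS datastore at once.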

Technical details:
pve1-7 all have the same hardware and software:

Code:
root@pve1:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-10
pve-kernel-5.13: 7.1-7
pve-kernel-5.4: 6.4-12
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-9-pve: 4.15.18-30
ceph: 14.2.21-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-6
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

Proxmox Backup Server:

Code:
root@proxmoxbackup:~# proxmox-backup-manager versions
proxmox-backup-server 2.1.5-1 running version: 2.1.5

24 x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (2 Sockets)
RAM: 256GB
Datastore: local software RAID-5
 
I assume it's due to slow write performance on the PBS storage.
The faster the chunks can be written, the faster the reads can be done, so the VM freeze is less noticeable.
I suggest using the new PBS feature "Tuning Options: Sync Level: File", which you can find in the Datastore Options.

Alex
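
One way to check the slow-write hypothesis above is to compare buffered writes with writes that are fsynced per file on the datastore disk. The following is a minimal sketch, not how PBS writes chunks internally: the function names, the file naming and the 4 MiB file size are assumptions chosen for illustration, and the path has to point at a scratch location on the same filesystem as the datastore.

```python
import os
import sys
import tempfile
import time

CHUNK_SIZE = 4 * 1024 * 1024  # assumed test file size (4 MiB), not read from PBS


def write_chunks(directory, count, fsync, chunk_size=CHUNK_SIZE):
    """Write `count` files of `chunk_size` bytes; return throughput in MiB/s."""
    payload = os.urandom(chunk_size)  # random data, so compression cannot help
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"chunk-{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            if fsync:
                f.flush()
                os.fsync(f.fileno())  # force this file to stable storage
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (count * chunk_size) / (1024 * 1024) / elapsed


def compare(target_dir, count=64, chunk_size=CHUNK_SIZE):
    """Run both variants in temporary scratch directories under `target_dir`."""
    results = {}
    for label, fsync in (("buffered", False), ("fsync per file", True)):
        with tempfile.TemporaryDirectory(dir=target_dir) as scratch:
            results[label] = write_chunks(scratch, count, fsync, chunk_size)
    return results


if __name__ == "__main__":
    # Usage: python3 write_test.py /path/on/datastore/filesystem
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    for label, mib_s in compare(target).items():
        print(f"{label}: {mib_s:.0f} MiB/s")
```

If the "fsync per file" figure is far below the buffered one, the RAID-5 datastore is a plausible bottleneck. If both are high, the cause is probably elsewhere.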
 
Thanks! That link contains the information I was looking for :-)
 
That is not the issue. I replaced the OS on the same bare-metal machine with TrueNAS Core, keeping the same configuration and hardware, and now it works normally.