Trouble backing up a VM running a database that is in use 24/7 and constantly supervising/pinging racks of hardware

Clint84

Nov 12, 2024
I am having issues implementing a PVE backup solution for a couple of my VMs and was hoping for some help/guidance. Here is my setup:

I have a PVE cluster running several VMs. Two of those VMs are a matched pair that runs our business's core software and database, with a clustering agent that replicates the database to the second VM for failover.
  • VM1 runs a processing application for a rack of hardware receivers that capture data from our customer premises equipment; the application writes that data to a database that also runs on VM1. The processing application also supervises the connection with the receivers by replying to the pings they send, so that everything stays connected and communicating.
  • VM2 runs a clustering program that replicates the database from VM1 and supervises it at the same time, so that if the database on VM1 becomes unresponsive or disappears (for example, if VM1 fails or crashes), the program starts the database on VM2 and then launches the processing application to communicate with the receivers.
  • If the system fails over to VM2, VM1 has to be repaired; after that, the clustering program is activated so that VM1 becomes the backup machine.
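
To make the failover rule described above concrete, here is a minimal Python sketch of how a standby node might decide to take over. It is hypothetical: the class name `FailoverMonitor`, the `max_missed` threshold and the "N consecutive failed health checks" rule are assumptions for illustration, not how our clustering program is actually configured. It does show one consequence of such a design: a long enough stall on VM1 looks exactly like a crash to the standby.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Hypothetical model of the standby's supervision loop.

    Assumption: the standby declares the primary dead after `max_missed`
    consecutive failed health checks. The real product's rules are unknown.
    """
    max_missed: int = 3
    missed: int = 0
    promoted: bool = False

    def record_check(self, primary_ok: bool) -> bool:
        """Feed one health-check result; return True only when failover starts."""
        if self.promoted:
            return False  # already acting as primary, nothing more to trigger
        if primary_ok:
            self.missed = 0  # any successful check resets the counter
            return False
        self.missed += 1
        if self.missed >= self.max_missed:
            # In the setup above, this is where the standby would start the
            # database and then the processing application.
            self.promoted = True
            return True
        return False


monitor = FailoverMonitor(max_missed=3)
results = [monitor.record_check(ok) for ok in [True, False, False, True, False, False, False]]
print(results)  # [False, False, False, False, False, False, True]
```

The isolated pair of misses does not trigger anything because the successful check in between resets the counter; only three misses in a row do.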
These VMs used to be two physical servers, with Acronis set up to back up the entire machines every night, and those backups never caused any problems. Now that they have moved to VMs, we were hoping to use the built-in backup tools to take snapshot backups and store them on our NAS via PBS.

When we run the backup, it completes fine on whichever VM is currently running the clustering software. If we run the snapshot backup on the VM that is running the database and processing application, however, the hardware receivers all start freaking out and beeping because they aren't getting responses from the processing application, and the client software on the remote PCs, which employees monitor 24/7, freezes because it can't communicate with the database.

I tried running the backup with fleecing enabled, using the Local-LVM NVMe storage as the fleecing target to reduce the I/O load on the VM, but that didn't help. After letting the backup run for 15 minutes, it had backed up only 10% of the 250GB disk (~70GB used), so I cancelled it.

Are there any other settings I can try so the VM can be backed up without this kind of interruption? The VM has 4 cores @ 3.0GHz, 16GB RAM, a 250GB disk (~70GB used) and a 10Gb network. Local storage on the host is enterprise NVMe SSDs on an HBA set up for Ceph, plus a separate RAID 0 pair of 100GB NVMe drives that hosts PVE. We are backing up to a NAS running TrueNAS with SATA 7200RPM enterprise drives in RAIDZ-6.
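
For scale, the numbers in this post can be turned into a rough throughput figure. This is a small Python sketch; the helper `backup_throughput` is hypothetical, it uses decimal units (1 GB = 1000 MB), it treats the 10Gb link as a nominal 10,000 Mbit/s, and it assumes the backup rate stays constant, which it may not.

```python
def backup_throughput(disk_gb: float, fraction_done: float, elapsed_min: float):
    """Return (observed MB/s, observed Mbit/s, estimated total minutes).

    Hypothetical helper. Decimal units; assumes a constant backup rate.
    """
    done_mb = disk_gb * 1000 * fraction_done   # data processed so far, in MB
    rate_mb_s = done_mb / (elapsed_min * 60)   # MB per second
    rate_mbit_s = rate_mb_s * 8                # megabits per second
    total_min = elapsed_min / fraction_done    # linear extrapolation to 100%
    return rate_mb_s, rate_mbit_s, total_min


# Figures from the post: 10% of a 250GB disk after 15 minutes, 10Gb network.
rate, mbit, eta = backup_throughput(250, 0.10, 15)
link_mbit = 10_000  # nominal 10Gb link
print(f"{rate:.1f} MB/s, {mbit:.0f} Mbit/s ({mbit / link_mbit:.1%} of the link), "
      f"~{eta:.0f} min for the full disk")
# 27.8 MB/s, 222 Mbit/s (2.2% of the link), ~150 min for the full disk
```

At roughly 28 MB/s the backup is using only a small fraction of the 10Gb link, and at that rate the full 250GB disk would take about two and a half hours.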