Backup to PBS of running OPNSense VM causes issues

rfox · Sunday at 13:43

I'm running the latest OPNsense version in a VM on a standalone host (N305-based) and back it up once a week to a PBS server on another machine. The backup runs at 3 am on Sunday mornings and takes about 30 minutes in snapshot mode (60G disk image). The e-mail status report says the backup was successful . . . BUT

Many times after that, although the VM is still running, several OPNsense services have stopped and internet access is down. The next morning, when I log into the OPNsense interface, I either manually restart the dead services and everything works again, or a reboot is necessary.

Any suggestions? Should I not be using snapshot mode and use suspend or stop instead?

This does not happen every time, just sometimes. I thought it was a fluke, but the recurrence is annoying.

Thx in advance

Chris · Monday at 08:55

rfox said:
I'm running the latest OPNsense version in a VM on a standalone host (N305-based) and back it up once a week to a PBS server on another machine. The backup runs at 3 am on Sunday mornings and takes about 30 minutes in snapshot mode (60G disk image). The e-mail status report says the backup was successful . . . BUT

Many times after that, although the VM is still running, several OPNsense services have stopped and internet access is down. The next morning, when I log into the OPNsense interface, I either manually restart the dead services and everything works again, or a reboot is necessary.

Any suggestions? Should I not be using snapshot mode and use suspend or stop instead?

This does not happen every time, just sometimes. I thought it was a fluke, but the recurrence is annoying.

Thx in advance

Hi,
do you see any errors in the VM's system logs? Maybe the VM's I/O is starved because the backup target is not fast enough? In that case a fleecing image will help, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_vm_backup_fleecing

rfox · Monday at 09:13

Chris said:
Hi,
do you see any errors in the VM's system logs? Maybe the VM's I/O is starved because the backup target is not fast enough? In that case a fleecing image will help, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_vm_backup_fleecing

Thanks for the tip!

What exactly am I looking for in the logs to confirm this suspicion? Why would that affect the continuous operation of the VM during a "snapshot" backup? I thought the VM resumed very quickly and the backup happened independently. Up until recently this wasn't a problem, and it seems to be sporadic.

As an alternative, until I figure out what's happening, I have switched the weekly backup of that particular VM to local storage on the same node instead of going over the network to the PBS, which currently lives on a QNAP NAS device.

Chris · Monday at 09:16

rfox said:
What exactly am I looking for in the logs to confirm this suspicion?

Any I/O-related errors on the virtual disks of the VM around the time of the backup. How are your disks attached? Do you use a SATA or SCSI controller?

rfox · Monday at 09:37

Chris said:
Any I/O-related errors on the virtual disks of the VM around the time of the backup. How are your disks attached? Do you use a SATA or SCSI controller?

Just checked the logs: no I/O-related errors whatsoever, just "backup successful".

Chris · Monday at 10:07

rfox said:
Just checked the logs: no I/O-related errors whatsoever, just "backup successful".

Which logs did you check? You should check the syslogs within the VM; I would not expect to see any errors on the host.

rfox said:
I thought the VM resumed very quickly and the backup happened independently.

Yes, the snapshot mode creates a consistent state of the VM for backup, so it does not need to be powered down. But since the backup process is copy-before-write, any guest write to a block that has not been backed up yet will block until the old data has been written to the backup target. The fleecing image reduces this by storing these data chunks locally first and writing them to the backup target afterwards.
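
To make the copy-before-write point concrete, here is a rough toy model in Python. It is only a sketch: the function name and the latency figures are hypothetical, and it ignores reading the old data, caching and parallel I/O. It only illustrates why a slow backup target can stall guest writes and why a local fleecing image helps.

```python
def first_write_latency_ms(local_write_ms, target_write_ms, fleecing):
    """Toy estimate of how long the guest waits on its first write to a
    block that the running backup has not saved yet."""
    # Copy-before-write: the old block contents must be saved somewhere
    # before the guest's new data may overwrite them.
    if fleecing:
        # Old data is parked on fast local storage (the fleecing image).
        copy_out_ms = local_write_ms
    else:
        # Old data goes straight to the (possibly slow) backup target.
        copy_out_ms = target_write_ms
    # The guest's own write then lands on the local disk.
    return copy_out_ms + local_write_ms


# Hypothetical numbers: 1 ms local write, 50 ms write to a busy remote target.
print(first_write_latency_ms(1, 50, fleecing=False))  # 51
print(first_write_latency_ms(1, 50, fleecing=True))   # 2
```

In this simplified picture the guest pays the backup target's latency on every such write unless fleecing is enabled, which would fit a firewall VM becoming sluggish while the backup runs.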

rfox · Monday at 10:28

Chris said:
Which logs did you check? You should check the syslogs within the VM; I would not expect to see any errors on the host.

I think I found something: every Sunday at 3 am (when the backup job starts) I get a series of processes being killed and failures to reclaim memory. But this is OPNsense running on BSD under the hood, so I'm not sure if this is relevant. All I can say is that this backup worked for many months prior. The only change I can think of that could possibly have an effect is updating the node from the 6.11 kernel to Linux 6.14.5-1-bpo12-pve.
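
For anyone who wants to pull such entries out of a log file exported from the VM, a minimal Python sketch follows. The function name and the default keywords are assumptions taken from the wording of the post above, not from any documented OPNsense log format, so adjust them to whatever your log actually contains.

```python
def find_memory_kill_lines(lines, keywords=("killed", "reclaim memory")):
    """Return the log lines that mention any keyword (case-insensitive)."""
    # The keywords are guesses based on the symptoms described in the thread.
    wanted = [k.lower() for k in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in wanted)
    ]
```

Pass it an open file handle (or any list of lines) and compare the timestamps of the matches with the backup start time.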
