VMs freezing and unreachable when backup server is slow

christian.g

Member
Jun 4, 2020
We are having a hard time too. The combination of dirty bitmaps and a moderately fast PBS is giving us nightmares. We have more than 40 VMs, including a few quite big database servers (>3TB). Sometimes they need updates that require a reboot, which invalidates the dirty bitmaps, and the result is a full backup. This in turn delays the backup of all other VMs on the node and makes those database VMs unusable for hours. Hard-resetting such a big database server and the subsequent log recovery make things even worse.

Is there any design progress?
What about using Ceph snapshots when Ceph is in use, instead of QEMU dirty bitmaps?
I know you are trying to build a solution that works in every case, and hence use QEMU, but these blockers, delays, freezes and hard resets are a big problem.

RolandK

Member
Mar 5, 2019
I wonder why that cache is memory-only, and why it isn't also (or instead) written to disk when it is getting full. If the network or the backup server is slow, it's unacceptable that VMs get I/O errors because of it.

phs

Active Member
Dec 3, 2015
That is a critical issue: a backup simply must not crash a VM. Is this being worked on? Is there a usable workaround?

Stefano Giunchi

Active Member
Jan 17, 2016
Forlì, Italy
www.soasi.com
This is how the backup works: it intercepts write calls from the VM and backs up the relevant block (detailed info here: https://git.proxmox.com/?p=pve-qemu...16aeb06259c2bcd196e4949c;hb=refs/heads/master)

We already cache some blocks in memory, but if the backup storage is too slow, symptoms such as this can happen.

From the file, I read:
* slow backup storage can slow down VM during backup
It is important to note that we only do sequential writes to the backup storage. Furthermore one can compress the backup stream. IMHO, it is better to slow down the VM a bit. All other solutions creates large amounts of temporary data during backup.
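To make the mechanism above concrete, here is a toy Python model of a copy-before-write backup with a bounded in-memory buffer. It is a sketch under stated assumptions, not QEMU's actual code: the class and function names are hypothetical, the disk is just eight blocks, and a "tick" is the time the backup target needs to accept `target_blocks_per_tick` blocks. A guest write to a block that has not been backed up yet must first park the old contents in the buffer; when the buffer is full, the write waits for the target.

```python
from collections import deque


class CopyBeforeWriteBackup:
    """Toy model of a copy-before-write backup job (hypothetical, not QEMU's code)."""

    def __init__(self, disk, buffer_blocks, target_blocks_per_tick):
        if buffer_blocks < 1 or target_blocks_per_tick < 1:
            raise ValueError("buffer and target throughput must be at least 1 block")
        self.disk = list(disk)              # guest-visible contents, one item per block
        self.done = [False] * len(self.disk)  # block already saved for the backup?
        self.buffer = deque()               # (index, old contents) waiting for the target
        self.buffer_blocks = buffer_blocks  # capacity of the in-memory buffer
        self.rate = target_blocks_per_tick  # throughput of the (slow) backup target
        self.image = {}                     # what the backup target has received
        self.cursor = 0                     # position of the sequential copy
        self.guest_stall_ticks = 0          # time steps the guest spent blocked

    def tick(self):
        """One time step: drain buffered blocks first, then continue the sequential copy."""
        capacity = self.rate
        while capacity and self.buffer:
            idx, data = self.buffer.popleft()
            self.image[idx] = data
            capacity -= 1
        while capacity and self.cursor < len(self.disk):
            idx = self.cursor
            self.cursor += 1
            if self.done[idx]:
                continue                    # already saved by copy-before-write
            self.image[idx] = self.disk[idx]
            self.done[idx] = True
            capacity -= 1

    def guest_write(self, idx, data):
        if not self.done[idx]:
            # The old block must be preserved before the guest write may proceed.
            while len(self.buffer) >= self.buffer_blocks:
                self.tick()                 # guest is stuck until the target catches up
                self.guest_stall_ticks += 1
            self.buffer.append((idx, self.disk[idx]))
            self.done[idx] = True
        self.disk[idx] = data

    def finished(self):
        return self.cursor >= len(self.disk) and not self.buffer


def simulate(buffer_blocks, target_blocks_per_tick=1):
    """Guest overwrites four not-yet-backed-up blocks right after the backup starts."""
    job = CopyBeforeWriteBackup(range(8), buffer_blocks, target_blocks_per_tick)
    for idx in (7, 6, 5, 4):
        job.guest_write(idx, "new")
    while not job.finished():
        job.tick()
    return job.guest_stall_ticks, job.image
```

In this model `simulate(2)` makes the guest stall for 2 ticks, `simulate(2, target_blocks_per_tick=2)` for 1, and `simulate(8)` not at all, while the backup image always holds the original contents. A bigger buffer or a faster target hides the problem; a slow target with a small buffer pushes the wait straight into the guest's write path.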

In fact, depending on the backup speed, VMs are not slowed down a bit: they are slowed down a lot, frozen, or even crashed.

On Windows machines, I receive the ESENT/508 error: svchost (1008) SoftwareUsageMetrics-Svc: A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3043328 (0x00000000002e7000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (15 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

I'm going to give my +1 to https://bugzilla.proxmox.com/show_bug.cgi?id=3631

RolandK

Member
Mar 5, 2019
I think it would be best to build a showcase using a network bandwidth throttling tool and an I/O throttling tool on the PBS, just to demonstrate how badly things can behave...

The easiest way to do this should be to set up a virtual PBS and limit its virtual NIC and disk in the Hardware settings dialog.

Darkk

Member
Sep 2, 2019
I have a customer with a database on Ceph/NVMe disks. They write to the database 24 hours a day. The backup server has 5400rpm disks. When the backup starts, the database VM slows down. Please note that this is not a network problem, because the Proxmox->PBS connection is 10 Gbit.
I repeat: to optimize the backup process, you consider it "acceptable" to tie the VM disk speed to the backup unit's disk speed. This is a classic example of "perfect is the enemy of good".
Do you realize how many problems we have because of this choice?
Where can I file a complaint or a request to change this behaviour, at least for Ceph?
Thanks,
Mario

I know what you're going through trying to deal with VMs that are running on Ceph. I've experienced the same slowdowns, or sometimes freezes, during backups. This was before PBS came around. I ended up trashing the servers in favor of a different solution for now. I will get back to it when the subscription runs out.

I run a two-node Proxmox setup for my home lab. Based on trial and error, I realized that trying to back up very large VMs holding 1-2TB of data is pretty much fruitless. It seems easier to create small VMs that just run the apps and can be backed up quickly, while the data actually resides on TrueNAS with its own backup system. I use NFS shares. This is just an example.

You would shut down SQL on the VM and then back up the VM in a powered-off state. This way you have a full working backup image of it. Then use SQL backups for your daily backups. For recovery, you just restore the database server VM and then restore from the SQL backup.

I can tell you that using PBS to back up small SQL database servers does not pose a problem, but always use the native SQL backup just in case. Backing up very busy SQL servers at the VM level is asking for corruption problems.

Ultranium

New Member
Aug 2, 2022
Yeah, this is a major problem.

I have a busy VM with a ~3TB disk in it. The backup takes almost 6 hours, and during this time all sorts of disk IO-related errors happen inside the VM, and some programs just crash because of it.
I had to disable the backup for this VM just to make it usable.

Can't Proxmox use ZFS snapshots for the live backup? I never have any trouble using ZFS snapshots on a running VM, not even a slight slowdown.

christian.g

Member
Jun 4, 2020
So far, the community has provided a few suggestions, such as:

- manually increase the memory buffer size
- add a fast and large enough buffering device, like a PCIe NVMe drive or a ZFS mirror of them
- use storage snapshots if available (ZFS/Ceph)
- use backup fleecing
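The fleecing idea from the list above can be sketched as a toy Python model too. Assumptions, all hypothetical: old block contents are copied to a fast local scratch store (here a plain dict standing in for a local "fleecing" image) instead of a small RAM buffer, a "tick" is the time the backup target needs to accept `target_blocks_per_tick` blocks, and the names are made up. This is a sketch of the concept, not how Proxmox or QEMU implement it.

```python
class FleecingBackup:
    """Toy model of backup fleecing (hypothetical, not Proxmox's implementation)."""

    def __init__(self, disk, target_blocks_per_tick):
        if target_blocks_per_tick < 1:
            raise ValueError("target must accept at least one block per tick")
        self.disk = list(disk)              # guest-visible contents, one item per block
        self.fleece = {}                    # index -> old contents, on fast local storage
        self.peak_fleece_blocks = 0         # temporary local space the scratch store needed
        self.rate = target_blocks_per_tick  # throughput of the (slow) backup target
        self.image = {}                     # what the backup target has received
        self.cursor = 0                     # blocks below this index were already sent

    def guest_write(self, idx, data):
        # Preserve the old block once, locally; the guest never waits for the target.
        if idx >= self.cursor and idx not in self.fleece:
            self.fleece[idx] = self.disk[idx]
            self.peak_fleece_blocks = max(self.peak_fleece_blocks, len(self.fleece))
        self.disk[idx] = data

    def tick(self):
        """Send up to `rate` blocks, preferring the preserved old copy of each block."""
        capacity = self.rate
        while capacity and self.cursor < len(self.disk):
            idx = self.cursor
            self.cursor += 1
            self.image[idx] = self.fleece.pop(idx, self.disk[idx])
            capacity -= 1

    def finished(self):
        return self.cursor >= len(self.disk)


def simulate_fleecing(target_blocks_per_tick=1):
    """Guest overwrites four not-yet-backed-up blocks right after the backup starts."""
    job = FleecingBackup(range(8), target_blocks_per_tick)
    for idx in (7, 6, 5, 4):
        job.guest_write(idx, "new")
    while not job.finished():
        job.tick()
    return job.peak_fleece_blocks, job.image
```

Here `simulate_fleecing()` needs 4 blocks of local scratch space and still yields the original contents for all eight blocks, with no guest stall at all. That is the trade-off the QEMU notes quoted earlier in the thread describe: the guest no longer waits for the backup target, but the local store must hold every block overwritten before the backup reached it, so it has to be both fast and large enough.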

Is Proxmox working on any of them? Any feedback from the Proxmox team?

Thanks
