Backup of VMs Without Downtime?

pixel24

Hi @all,

I’m using PVE 8.3.3 in a private environment on a Supermicro server with an HBA and ZFS as storage. What I’m still missing is a good backup strategy for the VMs as a whole. I already back up the working data inside the VMs.

PVE backups provide me with a simple way to quickly restore VMs, for example, in case of hardware failure.

Since the system also runs a TV server and a streaming server for external users, I have the problem that scheduling a backup that requires stopping the VM is difficult. There are people who listen to music or watch TV at night.

At some point, I read that the Snapshot backup method, which does not require stopping the VM, is not considered safe. Is that still the case?

Best regards,
pixel24
 
That depends on the services in the VM. A snapshot backs up the running state, not the state the VM would have if everything inside it were shut down. For a file server that is mostly fine; for a database server it is not.

There is no way to back up a running VM and get the same result as if the VM were off.
 
stop mode provides the highest consistency guarantees (since the VM is shut down in an orderly fashion, all services running within flush out their data). snapshot and suspend are more like what the disks would look like if you just pull the power plug (a bit better if you have the guest agent and fsfreeze enabled, but what exactly that entails depends on the guest OS and how it is configured). in most situations that's fine, but it really depends on your environment.
 
Just an idea for testing.

Proxmox allows a running VM to be cloned. IIRC this actually takes a snapshot of the running VM, so the clone would suffer the same pitfalls as any snapshot. But here is what you could try:

Create a script to do the following:

1. Clone the running VM.
2. Shut down the running VM and start up the clone.
3. Create a full backup of the stopped VM.
4. Shut down the clone and start the original VM.
5. Delete the Clone.

Achievement: Downtime will be limited to points 2 & 4 above.

Now for the Pitfalls:

a. On a large VM, the process is probably going to be too long to be worth it.
b. That running Clone will only be as good as a snapshot of the VM - with all the pitfalls that entails. A snapshot is like the running state of the VM.
c. Any changes made to the Clone while it is running (2. above) will be lost.
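
The five steps above could be scripted roughly as follows. This is only a sketch and untested against a real node: the VM ID 100, the clone ID 9100, the clone name, and the storage name `backup` are placeholders, and by default it only prints the `qm`/`vzdump` command lines (dry run) instead of running them.

```python
import subprocess


def clone_swap_backup_commands(vmid, clone_id, storage):
    """Return the qm/vzdump command lines for steps 1-5, in order."""
    vm, clone = str(vmid), str(clone_id)
    return [
        ["qm", "clone", vm, clone, "--name", f"tmp-clone-{vm}"],  # 1. clone the running VM
        ["qm", "shutdown", vm],                                  # 2. shut down the original ...
        ["qm", "start", clone],                                  #    ... and start the clone
        ["vzdump", vm, "--mode", "stop", "--storage", storage],  # 3. back up the stopped VM
        ["qm", "shutdown", clone],                               # 4. shut down the clone ...
        ["qm", "start", vm],                                     #    ... and start the original
        ["qm", "destroy", clone],                                # 5. delete the clone
    ]


def run(commands, execute=False):
    """Print each command; only execute it when execute=True."""
    for cmd in commands:
        print(" ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder IDs and storage name; dry run only.
    run(clone_swap_backup_commands(100, 9100, "backup"))
```

Pitfalls a. to c. still apply: the clone step copies the whole disk, and anything written to the clone is gone once it is destroyed.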


What you probably need in your case (if downtime is so critical for you) is a cluster with HA/replication. That way your downtime will be minimal.
 
in most cases, just thinking about "what would happen to this system if power was lost" is enough - if you don't break out in a sweat thinking about losing days of not-persisted work that just lives in memory, or are using some file system that can potentially die completely in such an event, then you are most likely fine with the level of consistency snapshot mode backups provide. a compromise might be to use snapshot mode during the week, and stop mode once over the weekend (or whatever schedule makes sense for your work load). note that with PBS, stop mode backup comes with the downside of clearing the dirty bitmap and thus requiring a full read of the VM data for that backup run.
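
As an illustration of that compromise, here is a small sketch that picks the backup mode by weekday. In PVE itself this would more likely be two scheduled backup jobs with different modes; the Sunday choice, the VM ID 100, and the storage name `backup` are assumptions for the example only.

```python
from datetime import date


def backup_mode(day, stop_weekday=6):
    """Return 'stop' on the chosen weekday (0=Monday ... 6=Sunday), else 'snapshot'."""
    return "stop" if day.weekday() == stop_weekday else "snapshot"


def vzdump_command(vmid, day, storage):
    """Build the vzdump command line for the given day."""
    return ["vzdump", str(vmid), "--mode", backup_mode(day), "--storage", storage]


if __name__ == "__main__":
    # Placeholder VM ID and storage name; prints the command instead of running it.
    print(" ".join(vzdump_command(100, date.today(), "backup")))
```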
 
Thanks for the ideas. I'm currently thinking about the best strategy to achieve my goal and will test a few things to understand the runtime of each process.

Regardless of that, I have a dependency issue with my backup using the "Stop" mode. My VM "SRV01" hosts all data (music, movies, etc.) as well as the entire Active Directory (DHCP, DNS, users, groups, shares, etc.). The VM "MEDIA02" runs Streammaster for IPTV and Jellyfin (streaming server). This VM mounts the data (shares) from SRV01 via SMB.

The VM "SRV01" takes about 1:45 h for a backup in "Stop" mode. The VM "MEDIA02" takes 15 minutes. This creates a situation where MEDIA02 starts up after completing its backup while SRV01 is still busy with its own backup. This causes issues with Jellyfin, as it sometimes performs an automatic media scan or removes tracks from playlists if it can't find them.

Is there a way in PVE to define dependencies? For example:

  • Stop VM 103 (MEDIA02) first
  • Then stop VM 100 (SRV01)
  • Perform the backup for both VMs
  • Start VM 103 only after VM 100 is running again
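
For illustration, the order described above could be driven from an external script rather than from the scheduled backup job. This is a minimal sketch, not a built-in PVE feature: it uses the VM IDs 103 and 100 from this post, a hypothetical storage name `backup`, and only builds the command sequence without running anything.

```python
def ordered_backup_commands(stop_order, storage):
    """Build the command sequence for a dependency-ordered backup.

    stop_order lists VM IDs dependents-first, e.g. [103, 100]:
    VMs are shut down in that order and started again in reverse.
    """
    ids = [str(v) for v in stop_order]
    cmds = [["qm", "shutdown", v] for v in ids]           # stop dependents first
    cmds.append(["vzdump", *ids, "--mode", "stop", "--storage", storage])
    cmds += [["qm", "start", v] for v in reversed(ids)]   # start providers first
    return cmds


if __name__ == "__main__":
    for cmd in ordered_backup_commands([103, 100], "backup"):
        print(" ".join(cmd))
```

Two caveats with this approach: both VMs stay down until the whole backup has finished, and `qm start` returning does not mean the SMB shares inside SRV01 are already reachable, so a real script would probably need an extra check before starting MEDIA02.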
 
for VMs, the downtime for stop mode backups is actually quite short:
- shutdown VM
- start VM paused
- start backup
- resume VM and allow it to boot up
- backup runs until completion

are you maybe talking about containers? there, stop mode backup will actually stop the container for the full duration of the backup. (for containers, the difference between snapshot and suspend is also a lot bigger, and if your storage supports snapshots, snapshot mode is a lot more efficient/faster and has way less downtime.)
 
> Since the system also runs a TV server and a streaming server for external users, I have the problem that scheduling a backup that requires stopping the VM is difficult. There are people who listen to music or watch TV at night
> I already back up the working data inside the VMs

Look into clustering, and question how often these VMs really need a bare-metal backup if their OS internals don't change much. You could probably get away with weekly or every other week with a backup at e.g. Sunday 4am.

You didn't mention if you have Proxmox Backup Server implemented - if not, look into it. Take advantage of dedup and dirty bitmaps.