Disaster Recovery Scenario

erazmus

New Member
May 8, 2023
We are working on a remote cluster of PVE machines and a PBS machine to act as an off-site backup of our local PBS, as well as provide some disaster recovery options. Ideally, we would build out the disaster recovery site to be identical to the local cluster, however costs are prohibitive at this time.

PBS has the ability to do a 'live restore', where a VM boots and runs from PBS storage until it has been restored to local (or Ceph) storage. Is there any way to run a VM from the PBS backup image without moving the data locally? In other words, in a DR scenario, we would like to spin up low-performance VMs on the less-than-optimal PBS storage (which is already consumed by the backups) while restoring high-performance VMs to local storage in the DR cluster.
 
PBS has the ability to do a 'live restore'
Is there any ability to run a VM from the PBS backup image without moving the data locally?
No. Not as far as I know.

A VM running directly from the backup would need to write its changes to the PBS, potentially within that backup (or at least "somewhere"). That would be a no-go for me...
 
Every time I see a post about Disaster Recovery, I check it to see if anyone has a way to deliver near-realtime DR.

So far, the answer is no.

  • The one real, supported answer is Ceph cluster storage replication. I haven't yet managed to stand up two Ceph clusters to test it, but I expect the performance impact on the VMs themselves to be horrid.

  • The only other native answer is PBS 'remote sync'. That is a sync of a backup, and backups have a performance impact, so you can't run them all the time. Your RPO is therefore determined by how often you think it's OK to run (and sync) a backup. In practical terms, your RPO is 24 hours.

  • The answer I would really like to see is the normal PVE 'storage replication' feature, which can be configured to replicate machines between servers in a cluster, adapted so that it can replicate to a remote site. Right now you can't do that, because the feature only works between cluster members, and latency between cluster members should be below 10 ms, so it won't work with a remote site.

Unfortunately, that's the entire list of GUI-supported, in-the-app options.
None of those options deliver sub-5-minute RPO for site-to-site because of their respective flaws.
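
To put rough numbers on the backup-and-sync option, here is a minimal sketch of the worst-case RPO arithmetic. It is my own simplified model, not anything PBS reports: it assumes the backup captures VM state at the moment the job starts, that the sync to the remote PBS starts as soon as the backup finishes, and that the site fails just before the next sync completes. The function name and the example durations are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta,
                   backup_duration: timedelta,
                   sync_duration: timedelta) -> timedelta:
    """Worst-case data loss for a backup-then-sync scheme (simplified model).

    The newest copy at the remote site reflects the start of the previous
    backup until the next backup has both finished and been synced, so the
    exposure is one full interval plus the backup and sync run times.
    """
    return backup_interval + backup_duration + sync_duration


# Hypothetical example: nightly backup, 1 hour to back up, 2 hours to sync.
nightly = worst_case_rpo(timedelta(hours=24),
                         timedelta(hours=1),
                         timedelta(hours=2))
print(nightly)
```

Under this model the RPO can never drop below the backup plus sync run time, no matter how often the job is scheduled, which is why a sub-5-minute target is out of reach this way.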
 
I've looked at this Vinchin thing and checked that link. Yes, it does backups. Other than the China problem, it seems OK, but we've got all that already in PBS. Same with Veeam. Who cares? We have PBS, and these systems don't bring anything new to the party. Feel free to correct me. I wish I were wrong.

Here's the basic requirement from the business we service.
  • Lose no more than 5 minutes of data.
  • Continual replication to the geographically remote site.
  • Restore live operations with that current data to that geographically remote site within 2 hours.
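
For anyone who wants to test a candidate solution against those numbers, a small sketch like this is how I would write the check. The class and function names are made up for illustration, and the observed values in the example are hypothetical, not measurements.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrTargets:
    max_data_loss: timedelta      # RPO: how much data we may lose
    max_restore_time: timedelta   # RTO: how long until we are live again


def meets_targets(targets: DrTargets,
                  observed_rpo: timedelta,
                  observed_rto: timedelta) -> bool:
    """True only if both the data-loss and the restore-time targets are met."""
    return (observed_rpo <= targets.max_data_loss
            and observed_rto <= targets.max_restore_time)


# The business requirement from this post: 5 minutes RPO, 2 hours RTO.
TARGETS = DrTargets(max_data_loss=timedelta(minutes=5),
                    max_restore_time=timedelta(hours=2))

# Hypothetical candidate: daily backup sync that restores within 1 hour.
daily_sync_ok = meets_targets(TARGETS, timedelta(hours=24), timedelta(hours=1))
print(daily_sync_ok)
```

A candidate has to pass both conditions; a fast restore does not make up for a day of lost data.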

The only product that I know of that does that and leaves the VMs performant is Zerto.
(Ceph will do that, but the performance is poor.)
And Zerto doesn't support PVE.

No solution so far.
 
