High Availability + Replication = Disaster

Hi, I've configured HA and replication, but when I actually reboot a server, the failover fails stating that the volume already exists at the target node.

Some helpful info:
  • I have 8x nodes
  • my goal is to have replication run every 15 minutes
  • when an event causes a failover, either send a quick delta to get the latest state, or just boot the copy that's up to 15 minutes old
  • I have each VM set to fail over and replicate to 2x other nodes (just in case)
  • I have HA groups configured with priorities
  • I separately have replications configured from Source to Destination A and Destination B (a rough CLI equivalent is sketched below)
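For reference, the same kind of jobs could also be created with the pvesr CLI instead of the GUI. A minimal sketch, with made-up job IDs and node names (VM 110 replicating to two other nodes every 15 minutes):

  # replicate VM 110 to two different target nodes, every 15 minutes
  pvesr create-local-job 110-0 nodeA --schedule '*/15'
  pvesr create-local-job 110-1 nodeB --schedule '*/15'

  # check job state and last sync result for the guest
  pvesr status --guest 110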
 
@Ashkaan Hassan
1. What's the status of replication?
2. Are you using traditional storage replication or pve-zsync?
3. Maybe you can delete your replication tasks and remove the dangling vmdisk from the other nodes and do another clean replication
 
Wouldn't it be easier to create shared storage with Ceph on the 8 nodes instead of asynchronously replicating the VM disks?
Or is that a non-local cluster with latencies too high to allow Ceph?
Ceph is too slow for our needs.
 
@Ashkaan Hassan
1. What's the status of replication?
2. Are you using traditional storage replication or pve-zsync?
3. Maybe you can delete your replication tasks and remove the dangling vmdisk from the other nodes and do another clean replication
  1. Error. "2024-03-11 08:54:04 110-0: volume 'rpool/data/vm-110-disk-0' already exists"
  2. I just went to Replication under Datacenter and configured each one there.
  3. Then replication is happy again, but the next time I need to fail over, it errors out again (see the check sketched after this list).
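The check I mean (a sketch; the VM ID and dataset name are taken from the error above):

  # on the node the VM fails over to, the old copy of the disk is still present:
  zfs list -r rpool/data | grep vm-110

  # and the replication job status can be checked from the source node:
  pvesr status --guest 110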
 
I'm still having this issue. Can anyone help with replication not working (as expected)?

Again, the goal is just to have replication jobs running and successful so that when we reboot a server, the VM migrates QUICKLY without needing to send a full replication at that moment. This is possible with VMware and Hyper-V. It's got to be possible here.
 
Am I the only person that needs to do updates to Proxmox hosts?

How are you guys managing these things?
 
I'll be happy to pay a $100 bounty to the first person that solves this issue for me. Please DM me or post here. I'm happy to run a Zoom meeting so you can see clearly where I'm having trouble with this platform.
 
  1. Error. "2024-03-11 08:54:04 110-0: volume 'rpool/data/vm-110-disk-0' already exists"
  2. I just went to Replication under Datacenter and configured each one there.
  3. Then replication is happy again, but the next time I need to fail over, it errors out again.

The whole HA and ZFS replication combination is a bit of a joke. All it took for me was one response [1] from the PVE project lead and I got the message that it is some sort of leftover and no one pays attention to how well it works. You may also look at the reply I got on the Bugzilla ticket I filed. Sooner or later you will run into issues where you have to intervene manually; I made just one test and ran into it. It certainly does not scale well to 8 hosts.

[1] https://forum.proxmox.com/threads/what-is-wrong-with-high-availability.139056/#post-620923
 
I see. Thank you for this. I guess it's time to move back to VMware.

I suppose you are attempting to get some sort of reaction from PVE staff. Sometimes (during EU business hours) you get it here, but you will soon notice there are some setups they prefer to support over others (e.g. shared storage instead of replicated ZFS).

If another solution works better for you, and without meaning this as a snide remark, sure, go for it. I would argue there are other options besides Broadcom's and PVE; with 8 nodes, you have a good choice. As for PVE, the HA implementation is far from perfect: the scheduler, for example, is very primitive.
 
... the failover fails stating that the volume already exists at the target node.
For my setups (mostly dual-node clusters for now), this means there was an older VM with the same ID (e.g. ID 100) that already had replication set up.

But when that older VM was deleted, the replicated disk on "the other node" wasn't cleaned up for some reason. That leftover volume on the other node can block replication for a new VM (with the same ID) unless it is cleared out first.

If you want to do this in an automated way, then the pvesr CLI tool is what you want. That's used for setting up replication jobs, and for removing them.

You can also just directly run the appropriate zfs destroy command on the target node to clear out old VMs too. Both ways will work.

Anyway, once you've cleared out any leftover volumes (for the same VM ID) on the target node, your replication setup should work ok. In theory. :)
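To make that concrete, a rough sketch of the cleanup (job ID, VM ID and dataset name are examples matching the error earlier in the thread; double-check with zfs list before destroying anything):

  # 1. remove the replication job for the affected guest (on the source node)
  pvesr delete 110-0

  # 2. on the target node, destroy the leftover volume (and its snapshots) that blocks replication
  zfs list -r rpool/data | grep vm-110
  zfs destroy -r rpool/data/vm-110-disk-0

  # 3. re-create the job and trigger it; the first run will be a full sync
  pvesr create-local-job 110-0 targetnode --schedule '*/15'
  pvesr schedule-now 110-0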
 
pvesr cli tool is what you want. That's used for setting up replication jobs, and for removing them.

You can also just directly run the appropriate zfs destroy command on the target node to clear out old VMs too. Both ways will work.

I just noticed the OP was having yet another issue, which is far worse:
https://forum.proxmox.com/threads/high-availability-with-local-zfs-storage.122922/#post-684207

Also, with 8 nodes I don't want to imagine manually managing 3-way replication setups for potentially hundreds of VMs that way ...
 
Ouch. Something is causing from-scratch replication (i.e. complete volumes) to occur instead of deltas. I've not seen that before.

Personally I'd definitely double-check that this is indeed what's happening (just in case), because it seems really unusual.

If that is indeed what's going wrong, then that's a bad bug which will need tracking down and fixing.
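One way to double-check (a sketch; the VM ID is an example, and the __replicate_* naming is what PVE storage replication uses for its snapshots): if incremental sends are working, the source and target nodes should share a recent replication snapshot for the volume.

  # run on both the source and the target node and compare the names:
  zfs list -t snapshot -o name rpool/data/vm-110-disk-0 | grep __replicate_

If the target never has a matching __replicate_* snapshot, every run will fall back to a full send.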
 
IMHO, if distributed storage (Ceph or DRBD) is too slow for the application, then the application has to do the replication itself.
E.g. run a Galera cluster on local storage. It does not matter if a node (and its VM) goes down, because the other members of the Galera cluster run on other Proxmox nodes.
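For example, a minimal MariaDB Galera config for three VMs (one per Proxmox node) might look roughly like this; the addresses, names, and file/provider paths are placeholders and vary by distribution:

  # /etc/mysql/mariadb.conf.d/99-galera.cnf  (one per VM, adjust the node address)
  [galera]
  wsrep_on                 = ON
  wsrep_provider           = /usr/lib/galera/libgalera_smm.so
  wsrep_cluster_name       = app-cluster
  wsrep_cluster_address    = gcomm://10.0.0.11,10.0.0.12,10.0.0.13
  wsrep_node_address       = 10.0.0.11
  binlog_format            = ROW
  default_storage_engine   = InnoDB
  innodb_autoinc_lock_mode = 2
  bind-address             = 0.0.0.0

Bootstrap the first VM with galera_new_cluster and start MariaDB normally on the other two; if a Proxmox node (and its VM) goes down, the remaining members keep serving and the rejoining node resyncs itself.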
 
I think I just have a tough use case.
Can you describe that in more detail? It's uncommon for Ceph NOT to be the way to go with that many nodes; it usually just means the hardware isn't at the required level. ZFS is a great filesystem, but not for a cluster, and HA with ZFS is IMHO not real HA, due to all the problems you're describing. Dedicated or distributed shared storage is the only way to have a fast and easily maintainable cluster.
 
