[solved] Replication schedule

Elleni

Active Member
Jul 6, 2020
As the replication in our case takes just ~1.6 to ~6 seconds per VM, I am thinking about shortening the schedule to */1. Is this OK, or would you not recommend such short intervals?
 
As the replication in our case takes just ~1.6 to ~6 seconds per VM, I am thinking about shortening the schedule to */1. Is this OK, or would you not recommend such short intervals?
I think there is nothing preventing you from doing that.
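For reference, the schedule of an existing replication job can also be adjusted on the CLI with pvesr; a rough sketch (the job ID 100-0 and the one-minute schedule are just examples):

# list existing replication jobs and their current schedules
pvesr list
# switch a job to run every minute
pvesr update 100-0 --schedule "*/1"
# check last sync time and duration per job
pvesr status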

But to somewhat hijack the thread: does anyone know if the replication logic has a lock to prevent execution when the same task is running already?
 
Thanks for your reply. I also have an additional question. I read in the documentation that shared storage is needed for HA / failover / fencing. But then again, I had heard that live migration of VMs also requires shared storage, yet apparently it is possible without it.

With a replication interval of one minute I also "almost" have shared storage. So my question would be whether I can configure HA without shared storage.
 
What does work is migrating a running VM. I was under the impression that this also required shared storage some time ago, but an experienced Proxmox user told me that it works without shared storage nowadays, which it does in my case.

That's why I want to know whether the automatic start of a VM on a replicated node (that's how I understand HA) is also possible in my setup, and whether shared storage is no longer mandatory.
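In case it is useful, this is roughly what such a migration looks like on the CLI (VM ID and target node are placeholders); the --with-local-disks option is what copies the local disks along instead of requiring shared storage:

# live-migrate VM 100 to nodeB, including its local disks
qm migrate 100 nodeB --online --with-local-disks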
 
True. That's why I am wondering whether HA and automatic failover to the second node is also possible without shared storage.
 
True. That's why I am wondering whether HA and automatic failover to the second node is also possible without shared storage.
No, this is not possible, as replication is not real-time (asynchronous replication).
 
I think this needs a bit more clarification. While it is technically possible to run HA with ZFS replication, you need to be aware that if HA kicks in and starts the VM on the other node, it will do so with the disk as it was at the last replication run.

Depending on how short the replication schedule is and what kind of services are running, you might be okay with the data loss you will incur. If that is not an option, you will need real shared storage, as @tom mentioned.
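To get a feeling for how much data could be lost, you can check when each replication job last synced, e.g.:

# shows last sync time, duration and state per replication job
pvesr status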
 
Great, thanks for clarifying. I might try to set this up, as a short outage / data loss until the second node is ready to serve the VM is acceptable for an environment as small as ours.
 
Just for my understanding: our VMs run on node A or node B, but not on both. The storages of each node are replicated, so the VM disks exist on both nodes. Now, for configuring HA failover without shared storage - is it just a matter of adding entries for the VMs in HA? How would that work then? Are the VMs of node A automatically created on node B if node A goes down? Or am I supposed to create identical VMs on both nodes with their respective local storage, so that a VM would automatically boot on the other node in case its primary node fails?
 
The storage should be named exactly the same on both nodes for replication to work. In case one of the nodes goes down, the HA stack will move the VM config to the other node and start it. Since the storages are named the same, everything should work as expected.

How many nodes do you have in the cluster? Only 2? Do you have a QDevice [0] or a third node present to still have quorum if one of the nodes goes down?

You can try to set up a (virtualized) test setup and see if it behaves as expected if you stop one node.

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
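In case it helps, the QDevice setup is roughly the following (the IP is a placeholder; details are in the linked documentation):

# on the external host providing the third vote
apt install corosync-qnetd
# on each PVE node
apt install corosync-qdevice
# then, from one PVE node
pvecm qdevice setup 192.0.2.10
# verify the additional vote is counted
pvecm status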
 
Thanks for the clarification - that should work then. The storages are named identically and the disks are regularly replicated. Yes, two nodes with a QDevice configured on our PBS. Yeah, or maybe we even do a live test with a shutdown of a node, or just unplug it from the storage network once our employees are not working. Thank you.
 
Yeah, or maybe we even do a live test with a shutdown of a node, or just unplug it from the storage network once our employees are not working. Thank you.
If you can do it, sure. Be sure to unplug the network used by corosync, as this service is used to determine whether a node is still alive and part of the cluster.
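Before pulling any cables, it may be worth double-checking which network corosync actually uses, for example:

# show corosync link status and the addresses in use
corosync-cfgtool -s
# show quorum and membership information
pvecm status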
 
The failover works fine, meaning the VMs are started on the remaining node as expected when one node is shut down. One question I still have though: when the failed node comes back, is it normal / by design that I have to migrate the VMs back manually, or should they be migrated back automatically in this scenario of a 2-node cluster without shared storage, with replicated datastores and a QDevice for the third vote?
 
One question I still have though: when the failed node comes back, is it normal / by design that I have to migrate the VMs back manually, or should they be migrated back automatically in this scenario of a 2-node cluster without shared storage, with replicated datastores and a QDevice for the third vote?
AFAIK this is normal. If you want certain guests to prefer one node over the other, you can define HA groups and set priorities for the nodes. So in a 2-node scenario, you could create two groups, each favoring one of the nodes, and place the VMs in those groups. Should the node come back, they should migrate back after a bit of time.
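A rough sketch of such a setup on the CLI (group names, node names and the VM ID are just examples; a higher number means a higher priority):

# one group per preferred node
ha-manager groupadd prefer_node1 --nodes "node1:2,node2:1"
ha-manager groupadd prefer_node2 --nodes "node2:2,node1:1"
# put a VM under HA and into its preferred group
ha-manager add vm:100 --state started --group prefer_node1
# add --nofailback 1 to a group if its VMs should not move back automatically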
 
Great, thanks for the confirmation. I will maybe check this out, or perhaps we even leave it as is, because I am not sure yet whether it is desirable for the VMs to migrate back automatically. The failed node may need some attention or a fix before being put back in service. But good to know how this could be done with groups.
 
For the purpose of testing I created two groups, prefer_node1 and prefer_node2, and assigned the VMs to those groups. I have a stopped (never booted) VM on node 2, but I assigned it to the prefer_node1 group. I have configured max. restart 0, max. relocate 1 and requested state: stopped.

My expectation would be that the VM gets migrated to node1, but it doesn't move. Does that only work while a VM is started, or what am I missing here?
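In case it helps with debugging, the effective HA settings and the manager's current view can be checked like this (the VM ID is an example):

# show configured HA resources, groups and their settings
ha-manager config
# show what the HA manager currently reports for each resource
ha-manager status
# adjust a single resource, e.g. its group or requested state
ha-manager set vm:100 --group prefer_node1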
 
