Proxmox HA setup with only two servers running VMs

simoncechacek

New Member
Jun 21, 2023
wordpresscare.net
Hello,

previously, I asked about using GRAID cards and how to get a stable setup with them. After the input from the community, we decided to remove the GRAID card completely and switch to ZFS.

I want to ask about the HA setup (ideally we would love to get fault tolerance, but I am afraid it's not possible with PVE).

I understand you need at least three servers to act as witnesses, but do all three servers also need to run the VMs? Or can I have two high-powered servers with a 200 Gbit link replicating the data between them (I do not want to run shared storage, as it is not that fast and it's a single point of failure) and one server acting only as a witness that will not be able to take over the VMs?

I saw countless videos showing PVE HA, but they were all between 3 RPis, or 3 same servers.

They also used CEPH storage a lot, but I am afraid CEPH will be much slower for those 2 servers as I want to run a Gen4 NVMe ZFS pool.

I am sorry if my questions are stupid; we are trying to set up the best environment for our customers' websites to keep them online even with hardware issues.

Thank you!
 
I understand you need at least three servers to act as witnesses, but do all three servers also need to run the VMs? Or can I have two high-powered servers with a 200 Gbit link replicating the data between them (I do not want to run shared storage, as it is not that fast and it's a single point of failure) and one server acting only as a witness that will not be able to take over the VMs?

You can have 2 servers with ZFS replication between them and HA enabled, and then add a QDevice for the quorum. The QDevice may sit outside of the cluster network, with none of the stringent requirements on latency, throughput, etc.
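
To make the quorum point concrete, here is a minimal sketch of the majority arithmetic, assuming each node and the QDevice contribute one vote. The function names are made up for illustration and are not part of any Proxmox tool.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority of the expected votes."""
    return total_votes // 2 + 1


def has_quorum(votes_online: int, total_votes: int) -> bool:
    """True if the surviving votes still form a majority."""
    return votes_online >= quorum_threshold(total_votes)


# Assumption: one vote per node, one vote for the QDevice.
scenarios = [
    ("2 nodes, no QDevice, one node down", 1, 2),
    ("2 nodes + QDevice, one node down", 2, 3),
    ("2 nodes + QDevice, QDevice down", 2, 3),
]
for label, online, total in scenarios:
    print(f"{label}: quorum={has_quorum(online, total)}")
```

With only two votes, losing either node drops the survivor below the majority of 2, which is why the third vote matters even though the witness never runs a VM.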

I saw countless videos showing PVE HA, but they were all between 3 RPis, or 3 same servers.

RPis??

They also used CEPH storage a lot, but I am afraid CEPH will be much slower for those 2 servers as I want to run a Gen4 NVMe ZFS pool.

The only caveat is how much stale data you can tolerate should HA need to migrate those VMs. ZFS replication sends deltas only, but I would still not want to schedule it every minute for many VMs. If you do not mind your HA-migrated VM starting up from a dataset that is minutes old, it's good enough, and more "performant" than CEPH would be even on 3 nodes.

 
Yes, I saw videos of HA clusters on Raspberry Pis :D But I am not planning on doing that.
We currently have one VM with Plesk panel and MySQL. I plan to split them into two, so one would be data and webserver and the second will host databases. How often can I replicate that data? From reading previous posts, I understood I can set up replication of whole ZFS pools that runs as fast as possible. These two servers would be in one rack connected with private 200Gbit, so I can move the data very, very quickly.


I would love the data to be stale by seconds at most. Is that possible? Or do I need CEPH storage and three servers for that? I checked, and we change around 5-20 GB of data every 30 minutes, and that includes customers cloning their sites or dev projects.

Thank you for your time and answer!
 
We currently have one VM with Plesk panel and MySQL. I plan to split them into two, so one would be data and webserver and the second will host databases.

So apparently the webserver is not an issue; the DB might be. Replication jobs have to run on a schedule. Running them every second might be technically possible (but would each run finish in time? :)), but PVE allows a minimum of 60 seconds, I think. If that DB were HA-migrated, it would start up with data up to 60 seconds old. I am not sure how you would tolerate that; I would not.
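
As a rough back-of-the-envelope check, the numbers from this thread (up to 20 GB changed per 30 minutes, a 200 Gbit link, a 60-second schedule) can be plugged into a small sketch. The 50% link efficiency is an arbitrary assumption; real `zfs send`/`receive` throughput depends on the pools, CPU, and snapshot overhead, so treat the result as an optimistic bound.

```python
BYTES_PER_GBIT = 1e9 / 8  # bytes per second carried by 1 Gbit/s


def transfer_seconds(delta_bytes: float, link_gbit: float,
                     efficiency: float = 0.5) -> float:
    """Time to ship one delta, assuming the link is the only bottleneck."""
    return delta_bytes / (link_gbit * BYTES_PER_GBIT * efficiency)


def worst_case_staleness(interval_s: float, transfer_s: float) -> float:
    """Data written just after a snapshot waits one full interval,
    then needs the transfer time before it exists on the other node."""
    return interval_s + transfer_s


# Figures from the thread: up to 20 GB changed per 30 minutes.
delta_per_minute = 20e9 / 30
t = transfer_seconds(delta_per_minute, link_gbit=200)
print(f"transfer per run: {t:.3f} s")
print(f"worst-case staleness: {worst_case_staleness(60, t):.1f} s")
```

Under these assumptions the schedule interval, not the link, dominates how old the replica can be.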

How often can I replicate that data? From reading previous posts, I understood I can set up replication of whole ZFS pools that runs as fast as possible.

You need to schedule it manually for every VM. In a 2-node cluster that is trivial, but just saying. See the UI; 1-minute granularity, I believe.
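
For more than a handful of VMs, the per-VM jobs could be generated rather than clicked together. The sketch below only builds the command strings and does not run anything. It assumes the `pvesr create-local-job <vmid>-<n> <target> --schedule ...` form from the Storage Replication documentation and that `*/1` is accepted as "every minute"; the VM IDs and the node name `pve2` are hypothetical, so check `man pvesr` before using any of it.

```python
import shlex


def replication_job_command(vmid: int, target_node: str,
                            job_number: int = 0,
                            schedule: str = "*/1") -> str:
    """Build (not execute) a pvesr command for one replication job."""
    job_id = f"{vmid}-{job_number}"
    return shlex.join([
        "pvesr", "create-local-job", job_id, target_node,
        "--schedule", schedule,
    ])


# Hypothetical VM IDs for the web VM and the database VM.
for vmid in (100, 101):
    print(replication_job_command(vmid, "pve2"))
```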

These two servers would be in one rack connected with private 200Gbit, so I can move the data very, very quickly.

I do not see why you could not run the ZFS replication manually at a faster rate, but it's not going to be officially "supported".

I would love the data to be stale by seconds at most. Is that possible? Or do I need CEPH storage and three servers for that?

See above. Or shared storage. I do not think CEPH makes sense even on 3 nodes (which is the minimum).

 
Ok, thank you for the insight.
Another idea is to have only the web data replicated and the database in a MySQL cluster. Then the data should stay synced between those MySQL VMs, and they should be able to switch over.

One minute of staleness on user data is probably acceptable, as the DB is the most important part.
 
Thank you @bbgeek17 for the info! Do you have any recommended ways to run MySQL/MariaDB replicated? I read something about Percona, but replication is new to me. My biggest issue is that websites are usually built to access the DB under a single IP only, so I am not sure how to keep delivering the data when one VM dies.
Maybe using some plugin to connect multiple MySQL servers to WordPress?
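
The "one IP only" problem boils down to something deciding which database host is currently alive. Purely as an illustration of that decision (not a recommendation of any product or plugin), a health check might look like the sketch below. It only tests that a TCP connection can be opened and says nothing about replication state; the addresses in the usage line are hypothetical.

```python
import socket


def first_reachable(endpoints, timeout=1.0):
    """Return the first (host, port) that accepts a TCP connection,
    or None if none of them do."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, unreachable, or timed out: try the next host.
            continue
    return None


# Hypothetical addresses of the two database VMs.
# print(first_reachable([("10.0.0.11", 3306), ("10.0.0.12", 3306)]))
```

In practice this logic usually lives in front of the application (a floating IP or a proxy), so the websites keep pointing at a single address.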
 
Hi @simoncechacek, I don't have any specific recommendations for you regarding MySQL replication. As far as I am aware, there are built-in ways as well as 3rd-party solutions. There are many articles online that cover this subject, e.g. search for "mysql cluster replication tutorial failover ip".

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi there and thank you.

My current plan is to have the Web VM synced as often as possible, and then use a database cluster across both nodes so the DB will always be online even when one host dies.

As for the replication, I read the https://pve.proxmox.com/wiki/Storage_Replication page and there is just one thing I want to clarify. Is there a way to run this continuously? If not, can I start it once a minute and somehow check that the last replication is done? Ideally I want to sync these ZFS storages continuously; I will even have a dedicated multi-Gbit link for that between those two nodes.

I also looked into options like https://github.com/jimsalterjrs/sanoid, but I don't know if I can achieve what I want with that, or if syncoid will be usable with Proxmox's HA solution.

If I am asking too much or being stupid, just let me know. I am trying to understand, but this is the first time I am learning PVE and setting up HA at the same time. I have only used VMware's fault tolerance in a lab environment.
 
I read the https://pve.proxmox.com/wiki/Storage_Replication page and there is just something I want to clarify. Is there a way to run this continuously, or if not, can I start it once a minute

There's literally a sentence there in your link:
"The minimum replication interval is one minute, and the maximal interval once a week"

Unfortunately, it is all built on the same mechanism as pvesr above.

I am not sure, even if you were to hack it, what would happen if you tried to run it too often, say every second, while the previous job had not finished yet. After all, it's just a ZFS snapshot being sent over to the other node as a delta against the previous state stored there.
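
If someone did script their own faster replication, one way to sidestep the overlap question is to skip a run whenever the previous one still holds a lock. This is a generic sketch using an advisory file lock on Linux; `run_if_idle` and the lock path are made-up names, and it says nothing about how pvesr itself handles overlapping jobs.

```python
import fcntl


def run_if_idle(lock_path, job):
    """Run job() only if no other run holds the lock.

    Returns True if the job ran, False if it was skipped because a
    previous run is still in progress."""
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        try:
            job()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
        return True


# Hypothetical usage from a timer that fires every few seconds:
# run_if_idle("/run/my-replication.lock", replicate_once)
```

A skipped run simply means the next successful run sends a slightly larger delta, so the replica falls behind instead of two sends colliding.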
 
