Proxmox HA setup with only two servers running VMs

simoncechacek · Jan 15, 2024

Hello,

Previously, I asked about using GRAID cards and how to get a stable setup with them. After the input from the community, we decided to remove the GRAID card completely and switch to ZFS.

I want to ask about the HA setup (ideally we would love to get fault tolerance, but I am afraid it's not possible with PVE).

I understand you need to have at least three servers as witnesses, but do all three servers also need to run the VMs? Or can I have two high-power servers with a 200Gbit link for replicating the data between them (I do not want to run shared storage, as it is not that fast and it's a single point of failure) and one server as a witness that will not be able to take over the VMs?

I saw countless videos showing PVE HA, but they were all between 3 RPis or 3 identical servers.

They also used CEPH storage a lot, but I am afraid CEPH will be much slower for those 2 servers as I want to run a Gen4 NVMe ZFS pool.

I am sorry if my questions are stupid; we are trying to set up the best environment for our customers' websites to keep them online even with hardware issues.

Thank you!

esi_y · Jan 15, 2024

simoncechacek said:
Hello,

Previously, I asked about using GRAID cards and how to get a stable setup with them. After the input from the community, we decided to remove the GRAID card completely and switch to ZFS.

I want to ask about the HA setup (ideally we would love to get fault tolerance, but I am afraid it's not possible with PVE).

I understand you need to have at least three servers as witnesses, but do all three servers also need to run the VMs? Or can I have two high-power servers with a 200Gbit link for replicating the data between them (I do not want to run shared storage, as it is not that fast and it's a single point of failure) and one server as a witness that will not be able to take over the VMs?

You can have 2 servers with ZFS replication going on and HA enabled, and then a QDevice for the quorum. The QDevice may sit outside of the network, with none of the stringent requirements on latency, throughput etc.
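To illustrate why the third vote matters, here is a minimal sketch of the majority arithmetic. It assumes each node and the QDevice contribute exactly one vote; the function names are made up for this example and are not part of any PVE tooling.

```python
def quorum_threshold(total_votes: int) -> int:
    """Strict majority of all configured votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the surviving votes still form a majority."""
    return votes_present >= quorum_threshold(total_votes)


# Two PVE nodes only: 2 votes in total, so one failed node means no majority.
two_nodes_one_down = has_quorum(votes_present=1, total_votes=2)

# Two PVE nodes plus a QDevice: 3 votes in total, so one node may fail.
with_qdevice_one_down = has_quorum(votes_present=2, total_votes=3)

print(two_nodes_one_down, with_qdevice_one_down)
```

With only two votes the surviving node cannot act on its own; with the QDevice vote added, the surviving node plus the QDevice still form a majority.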

simoncechacek said:
I saw countless videos showing PVE HA, but they were all between 3 RPis or 3 identical servers.

RPis??

simoncechacek said:
They also used CEPH storage a lot, but I am afraid CEPH will be much slower for those 2 servers as I want to run a Gen4 NVMe ZFS pool.

The only caveat is how much stale data you can tolerate should HA need to migrate those VMs. ZFS replication is delta only, but I would still not want to schedule it every minute for many VMs. If you do not mind that your HA-migrated VM starts up from a dataset that is minutes old, it's good enough and more "performant" than CEPH would be even on 3 nodes.

simoncechacek said:
I am sorry if my questions are stupid; we are trying to set up the best environment for our customers' websites to keep them online even with hardware issues.

Thank you!

simoncechacek · Jan 15, 2024

tempacc346235 said:
RPis??

Yes, I saw videos of HA clusters on Raspberry Pis.

But I am not planning on doing that.

tempacc346235 said:
The only caveat is how much stale data you can tolerate should HA need to migrate those VMs. ZFS replication is delta only, but I would still not want to schedule it every minute for many VMs. If you do not mind that your HA-migrated VM starts up from a dataset that is minutes old, it's good enough and more "performant" than CEPH would be even on 3 nodes.

We currently have one VM with Plesk panel and MySQL. I plan to split it into two, so one would be data and webserver and the second would host the databases. How often can I replicate that data? From reading previous posts, I understood I can set up data replication of the whole ZFS pools that would run as fast as possible. These two servers would be in one rack connected with a private 200Gbit link, so I can move the data very, very quickly.

I would love to have the data stale by seconds at most. Is that possible? Or do I need CEPH storage and three servers for that? I checked, and we change around 5-20 GB of data every 30 minutes, and that includes customers cloning their sites or dev projects.

Thank you for your time and answer!
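As a rough sanity check on those numbers, the sketch below estimates how much changed data one replication interval would carry and how long it would occupy the link. It assumes the 5-20 GB is spread evenly over the 30 minutes (bursts such as a site clone will be worse) and ignores protocol overhead, encryption and disk speed; the function names are illustrative only.

```python
LINK_GBIT_PER_S = 200   # link speed mentioned in the post
WINDOW_MIN = 30         # measurement window mentioned in the post


def delta_per_interval_gb(changed_gb: float, interval_s: float) -> float:
    """Average amount of changed data (GB) that accumulates in one interval."""
    return changed_gb * interval_s / (WINDOW_MIN * 60)


def transfer_time_s(delta_gb: float, link_gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Idealised wire time for delta_gb at the given link speed."""
    return delta_gb * 8 / link_gbit_per_s


for changed in (5, 20):
    delta = delta_per_interval_gb(changed, interval_s=60)
    millis = transfer_time_s(delta) * 1000
    print(f"{changed} GB per 30 min: {delta:.3f} GB per minute, {millis:.1f} ms on the wire")
```

On these assumptions the link itself is nowhere near the bottleneck; snapshot handling and the scheduling interval matter far more than raw bandwidth.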

esi_y · Jan 15, 2024

simoncechacek said:
We currently have one VM with Plesk panel and MySQL. I plan to split them into two, so one would be data and webserver and the second will host databases.

So apparently the webserver is not an issue, but the DB might be. Replication jobs have to run on a schedule. It is not that running them every second would be impossible (but would they finish in time?), but PVE allows a 60 secs minimum, I think. If you were to e.g. have that DB HA-migrated, it would start up with data up to 60 secs old. I am not sure how you would tolerate that; I would not.
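To put a number on "up to 60 secs old": the replica is also as old as the last transfer took to finish. A minimal sketch of that bound, assuming jobs start exactly on schedule and each one finishes before the next starts (the function name is made up for this example):

```python
def worst_case_staleness_s(interval_s: float, job_duration_s: float) -> float:
    """Upper bound on the age of the replica when the source node fails.

    A snapshot is taken at t0 and lands on the target at t0 + duration.
    Until the next snapshot (taken at t0 + interval) has also landed, the
    target still holds the state from t0, so just before that moment its
    data is interval + duration seconds old.
    """
    if job_duration_s >= interval_s:
        raise ValueError("job does not finish within its interval")
    return interval_s + job_duration_s


# One-minute schedule with a transfer that takes 5 seconds.
print(worst_case_staleness_s(interval_s=60, job_duration_s=5))
```

So even with the minimum schedule, a failed-over database could be missing slightly more than a minute of writes.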

simoncechacek said:
How often can I replicate that data? From reading previous posts, I understood I can set up data replication of the whole ZFS pools that would run as fast as possible.

You need to schedule it manually for every VM; that is trivial in a 2-node cluster, but just saying. See the UI; it has 1 min granularity, I believe.

simoncechacek said:
These two servers would be in one rack connected with private 200Gbit, so I can move the data very, very quickly.

I do not see why you would not be able to run the ZFS replication faster manually, but it's not going to be officially "supported".

simoncechacek said:
I would love to have the data stale by seconds at most. Is that possible? Or do I need CEPH storage and three servers for that?

See above. Or shared storage. I do not think CEPH makes sense even on e.g. 3 nodes (which is the minimum).

simoncechacek said:
I checked, and we change around 5-20 GB of data every 30 minutes, and that includes customers cloning their sites or dev projects.

Thank you for your time and answer!

esi_y · Jan 15, 2024

See: https://pve.proxmox.com/wiki/Storage_Replication#_command_line_interface_examples

It's really a design decision here that makes anything below 60 secs impracticable. I am sure zfs send|receive would run on that even every 5 secs. Nothing prevents you from scheduling it yourself, but I would test that thoroughly.
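For anyone wondering what "scheduling it yourself" would involve, here is a hedged sketch that only builds and prints the commands for one incremental round; it does not run them. The dataset name, snapshot names and target host are hypothetical placeholders, and snapshots created this way sit outside PVE's own replication bookkeeping, which is one reason to test thoroughly.

```python
import shlex


def incremental_round(dataset: str, prev_snap: str, new_snap: str, target_host: str):
    """Return the argv lists for one snapshot plus incremental send/receive."""
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i sends only the difference between the previous and the new snapshot.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -F rolls the target back to its latest snapshot before receiving.
    receive = ["ssh", target_host, "zfs", "receive", "-F", dataset]
    return snapshot, send, receive


# All names below are placeholders for illustration only.
snapshot, send, receive = incremental_round(
    dataset="rpool/data/vm-100-disk-0",
    prev_snap="rep-0001",
    new_snap="rep-0002",
    target_host="pve2",
)
print(shlex.join(snapshot))
print(f"{shlex.join(send)} | {shlex.join(receive)}")
```

A real script would also have to remove old snapshots on both sides and cope with a failed or interrupted transfer.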

simoncechacek · Jan 15, 2024

Ok, thank you for the insight.
Another idea is to have only the web data replicated and the database in a MySQL cluster. Then the data should stay synced between those MySQL VMs, and they should be able to switch over.

One minute of staleness on user data is probably doable, as the DB is the most important part.

bbgeek17 · Jan 15, 2024

simoncechacek said:
Another idea is to have only the webdata replicated and the database in a MySQL cluster,

Application-level DR is always a better choice than backend storage replication, especially with databases.

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

simoncechacek · Jan 16, 2024

Thank you @bbgeek17 for the info! Do you have any recommended ways to run MySQL/MariaDB replicated? I read something about Percona, but this replication is new to me. My biggest issue is that websites are usually made to access the DB under one IP only, so I am not sure how to keep delivering the data when that one VM dies.
Maybe using some plugin to connect multiple MySQL servers into WordPress?

bbgeek17 · Jan 16, 2024

Hi @simoncechacek, I don't have any specific recommendations for you regarding MySQL replication. As far as I am aware, there are built-in ways as well as 3rd-party solutions. There are many articles online that cover this subject, e.g. search for "mysql cluster replication tutorial failover ip".

Good luck


simoncechacek · Jan 18, 2024

Hi there and thank you.

My current plan is to have the Web VM synced as often as possible. Then use a database cluster on both nodes so the DB will always be online even when one host dies.

As for the replication, I read the https://pve.proxmox.com/wiki/Storage_Replication page and there is just something I want to clarify. Is there a way to run this continuously, or if not, can I start it once a minute and somehow check that the last replication is done? Ideally I want to sync these ZFS storages continuously; I will even have a dedicated multi-Gbit link for that between those two nodes.

I also looked into options like https://github.com/jimsalterjrs/sanoid but I don't know if I can achieve what I want with that and if syncoid will be usable with Proxmox's HA solution.

If I am asking too much or being stupid, just let me know. I am trying to understand, but this is the first time I am learning PVE and setting up HA at the same time. I have only used VMware's Fault Tolerance in a lab environment.

esi_y · Jan 18, 2024

simoncechacek said:
I read the https://pve.proxmox.com/wiki/Storage_Replication page and there is just something I want to clarify. Is there a way to run this continuously, or if not, can I start it once a minute

There's literally a sentence there in your link:
"The minimum replication interval is one minute, and the maximal interval once a week"

It is unfortunately all based on the same basis as pvesr above.

I am not sure, even if you were to hack it, what would happen if you tried to run it too early, say every second, while the previous job has not finished yet. After all, it's just a ZFS snapshot replica being sent over to the other node, a delta against the previous state stored there.
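If you did script it yourself, one common way to avoid overlapping runs is to skip a tick while the previous job is still going. The sketch below shows that guard with a non-blocking lock; it says nothing about how PVE itself behaves, and the class name and the sleep standing in for the transfer are made up for this example.

```python
import threading
import time


class NonOverlappingJob:
    """Run a job on request, but skip the request if a run is still in progress."""

    def __init__(self, job):
        self._job = job
        self._lock = threading.Lock()
        self.completed = 0
        self.skipped = 0

    def trigger(self) -> bool:
        # Non-blocking acquire returns False at once if another run holds the lock.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._job()
            self.completed += 1
            return True
        finally:
            self._lock.release()


def slow_replication():
    time.sleep(0.25)  # stand-in for a zfs send|receive that takes a while


runner = NonOverlappingJob(slow_replication)

# Fire a trigger every 0.1 s from its own thread, like an over-eager timer.
threads = []
for _ in range(5):
    t = threading.Thread(target=runner.trigger)
    t.start()
    threads.append(t)
    time.sleep(0.1)
for t in threads:
    t.join()

print(f"completed={runner.completed} skipped={runner.skipped}")
```

Skipped ticks simply mean the effective interval becomes the job duration, so staleness grows with the amount of changed data rather than jobs piling up.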
