Proper Network Setup for HA Clustering??

modem

Good afternoon all!

I've been toying with Proxmox on and off for about two years in a sandbox environment in my home lab. I had just never pulled the trigger on moving all of my stable Hyper-V VMs over to it, but now I feel comfortable enough to do so.

Currently I have 5 servers (several HPE DL380p's, plus PE 540's & 630's). While I have clustering down pretty well, I would like to move to high-availability networking on these. I do NOT have a drive shelf; all storage is in the individual servers. Each server will have anywhere from 10-20TB of capacity and 192GB or more of RAM.

I would like to get a 10Gb switch (SFP+ preferably) and run multi-mode fiber between the servers for their redundancy / HA capacity, while the VMs would continue to have their own dedicated NICs on a 1GbE switch serving the various needs (Plex, AD, DHCP, etc.).

Here's my main question: I've read about setting up a separate LAN (or VLAN, which is what I'll do) for the HA communication/migrations. What is the best method for this? Are there good, reliable articles/videos on it? I've also read in places that XX amount of RAM needs to be available for HA cluster networking?

I just need more details, as I find this very similar to the VMware we use at my day job, but I want to broaden my horizons even more.

Thanks!
 
I can't help with all of your questions, but I can at least give a few pointers:

First of all, try to dedicate one port (it doesn't need to be SFP+/10Gb; a 1Gb link works fine) to JUST Corosync if you're using HA: no user traffic and especially no storage/replication traffic (preferably not even Proxmox management traffic).
Corosync does not use a lot of bandwidth but is sensitive to latency. If something saturates that connection for too long and causes one or more servers to lose quorum with each other, HA will literally fence (hard-reset) the affected server so that the others can take over its workload: no graceful shutdown of VMs, no pausing them, just a reset and reboot of the host. This is done so the cluster knows the remaining servers can safely start the VM on another node without problems like the storage still being in use or an IP address conflict.
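As a rough sketch, assuming a spare NIC with its own small subnet used only for Corosync (the addresses and cluster name below are made-up examples), the cluster can be created and joined over that dedicated link like this:

Code:
# on the first node
pvecm create homelab --link0 10.10.10.11
# on each additional node, giving the first node's corosync address and this node's own
pvecm add 10.10.10.11 --link0 10.10.10.12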

Secondly, yes on the separate networks (and possibly VLANs) for the different kinds of traffic; otherwise it won't know which route/port to take to the other side of the cluster. For the storage network there are plenty of ways to do it, all with their pros and cons. I would personally always put the storage on a bond, either in failover mode (easiest and always works, but not as quick and no combining of speeds) or one of the other modes depending on what your switch supports (you may want to look into this before deciding which switch to get). You can put the IPs directly on this bond, or, if you want to connect (some of) your VMs/containers over this 10Gb link as well, you put the IP on a bridge that you attach to the bond.
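As a rough sketch of the failover variant with a bridge on top (interface names, bridge name, and addresses are made-up examples; adjust to your own hardware), the relevant part of /etc/network/interfaces could look like:

Code:
auto bond0
iface bond0 inet manual
    bond-slaves ens1f0 ens1f1
    bond-mode active-backup
    bond-miimon 100

# bridge on top of the bond so VMs/containers can also use the 10Gb link
auto vmbr1
iface vmbr1 inet static
    address 10.10.20.11/24
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0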

For the part about RAM, that MIGHT be about Ceph as your storage backend, which USED to use up to 50% of your RAM if you didn't change anything; I believe semi-recently that was lowered to 10% or a fixed number of GB (I don't use Ceph, so I don't remember all the details from my training, since I knew it wouldn't be useful for my company in the short term at least).
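If that RAM figure does turn out to be about Ceph, the knob I vaguely remember is the per-OSD memory target, which (as far as I know) can be checked and changed from the ceph CLI; the 4 GiB below is just an illustrative value:

Code:
# show the current per-OSD memory target (bytes)
ceph config get osd osd_memory_target
# set it to 4 GiB (example value only)
ceph config set osd osd_memory_target 4294967296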

Finally, what personally helped me a lot (especially to finally decide "yes, this is the route we should take away from ESXi/Broadcom") was following the full Proxmox training (I did the bundled, on-site training myself, but online courses are also available): https://www.proxmox.com/en/services/training (the other thing that helped, though, was the ESXi import tool being released DURING my training :p )

Also, as a side note, we personally don't use HA, just a cluster (with a shared-storage backend), so maybe also wait for input from those with more hands-on HA experience.
 
See, that's what I meant by only remembering parts of it, since it wasn't something I had to use :)
Thanks for clarifying/correcting that @UdoB
@sw-omit Thanks for the reply and the in-depth information. I had read a little bit on Corosync. Corosync is essentially sending heartbeat signals between nodes, so that if one goes down, the others vote, correct?

The switch(es) I'm looking at are Brocade ICX6610s, which have 8 SFP+ ports on the front plus 2 x 40Gb ports that can take breakout cables for 8 more 10Gb SFP+ connections, for a total of 16. I'm also looking at the Brocade ICX7250, which only has 8 SFP+ ports total.

I doubt I'll do bonding right off the bat, at least until I get more comfortable with the cluster/HA setup, since HA will be a personal goal/requirement, especially if the need arises for specific VMs.

The data rack will have its own switch (whichever one I decide on) with a 10Gb uplink back to an upstream Ubiquiti switch that has PC clients and devices on it. I doubt I'll run any VMs on 10Gb networking; it could be a good idea to maintain 10Gb from VM through the server switch to the client switch, but the clients' 1GbE would bottleneck it anyway.

So the way it looks, I'll have the following VLANs (rough sketch after the list):

- Corosync (1 GbE on regular RJ45 ports)
- VM Failover Migrations [Isolated] (10Gb on an SFP+ port)
- VM server to client regular network (1 GbE on RJ45 ports)
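
Roughly, I'm picturing each node's /etc/network/interfaces along these lines (interface names and addresses are placeholders I made up):

Code:
# dedicated Corosync port (1GbE, RJ45)
auto eno1
iface eno1 inet static
    address 10.10.10.11/24

# migration/failover port (10Gb SFP+), isolated subnet
auto ens1f0
iface ens1f0 inet static
    address 10.10.20.11/24

# bridge for regular VM traffic (1GbE)
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    gateway 192.168.1.1
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0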

Anything really wrong with that thinking? I'm open to whatever options are best for teaching me Proxmox best practices.

Thanks!
 
For Corosync, indeed, or at least partly: every change (including things like starting a VM/container) gets voted on. As long as more than half of the nodes (strictly more than half, not exactly half, which is why you want an odd-sized cluster) agree that it is OK (the user has the permission, the VM is on that server, the VM is not busy with a different change, the configuration is the latest version, etc.), the action happens. So even when all servers are fine, this vote still happens; it's just near-instant because everything agrees.
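If you want to see that quorum/vote state for yourself, these should both work on any node:

Code:
# Proxmox view of the cluster membership and votes
pvecm status
# lower-level Corosync view of quorum
corosync-quorumtool -s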

Also, for Corosync you have the option to set multiple links; use that. I would set its secondary link to the storage network and the third link to the regular network.
That way, if the switch Corosync is linked to needs to reboot or a cable needs reconnecting, you don't need to stop the entire HA stack or anything like that; Corosync can self-heal temporary interruptions (although taking care, and possibly pausing some systems, might still be a good plan).
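As a rough illustration of what that ends up looking like in /etc/pve/corosync.conf (node name and addresses are made-up; link 0 here is the dedicated Corosync network, link 1 the storage network, link 2 the regular LAN):

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.11
    ring1_addr: 10.10.20.11
    ring2_addr: 192.168.1.11
  }
  # ...one block per node...
}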

As for those switches, they both support 802.3ad according to the spec sheets I found with a quick Google, so they should be fine for any bonding setup, although I don't have any experience with Brocade/Ruckus (I do with UniFi/Ubiquiti, though; I was lucky enough to snag one of their large aggregation switches for one of our customers before they sold out again).

As for the rest, from the information provided I don't see anything wrong with it. As for the training: since you say you have already been playing around with it a lot, the advanced course, which focuses on things like clusters, should probably be "enough". There are plenty of online courses as well, in English and other languages, or if there is an in-person training close by, that's an option too. The prices for all the courses are the same and they should cover the same material (although the way it is taught may vary a little from person to person and company to company). Again, see the link I posted earlier for more info; I learned most of what I know from some basic Linux knowledge gathered over the years plus that (in my case) 4-day course (although someone there without much, if any, Linux experience could follow along just fine too).

Oh, one tip that didn't come up during training though:
Hard-set your network names [1]. Create .link files keyed on the MAC address to assign a name that won't conflict but will still show up; for example, for our cluster we use endXpY as the format for device X, port Y (so my second onboard port would be end0p2, and the fourth port on the second add-in card would be end2p4).
I've had the names of my Ethernet devices change slightly (they all got an "f0" appended, invalidating the network config) after an update, luckily while it was still a single node that I was in the process of setting up. It's a bit of a one-time chore, depending on how many ports you have, but after that you don't have to look back at it. Plus, if you have to change a card or add a new one, you can set it up beforehand (as long as you know the MACs, since the changes don't apply until a reboot) and know everything will come up on the next boot right away... unless you find out the manufacturer swapped the counting order of the ports.

[1] https://pve.proxmox.com/wiki/Network_Configuration#_naming_conventions
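
For what it's worth, a minimal version of one of those .link files (the MAC address and name below are placeholders) looks something like this:

Code:
# /etc/systemd/network/10-end0p2.link
[Match]
MACAddress=aa:bb:cc:dd:ee:02

[Link]
Name=end0p2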
 
