Sick of VMware, questions about setting up HA on Proxmox

rsr911
Member · Nov 16, 2023
Hello, new here.

I have VMware 7.0 and have struggled to set up a two-node HA vSAN cluster with a witness node. Here is my hardware:

3 Lenovo dual-CPU RD440 servers with 128GB RAM each. All three have 118GB Optane drives for caching and 1TB SSDs for boot. All three have 8-port Avago tri-mode RAID/HBA cards. The servers hold 8 drives, but I have the boot SSDs installed where the DVD used to be. My two servers for VM/vSAN have 8 1TB SSDs each, currently in RAID 5 with a hot spare. The third server has its boot drive, the Optane as mentioned, an unused 1TB SSD, and 5 8TB HDDs I plan to use with a backup VM.

I need to run two Windows servers and eventually an SQL server, and I want to do this on an HA cluster. My plan was for two physical servers to hold the vSAN data and act as the two nodes, with the third server as a witness (a monitor, in Proxmox terms) running the ESXi witness appliance. This would leave resources free on that server to run a backup solution, likely in a VM.

For networking, all three servers have the 1G hardware management port (IPMI, I believe), two 1G onboard NICs (currently unused), two SFP+ 10G dual-port cards, and two 10G RJ45 dual-port cards. At present one SFP+ card in each machine is attached to a standalone L3 switch for vSAN traffic, the other SFP+ card plugs into my L3 frontend switch, and the RJ45 cards will also plug into the L3 frontend. In time I will move one server to another building using existing fiber and an existing L3 switch identical to the current frontend switch. These switches are connected by a pair of fiber connections and a pair of RJ45 runs over Cat6a.

My goal: two Windows domain controllers, one serving a database and the other acting as a file server. In time the Access database's back end will be migrated to SQL. I need HA for these servers and would like to avoid buying a fourth server. My backup storage is enough for nightly, weekly, monthly, and annual backups. I don't have a lot of data, around 2TB at most. Only a small portion, around 200GB, needs nightly backups, and I only need to back up changed files.

SO! Can I do two main storage nodes using CEPH or something like it and use the third server just for the quorum? Can the third server then run my backup solution (looking at proxmox here as well)? Is it ok to keep my hardware raid intact? Can I cache the storage traffic with my optane drives? Lastly can I set all this up on the free version for testing and then easily add subscriptions later?

In short, I am about to ditch the $7k we spent on VMware Essentials because it's a pain in the butt: not intuitive, clumsy interface, limited hardware compatibility (my servers can't run 8.0), etc. I'd rather pay annually for Proxmox if it can make my life easier. I'm a small business that normally could run on a single server and a backup solution, but the need for a production database has arisen and it will need to run on HA, even if it's minimalist, as every part of my production process will have a PC running the frontend of the database for data entry to get us off a paper system. The database is built and has been beta tested. Now I just need the hardware and a virtual HA solution to run it on, and I'm not impressed with VMware.

So I guess my main point is: SELL me on Proxmox! Can it do what I think it can do, and what recommendations might you experts have? I've not even installed it yet, but reading online and watching videos is really pushing me towards it.
 
SO! Can I do two main storage nodes using CEPH or something like it and use the third server just for the quorum? Can the third server then run my backup solution (looking at proxmox here as well)?
Well, there are two things to differentiate here: for the Proxmox VE / Corosync cluster itself you'll need at least three votes, which can also be achieved with two servers and a third, external QDevice (see "Corosync External Vote Support" in the admin guide). The QDevice can very well run on a Proxmox Backup Server, similar to the ESXi witness appliance.
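If you go that route, the setup is short; roughly (a sketch based on the admin guide, with the QDevice host's IP as a placeholder):

    # on the external witness machine (e.g. the Proxmox Backup Server host)
    apt install corosync-qnetd
    # on every Proxmox VE cluster node
    apt install corosync-qdevice
    # then, from one cluster node, register the QDevice
    pvecm qdevice setup <QDEVICE-IP>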

Preface: I'm definitely not a Ceph/hyper-converged storage expert. Anyway, Ceph needs at least three nodes/monitors. It can work with two nodes/monitors, but it's discouraged. That's a general rule for anything hyper-converged though.
But if you are fine with the associated disadvantages/risks, you can of course go that route. Maybe someone else can elaborate on that a bit more.

Storage replication might be a topic worth mentioning here. It's possible (see also the disclaimer in the admin guide) to achieve "pseudo" HA (for lack of a better term). But that involves longer downtimes and possible data loss, which is less than ideal for a (production) database.
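For reference, a replication job is a single command (or a few clicks in the GUI); a minimal sketch, assuming a guest with ID 100, a target node named pve2, and local ZFS storage on both nodes:

    # replicate VM 100 to node pve2 every 15 minutes
    pvesr create-local-job 100-0 pve2 --schedule "*/15"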

Is it ok to keep my hardware raid intact?
Depends. If you want to use ZFS, no. But ZFS natively supports software RAID, which is often a good choice anyway, IMHO (as you don't have that black box between the OS and the disks).
If you plan on using ext4 or XFS, then yes, you can keep using a hardware RAID.
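Just as an illustration of the ZFS software RAID mentioned above (pool name and device names are placeholders; on real hardware you'd normally use the /dev/disk/by-id paths of your SSDs):

    # single-parity RAID (roughly comparable to RAID 5) across six SSDs
    zpool create -o ashift=12 tank raidz1 sdb sdc sdd sde sdf sdg

The PVE installer can also set up a ZFS mirror or RAID-Z for the boot disks directly.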

Can I cache the storage traffic with my optane drives?
Again, it depends a bit. If you use ZFS, then you can add them as a ZFS SLOG (a separate ZIL log device). You can just search for this term on the internet; there's a lot of information around on that topic, e.g. https://www.servethehome.com/what-is-the-zfs-zil-slog-and-what-makes-a-good-one/.
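Adding one is a single command once the pool exists; a sketch with placeholder names (pool "tank", the Optane showing up as /dev/nvme0n1):

    # attach the Optane as a separate ZFS intent log (SLOG) device
    zpool add tank log /dev/nvme0n1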

Lastly can I set all this up on the free version for testing and then easily add subscriptions later?
Yes! The no-subscription version has no restrictions and has the exact same feature set as the enterprise version. The main difference is the package repository: the enterprise repository ships more stable / better-tested package versions, i.e. the no-subscription repository is not considered production-ready.
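Concretely, the only thing that changes later is which repository you pull packages from; a sketch assuming a current PVE 8 / Debian Bookworm install (adjust the codename to your release):

    # while testing without a subscription: disable the enterprise repo ...
    sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list
    # ... and enable the no-subscription repo
    echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
        > /etc/apt/sources.list.d/pve-no-subscription.list
    apt update
    # once you buy a subscription, simply revert the two changes above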

In general, our Admin Guide is very extensive and should cover everything you might need.
So you can just install it and try things out! Glad to hear you are considering Proxmox VE and were immediately drawn to it. :)

If you have any more questions, don't hesitate to ask!
 
Thank you for the detailed reply.

A couple of questions. My RAID cards are Avago 9400 series, which will run in either RAID or HBA mode but not both. I'm not married to running RAID, but I wouldn't mind making the boot disks a RAID 1 array, which would mean that, to avoid RAID on the rest of the disks attached to the cards, I would need to run the other 6 drives on each card as individual RAID 0 "arrays". Do you think this would work with Ceph, i.e. would it see them as individual drives? Alternately, I run my home PC on three NVMe drives in Linux RAID; is it possible during setup to create a bootable software RAID array? Or would I be OK with just having current backups of the boot drive?

When I say I need HA, I don't mean like in a large office environment. I have 18 users total. Downtime of up to an hour isn't preferred but would be acceptable. I expect that in the future I will add a third data node, as these servers and SSDs have gotten cheaper. To that end it seems Ceph or ZFS make the most sense, and I am leaning towards Ceph.

I don't care about the Optane drives. VMware required them, but they really are a single point of failure. If I don't need them, or there is no real advantage, it makes more sense to drop in another dual-port NIC and have two bonded pairs in each server for the storage traffic, a main and a failover. Because I'd be freeing up a PCIe slot, this would give me a 20G connection over fiber with a failover.

Just so I am clear: I can use my third machine as a monitor only for now and add more servers (nodes) later? I kind of like that idea. But I do wonder, can Ceph run three data nodes and four monitor nodes, sort of like a RAID 5? What I mean is, if I added a fourth server (a third data server), could I gain close to 100% of its capacity?

And yes from what I am seeing so far I really like Proxmox in concept AND I get my beloved command line it seems!!!
 
Do you think this would work with Ceph, i.e. would it see them as individual drives?
That should be avoided.

Alternately, I run my home PC on three NVMe drives in Linux RAID; is it possible during setup to create a bootable software RAID array?
Only by installing Debian 12 with the Debian installer (setting up an mdadm RAID 1 there) and then putting PVE on top. But the PVE installer supports ZFS if you need a software RAID 1.
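Condensed, the "PVE on top of Debian" route looks roughly like this after the Debian 12 install with mdadm RAID 1 (see the "Install Proxmox VE on Debian 12 Bookworm" wiki article for the full, authoritative steps):

    # add the Proxmox VE repository and its signing key
    echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
        > /etc/apt/sources.list.d/pve-install-repo.list
    wget https://enterprise.proxmox.com/debian/proxmox-release-bookworm.gpg \
        -O /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg
    apt update && apt full-upgrade
    # install the Proxmox VE packages (the wiki also covers switching to the Proxmox kernel)
    apt install proxmox-ve postfix open-iscsi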

Just so I am clear: I can use my third machine as a monitor only for now and add more servers (nodes) later? I kind of like that idea. But I do wonder, can Ceph run three data nodes and four monitor nodes, sort of like a RAID 5? What I mean is, if I added a fourth server (a third data server), could I gain close to 100% of its capacity?
I'm no Ceph expert, but as far as I know you usually want 3 copies of everything spread across all nodes, and then you want enough free space so that when a server fails the remaining servers can compensate for the lost capacity. So of the ~33% usable capacity you get, you again want about a third to always be free with a 3-node cluster. That works out to more like 22% of the raw capacity being usable, not the near-100% you're hoping for...
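To put rough numbers on that (purely illustrative, ignoring Ceph and filesystem overhead), with your 8 x 1 TB SSDs per node on a 3-node cluster:

    24 x 1 TB SSDs across 3 nodes      -> ~24 TB raw
    3 replicas (size=3)                -> ~8 TB usable
    keep ~1/3 free for self-healing    -> ~5.3 TB comfortably usable (~22% of raw)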
 
is it possible during setup to create a bootable software RAID array?
I hope not to confuse matters here: I only used the Proxmox ISO once, to install my first server with as little chance of mistakes as possible. I don't recall the option being there in that version (PVE 6, maybe 5).

As you might have read, Proxmox runs on plain Debian. Ever since, I have installed Debian (with all of its installer features) and added Proxmox on top, to full satisfaction.

One curious installation uses two micro SD cards in mirrored LVM (onboard micro SD slot + micro SD adapter in an onboard USB A connector) to save a precious SATA port on the small server. I'm not sure whether it would have been possible with the standard Proxmox installer, but with the Debian installer: not a problem. (Mind you, as an example of the flexibility of Debian + Proxmox, not as a best practice for HA nodes ;-) )
 
One curious installation uses two micro SD cards in mirrored LVM (onboard micro SD slot + micro SD adapter in an onboard USB A connector) to save a precious SATA port on the small server. I'm not sure whether it would have been possible with the standard Proxmox installer, but with the Debian installer: not a problem. (Mind you, as an example of the flexibility of Debian + Proxmox, not as a best practice for HA nodes ;-) )
Have a look at the minimum hardware requirements and you will see that SD cards don't meet them; at least a hard disk is required. It's highly recommended not to use any (non-industrial) SD cards or pen drives, as PVE writes a lot, and this cheap flash storage, without a proper controller chip doing wear leveling, DRAM caching and so on, can't handle that well.
That's probably why no one has bothered adding support for eMMC or SD cards to the installer.
 
I have 18 users total. Downtime of up to an hour isn't preferred but would be acceptable.
In that case: how much of the need for Ceph is wanting to play with its features and the servers, and how much of it is a need for 99.999% uptime (i.e. about 5 minutes of downtime per year of 24/7 operation)? It matters, of course, whether your company runs 24/7 or more or less business hours.

Is the added complexity (and load) of Ceph worth it compared to a simpler multi-node cluster? Seeing that you expect less than half a terabyte of hot data, restoring from backup would take less than five minutes over 10 Gbit connections. You could play with snapshot/backup frequency to minimize data loss in case of a total meltdown of one of the nodes.
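Back-of-the-envelope, assuming the 10 GbE link (not the disks) is the bottleneck at roughly 1 GB/s of effective throughput, and using the figures from your first post:

    ~200 GB nightly-backup set  -> on the order of 3-4 minutes to restore
    ~2 TB full data set         -> on the order of half an hour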
 
Server downtime once the database is up and running will be a pain. Daytime isn't so bad, but we have two shifts and may add a third, meaning I'd have to come in in the middle of the night to fix a server. So that's the main driver behind an HA cluster. Our data is mostly MS Office stuff and this database.

The other issue involves the FDA and my customer audits. Once I go to completely electronic record keeping there is no going back to paper. There are government regulations as well as customer requirements in play here. So any downtime of the virtual servers means production will stop and cost whatever the combined hourly rate of my workers is while they stand around waiting on a way to enter their data.

We make generic OTC patches. We aren't a big company, but we are growing, so a lot of this is future-proofing the system. I imagine in 3-5 years having all brand new servers and very possibly double the sales and double the production people. Building it out now with what I have should (at least I hope) be easier than trying to do this when I HAVE to and will be under pressure to get it done. I migrated my Windows server VMs to a barebones server with an HDD RAID array while I set this system up. It's stressing that old system, but it's holding up for now.

Based on advice so far this is my plan:

1) Destroy my RAID arrays and change the cards to HBA mode on servers one and two for data.
2) Remove the unneeded Optane drives.
3) In their place, add more networking for more failover and load balancing on both the cluster side and the public side.
4) Go ahead and use the third server just as a monitor for now.
5) Look for a fourth server, configure it just like one and two, and have a normal 3-node cluster.
6) Not worry about mirrored boot arrays. Regular backups can handle this task, I think.

Anything I am missing?

I installed Proxmox on all three today. Since the Optanes are not used at this time, I made a three-node cluster and a three-node Ceph setup and created an Ubuntu VM. Even with just one network card configured I am able to migrate it. Setting up HA next. So far this is far easier and more intuitive than VMware ever was. I made more progress in one day than in months of messing with VMware.

Also, I purchased another server and drives. Going to go the recommended route of three nodes. The fourth will keep its RAID, with Proxmox Backup Server installed. It will be connected to the public network at 10G with at least one dual-port NIC.
 
Oh, one other question. When I set up Ceph I think I did something wrong. I have three monitors showing, but only one has an IP; the other two show "unknown". I also have three managers, and all three have unique IPs.
 
Well, thanks for the responses. I've decided that Monday I will wipe out my RAID arrays, switch my RAID controllers to HBA mode, and then wipe all the drives. Going ahead with Proxmox from here on out.

Does it make sense to collect the serial numbers of all my SSDs (24 total, 8 per server, plus 3 cold spares) and map where they are in the cluster on a spreadsheet? I'm assuming I can look at drives in the shell or web interface. I'm home for the weekend; is there a way to view SMART data from the web interface? Mainly for doing periodic checks of drive health.

How is a dead drive removed and replaced? I'm used to RAID with backplanes and hot swap. I know nothing about ZFS or Ceph, so software arrays like this are somewhat new to me. I'm used to RAID cards beeping at me, looking at the server, and physically seeing a red light on a dead drive tray.

I'm sure these are all newbie questions. I appreciate the help!
 
Does it make sense to collect the serial numbers of all my SSDs (24 total, 8 per server, plus 3 cold spares) and map where they are in the cluster on a spreadsheet? I'm assuming I can look at drives in the shell or web interface. I'm home for the weekend; is there a way to view SMART data from the web interface? Mainly for doing periodic checks of drive health.
"Yes", on both counts, I'd say.

  • Apart from writing down the serial and the location, you might also label the disks physically and record the label info together with the serial and location;
  • You can see the drives' info in the web interface (see the attached screenshot); double-click a drive for its SMART values.
[Screenshot: the Disks view in the Proxmox VE web interface, showing per-drive SMART status]
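And since you mentioned liking the command line: the same data is available in the shell via smartctl (the device name below is a placeholder):

    smartctl -H /dev/sda    # quick overall health verdict
    smartctl -a /dev/sda    # full SMART attribute report, including write/wear counters on most SSDs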

How is a dead drive removed and replaced? I'm used to RAID with backplanes and hot swap.
Your current setup is also hot swap, I suppose? Here the record of each disk's physical location comes in handy when the need arises. My own hardware is only half-baked in that respect, so I'll leave the finer points to others.
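For what it's worth, the usual Ceph flow is roughly the following (a sketch only; <id> and the device name are placeholders, so please check the Ceph/PVE docs before doing this on live data):

    ceph osd out <id>              # stop placing new data on the failing OSD
    systemctl stop ceph-osd@<id>   # stop the OSD daemon once it's safe to do so
    pveceph osd destroy <id>       # remove it from the cluster (--cleanup wipes the disk)
    # physically swap the drive, then create a new OSD on the replacement:
    pveceph osd create /dev/sdX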
 
Thanks wbk! I've been at home away from my servers for the weekend so I can't see the interface, but thanks much for the screenshot. This is perfect, as my servers all have hot-swap backplanes connected to Avago RAID controllers that I will be switching to HBA mode. Knowing where a dead drive is in my rack will be great with a serial number map.

Being able to see SMART data easily is also great because it ties IT into doing something production has always done: logging hours on machinery, doing preventative maintenance, etc. Now, in addition to regular monthly and annual checks on workstations and servers, we can check and log things like TBW so we replace drives before failure. This has always been a downside of RAID controllers, IMO, because by hiding SMART data from the OS they force you to use their software to view it, and sometimes that software doesn't play nice (it was a royal pain to get working in ESXi). In HBA mode, being able to see it will be great.

My entire system is intentional overkill. I'm getting older and have more responsibilities at the company, and since we're small, the younger guy I hired to help with IT also has other responsibilities. Making the system HA and easy to manage is important and frankly has been my biggest draw toward Proxmox. In short, I'd like us both to be doing work that earns the business money rather than spending tons of time attending to infrastructure. In truth, the whole company could run on a bare-metal Windows server and a backup solution, but having virtualization, HA, a secondary domain controller, etc. should mean we aren't panicked when something goes down. I'm tired of late nights fixing servers, so the whole plan here is to build in as much redundancy as possible. Couple that with proactive maintenance and I am hoping it's a recipe for planned downtime rather than emergency downtime.

As I stated, I bought a fourth server, drives, RAM, and NICs, and posted in another thread asking how best to lay that out: a 3-node cluster plus a backup server, or 4 nodes with backups on a VM and both data and backup drives spread across four servers. I've spent my weekend watching YouTube videos on the topic. 45Drives had some interesting info on adding nodes.
 
