Good point. However, we are a small, family-owned company, and as a 25-year employee and vice president I've had the family transferring stock to me, to the point that when our president (also their brother) retires, I will own a controlling interest in the company. In short, I have no reason to leave. Second, Proxmox wasn't my idea. I've hired a younger guy I'm training as my assistant and also as my main IT guy. After months of struggling with VMware, trying to build the two-node-plus-witness vSAN we were told we could build, only to find we had outdated hardware at every turn, this younger guy suggested we just give Proxmox a try. Since he's been busy doing other things for me, I started working on Proxmox myself. I had a cluster up and running on a single NIC in less than a week. Then I tore that down, put in all my drives and NICs, ran into a now-solved issue with Ceph IPs, and got this test cluster up and running.
I emphasize "test" at this point. We've put up a Linux VM and a Windows Server VM and have been doing failure testing: pulling power on a server, unplugging network cables, pulling drives, etc. No, rebuilding a missing OSD isn't as simple as old-school hardware RAID, but a three-node, eventually five-node, cluster is far more fault tolerant.
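For what it's worth, the OSD replacement itself boils down to a handful of commands. This is just a rough sketch of my understanding so far; osd.7 and /dev/sdd are made-up placeholders for the failed OSD and its replacement disk:

    ceph osd out 7                    # stop sending new data to the dead OSD
    systemctl stop ceph-osd@7         # make sure its daemon is down (it usually already is)
    pveceph osd destroy 7 --cleanup   # remove it from the cluster and wipe the old disk's leftovers
    pveceph osd create /dev/sdd       # create a fresh OSD on the replacement disk
    ceph -s                           # watch recovery/backfill until health returns to OK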
Why old equipment? Well, our fire happened in May 2021, at the height of the pandemic shortages, and our insurance payout did not cover the increased cost of building materials and other inflated items. We're done rebuilding now, but we had our first losing year since the 2008 recession. I already had three RD440s running on single CPUs with only 32 GB of RAM each. Now I have four, soon to be five, with both CPUs populated and 128 GB of RAM each. The long-term plan is to replace nodes one by one as the budget allows. That's actually one reason I knew about the issue of using the same name and IP for a replacement, and why I've reserved IPs in my network for future nodes. My idea, right or wrong, is to buy new servers, fail an old one, install the new one, and let Ceph populate it (a sketch of how I'd watch that rebalance is below). I'd rather save money now and keep giving my people the raises they deserve than dump a fortune into new or newish hardware. Plus, I'm thinking I can transfer much of what I have to the new servers, like the dual-port NICs, of which I have four per server. At 10G everywhere, my network is 10x faster than it's ever been.
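On the "let Ceph populate it" part, here's a minimal sketch of the commands I'd expect to lean on while the new node's OSDs fill up; nothing here is specific to my setup, it's just the standard Ceph status tooling:

    ceph osd tree       # confirm the new node's OSDs show up and are up/in
    ceph osd df tree    # see how data is spreading across hosts and OSDs
    ceph -s             # overall health, plus backfill/recovery progress
    ceph -w             # or leave this running to watch events live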
I bought these Optanes because VMware needed them for vSAN cache, so I may as well use them if I can. Failing that, I'll replace the current consumer drives with enterprise drives, like I need to do on the Ceph cluster itself.
So in short, even with old hardware, this cluster is a rather big performance jump: from two single-CPU servers running individually on HDDs to five dual-CPU nodes with four times the RAM, 10G networking, and all-SSD storage. In baseline testing my VMs are clearly much faster. I'm no longer bottlenecked by a slow network, slow drives, or too few cores and too little RAM.
Seems to me the solution here is to use the Optanes as described, or to install small enterprise drives for boot. Maybe not even worry about backups of the boot drives.
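If I skip boot-drive backups, the only host-local things I'd really miss are the network config and hosts file, so I'm thinking of just stashing those somewhere off the node. A sketch, with a made-up destination host and path:

    # the cluster filesystem (/etc/pve) lives on the other nodes anyway;
    # these are the few files that are specific to this host
    scp /etc/network/interfaces /etc/hosts /etc/resolv.conf backupbox:/srv/node-configs/pve1/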
Having said that, please correct me if I'm wrong, but it seems like clusters are a lot like hardware RAID cards: replacement servers need to be "clean," just like RAID cards will only rebuild onto fresh, empty drives. Which means to me that if a server fails, I just replace what's broken, do a clean install, configure networking, and join it to the cluster.
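If I'm understanding it right, the join itself is only a few commands. A sketch of the sequence as I understand it, with made-up node names and IPs (pve5 is the dead node, 10.0.0.11 is any surviving member):

    # on a surviving node: remove the dead member from the cluster
    pvecm delnode pve5

    # on the freshly installed replacement (new name and IP, to avoid the stale-entry issue):
    pvecm add 10.0.0.11

    # then put its Ceph pieces back
    pveceph install                 # install the Ceph packages
    pveceph osd create /dev/sdb     # one per data disk; plus pveceph mon create if it also ran a monitor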
I would argue that even commodity equipment, when run in a cluster, gives you better resiliency than beefed-up redundant server gear from 10 years ago, provided it can handle the load, of course. This doesn't really concern you, but when someone buys a 10-year-old server for a homelab, they would often have been better off buying, e.g., 5 NUCs for that homelab and running all sorts of clusters.
Yes, a cluster node should be something without value of its own. You have the VMs/CTs, and you have them backed up so you can put them back after some catastrophic event (e.g. a fire). But under normal circumstances (i.e. hardware parts failing) you just keep exchanging parts. Someone might argue that a system drive without RAID has no way of self-detecting bit rot, but for a node's system drive ... well, you will see it behaving oddly and the logs becoming strange, and you simply shut it down, replace it, rinse, repeat.
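To make that concrete: the backups that matter are the guest backups, not the node itself. A minimal sketch with made-up VMIDs and a made-up backup storage name ("backups"); the archive paths depend on where that storage is mounted, so treat them as placeholders:

    vzdump 100 --storage backups --mode snapshot --compress zstd   # back up VM 100
    # and to bring guests back onto a rebuilt cluster later:
    qmrestore /mnt/pve/backups/dump/vzdump-qemu-100-<timestamp>.vma.zst 100
    pct restore 101 /mnt/pve/backups/dump/vzdump-lxc-101-<timestamp>.tar.zst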
BTW, there are solutions other than PVE. I'm not saying they're better, but somehow people treat it like there are just all those expensive commercial ones and then PVE as the alternative. There are the not-so-comparable alternatives like OpenStack, but also the little-forgotten ones like XCP-ng. If you have time to experiment, you may want to give the latter a try too, but it is very different from PVE; after all, it's Xen. As for containers, there's even LXD (they added VMs later on), but last I remember it was not as complete in terms of high availability, etc.
One more thing: if you plan to run HA for anything, do a lot of testing with PVE, especially around networking, pulling cables, etc. There are enough horror stories of people trying to run HA and ending up with lower total uptime because of self-fencing and endless reboots by the watchdog. If you do not plan to use any HA, it's not as critical; even with corosync impaired, it certainly won't go on to reboot your nodes.
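A few things worth keeping open while you pull cables, all standard PVE tooling (nothing here is specific to your setup):

    pvecm status            # quorum membership and vote counts
    corosync-cfgtool -s     # per-link status of the corosync rings
    ha-manager status       # what the HA stack currently thinks of your resources
    journalctl -f -u corosync -u pve-ha-lrm -u pve-ha-crm   # watch fencing decisions as they happen

And if you can spare the ports, giving corosync a second link on a separate NIC/switch goes a long way toward avoiding those self-fencing stories in the first place.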