I'll have to read more about fencing but as i understood it, if I do NOT use it and not add these nodes to the fencing domain, and replicate the VMs that need replication but which are largely static (though perhaps I could sync them weekly or something for the occasional change) then the issue you describe above wouldnt be an actual issue, correct?
But what's the point of the third node, if VMs can't ever migrate to it?
My understanding is that it's a bigger problem with shared storage which wouldn't be used. I'd have all of my most needed VMs replicated to the spare from both "main" devices whereby, when the spare spins up, it would start whichever VMs are needed and not anything extraneous.
How will you be replicating to a node that is off? And will you be scripting those auto-starts without HA?
When the main one comes back online the VMs would sync anyway and handoff would occur again albeit with another delay after the shutdown of the backup.
That is another HA migration? Because VMs do not migrate back in PVE unless the node fails.
Because the thing is, high availability is NOT my main goal. I prefer the simplicity of running all of the nodes in a cluster
There's nothing simple about clustering in PVE. It's a brittle system, and it also shreds your SSDs (more). If HA is not a goal, it is literally simpler to have one host with all the VMs, not even PVE.
and my issue has always been that they wont run with only a single node without changing the quorum setup as you alluded to above (change one of them to have two votes which may be the route I go but offers no redundancy, or use a Q device which I dont like and has no redundancy).
I am not sure what you mean by a Q device providing no redundancy. It literally allows - reliably - 2 nodes to be in a cluster, which otherwise would be a limited-features setup.
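To make the vote counting concrete, here is a minimal sketch of the default simple-majority rule (quorum is more than half of the expected votes). It assumes one vote per node and one vote for the Q device, and it ignores special options such as two_node or last_man_standing. The function names are made up for illustration, they are not part of any PVE or corosync tooling.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple majority rule."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# (votes present, expected votes) for a few setups discussed in this thread.
# Assumption: every node and the Q device carry exactly one vote.
scenarios = {
    "2 nodes, one down": (1, 2),
    "2 nodes + Q device, one node down": (2, 3),
    "3 nodes, cold spare off, both mains up": (2, 3),
    "3 nodes, cold spare off, one main down": (1, 3),
}

for name, (present, expected) in scenarios.items():
    state = "quorate" if is_quorate(present, expected) else "NOT quorate"
    print(f"{name}: {present}/{expected} votes, "
          f"need {quorum_threshold(expected)} -> {state}")
```

Under these assumptions, the 2-node cluster with a Q device survives the loss of one node, while the 3-node cluster with a powered-off spare does not survive the loss of a main node until the spare is actually up.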
What I want you could perhaps call 'medium availability'. It's a homelab so nothing is mission critical, if I have to wait 5 minutes that's ok
Then I would prefer a single host with disaster recovery = good backups. If I already had two hosts, I would run them separately and keep replicating without clustering.
, but I'd prefer to not have HAProxy,my VPN, or Home Assistant go down for much longer than that. Apart from historical data on home assistant, these VMs in particular would be fine if ran with data that's a week or two old.
See above.
Essentially I'm looking for a 2-node cluster with a cold spare for only the more critical VMs, which in normal circumstances may be running on either of those nodes.
In my opinion, you are not looking for a cluster. You are using an ill-chosen solution for the use case and letting it force you to believe you require a cluster, because that's the only thing it can provide.
Is there another way you can think of to do that?
If you insisted on PVE, then it would need (without a Q device, if you insist on that too) the sort of corosync options* that the Proxmox team does not vouch for: last_man_standing, and do not use HA; or, if you insist on HA (as implemented by PVE), that would be a bit of a gamble, but wait_for_all.
* https://manpages.debian.org/unstable/corosync/votequorum.5.en.html
I can't run everything on one node so HA with a Q device isn't going to work since one node failing will leave me without services
I am not sure I got you on this one.
and yet, paying for another 130W of continuous load (old Broadwell V4 servers) just isnt doable for me financially either.
This is a recurrent topic on the forum. I cannot know from your limited description what else on those old-gen servers cannot be easily substituted, but generally speaking a single new mini PC can, compute- and storage-wise, easily handle similar workloads at ~25W in my experience. For some homelabbing, the servers could then be on-demand only, or simply virtualised.
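For a rough feel of what the continuous load means over a year, here is a small sketch comparing the 130W and ~25W figures from above. The electricity price is purely a placeholder assumption, plug in your own tariff.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years


def annual_kwh(watts: float) -> float:
    """Energy used in one year by a constant load, in kWh."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly running cost of a constant load at a flat tariff."""
    return annual_kwh(watts) * price_per_kwh


# Assumption: flat tariff, hypothetical value, replace with your own.
PRICE_PER_KWH = 0.30

for label, watts in [("old server", 130), ("mini PC", 25)]:
    print(f"{label}: {annual_kwh(watts):.0f} kWh/year, "
          f"{annual_cost(watts, PRICE_PER_KWH):.2f} per year")
```

That is roughly 1139 kWh per year for a 130W load versus about 219 kWh for 25W, before any tariff is applied.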
If it were say a database of customer info, I'd totally agree with you...but given the use case do you really think it'd cause lots of problems?
Your originally described setup? If you ignored everything above and pressed on with PVE on 2 old hosts, I still believe it's an overly complicated setup compared to the corosync config options.