Small Cloud Cluster design and strategy

mkultra

Jan 12, 2025
We are currently in the process of redesigning our infrastructure. At the moment we run two bare-metal Linux cloud servers.

We have a very limited budget, enough for about three rather small servers. I want to use virtualisation to separate our workloads, gain redundancy, take advantage of snapshots and quickly deploy environments for our dev team. This is the general design I came up with:
  • Two Proxmox nodes with roughly these specs: 8C/16T, 32 GB DDR4, 2x 1 TB SSD
  • A smaller server hosting a firewall (bare metal), probably pfSense/OPNsense, acting as the gateway for the hypervisors.
I expect about 4-5 VMs to run full time. Initially, I wanted all three servers to be Proxmox nodes, but I would prefer not to expose the hypervisor itself to the internet.

Each server has two NICs: one for internet connectivity (which would not be used on the two Proxmox nodes) and one connected to the cloud provider's private network. Private bandwidth is about 1 Gb/s.

Given those constraints, it seems to me that StarWind vSAN is the only reasonable option for such a small cluster. Still, vSAN requires multiple private NICs (heartbeat/replication).

As for quorum: FreeBSD has a port of corosync. It is kind of a hacky solution, but the firewall could (maybe) be used as a QDevice. If not, I would provision the smallest usable server/VPS/instance to act as the QDevice.
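For reference, the quorum arithmetic behind adding a QDevice to a two-node cluster can be sketched in Python. This is a simplified model assuming one vote per node, one vote for the QDevice and plain majority voting (corosync special modes such as two_node are ignored); the function names are made up for illustration.

```python
# Simplified majority-quorum model: a partition is quorate when it holds
# a strict majority of all expected votes.
# Assumption: 1 vote per node, 1 vote for the QDevice, no special modes.

def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def is_quorate(votes_present: int, total_votes: int) -> bool:
    return votes_present >= quorum_threshold(total_votes)


def survives_single_node_loss(nodes: int, qdevice: bool) -> bool:
    """Does the remaining partition stay quorate if one node fails?"""
    total = nodes + (1 if qdevice else 0)
    remaining = total - 1  # the failed node's vote is gone
    return is_quorate(remaining, total)


if __name__ == "__main__":
    for nodes, qdev in [(2, False), (2, True), (3, False)]:
        ok = survives_single_node_loss(nodes, qdev)
        print(f"{nodes} nodes, qdevice={qdev}: survives one node failure -> {ok}")
```

With two nodes alone, losing either one leaves 1 of 2 votes, which is not a majority; the QDevice's extra vote is what keeps the surviving node quorate.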

I have little to no experience with these sorts of deployments. What would be the best way to leverage our limited resources?
 
Why not run a 3-node cluster if you are getting three servers? It gives you wider options and means one less single point of failure.

You could then run pfSense/OPNsense as a VM, with the benefit that you can have it synced to the other nodes and fail over (using ZFS replication, for example, since to-the-second updates are probably not needed).

You may need to think about the network setup, for example installing dual NICs in each server just for the firewall, one external and one internal. Assuming you have a switch, you could set up a VLAN for the whole external network to keep it isolated from the internal network on the switch.
 
I feel that exposing the hypervisor itself is not a great idea.

Since these are cloud servers, I have no control over the underlying infrastructure and network layout of the datacenter. I do know, however, that each server has two NICs, one external and one internal.
 
Another option might be to install Proxmox VE on the third node without adding it to the cluster. You could then use it for a pfSense VM and a Proxmox Backup Server VM. The PBS VM can then also serve as the QDevice.
I have no experience with StarWind vSAN, so I'll leave that part to others.
 
Thank you for the suggestion, I had not thought of it. PBS would also be a much-needed solution for backing up the cluster. I could order a second public IP to assign directly to my *sense VM. Now I just need to think about hardening the hypervisor itself.
 
What do you mean?
Of course, nobody should leave port 8006 open to the wild.
Set up firewall rules (with the built-in Proxmox VE firewall GUI or iptables).
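Whether done through the Proxmox VE firewall or iptables, the goal is a default-deny allowlist. The Python sketch below only models that decision logic; it is not Proxmox VE or iptables syntax, and the management subnet 10.0.0.0/24 and the SSH rule are assumptions for illustration.

```python
import ipaddress

# Hypothetical allowlist for a Proxmox VE host: the web UI (TCP 8006) and
# SSH (TCP 22) are reachable only from an ASSUMED management subnet on the
# private network. Anything that matches no rule is dropped (default deny).
MGMT_NET = ipaddress.ip_network("10.0.0.0/24")  # assumption

RULES = [
    # (allowed source network, protocol, destination port)
    (MGMT_NET, "tcp", 8006),
    (MGMT_NET, "tcp", 22),
]


def allowed(src_ip: str, proto: str, dport: int) -> bool:
    """Return True if any allow rule matches; otherwise deny."""
    src = ipaddress.ip_address(src_ip)
    for net, rule_proto, rule_port in RULES:
        if src in net and proto == rule_proto and dport == rule_port:
            return True
    return False


if __name__ == "__main__":
    print(allowed("10.0.0.15", "tcp", 8006))    # management host
    print(allowed("203.0.113.7", "tcp", 8006))  # random internet address
```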
I guess I'm a bit paranoid. I like my firewall to be its own device in front of the hypervisor.
Having two public IPs (one for the firewall, one for Proxmox) would probably make this much easier, without having to pass the entire NIC through to the VM.
 
You would still need an off-site backup solution in case your datacenter catches fire, like the OVH one in Straßburg.
PBS supports syncing between PBS instances, so that is the road I would take (e.g. via a small server in your office).
 
That would be ideal, but I'm already pushing it by asking for a third server.
Until I can convince management, I might as well install a PBS VM on my fairly powerful work computer. Not great, but it's better than nothing.
 
Another option would be to rent a PBS cloud service from a provider like tuxis.nl or inett for off-site backup. Of course, you would then need a monthly budget for backup costs.
Or you could repurpose an old desktop PC as an office PBS.
Management should really be taught the 3-2-1 rule, though.
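A rough way to check a backup plan against the 3-2-1 rule (three copies of the data, on at least two different media, at least one off-site). This is a hypothetical Python sketch: the BackupCopy class and the example plan are illustrative, and the production data counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    location: str  # hypothetical label, e.g. "cluster", "office PBS"
    medium: str    # storage type, e.g. "ssd", "hdd", "cloud"
    offsite: bool  # stored outside the primary datacenter?


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """3 copies (production included), >= 2 media types, >= 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    plan = [
        BackupCopy("cluster (production)", "ssd", offsite=False),
        BackupCopy("PBS VM on third node", "ssd", offsite=False),
    ]
    print(satisfies_3_2_1(plan))  # only two copies, nothing off-site
    plan.append(BackupCopy("office PBS", "hdd", offsite=True))
    print(satisfies_3_2_1(plan))  # three copies, two media, one off-site
```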
 
Ask the business what Recovery Time Objective (RTO) (time to be back live) and Recovery Point Objective (RPO) (amount of data loss) they want... then tell them the budget they will need for it, and hopefully you get a bit of budget for something realistic :)

We played with StarWind a few years back and also looked at other options. If you are only running a 1 Gb/s internal network, that would become a bottleneck, especially with synchronous replication (every write has to land on both storages before it is treated as committed).
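To put the 1 Gb/s bottleneck in numbers, here is a back-of-the-envelope Python sketch. The 10% protocol overhead and the ~500 MB/s local SSD write speed are assumptions, not measurements.

```python
# Rough ceiling for synchronous replication over the private link.
# ASSUMPTIONS: 1 Gb/s line rate, ~10% protocol overhead, and a local
# SATA SSD that sustains ~500 MB/s of writes.

LINK_GBPS = 1.0
OVERHEAD = 0.10          # assumed protocol overhead
LOCAL_SSD_MBPS = 500.0   # assumed local SSD write throughput


def link_ceiling_mbps(gbps: float, overhead: float) -> float:
    """Usable megabytes per second on a link of `gbps` gigabits per second."""
    return gbps * 1000 / 8 * (1 - overhead)


def full_resync_hours(data_gb: float, mbps: float) -> float:
    """Hours needed to copy `data_gb` gigabytes at `mbps` megabytes per second."""
    return data_gb * 1000 / mbps / 3600


if __name__ == "__main__":
    ceiling = link_ceiling_mbps(LINK_GBPS, OVERHEAD)
    print(f"usable link throughput: ~{ceiling:.1f} MB/s")
    print(f"fraction of local SSD write speed: ~{ceiling / LOCAL_SSD_MBPS:.2f}")
    print(f"full resync of 1 TB: ~{full_resync_hours(1000, ceiling):.1f} h")
```

Under these assumptions every synchronous write is capped at roughly a fifth of what the local SSD could do, and a full resync of a 1 TB volume takes a few hours.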

You could look at ZFS replication between nodes, but that comes back to the RTO/RPO objectives.
 
ZFS replication works great, but then one should really also have a dedicated network for data transfer. And one needs to live with its asynchronous nature, meaning you will always lose some data in an unplanned takeover event (planned maintenance is different, of course), depending on the configured replication schedule. The default is every fifteen minutes, meaning up to fifteen minutes of data can be lost. The interval can be extended to several hours or reduced to one minute, but depending on the application even that might be too much. For SQL databases, for example, it might be worth setting up a cluster of two VMs (one on each Proxmox VE node) that replicate their data between them via native database features.
Relevant docs:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvesr
https://pve.proxmox.com/wiki/Storage_Replication
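The relationship between the replication interval and potential data loss can be sketched as follows. It is a simplified model assuming each replication run starts on schedule and takes `transfer_minutes` to complete; the interval values are examples, with 15 minutes matching the default schedule mentioned above.

```python
# Worst-case data loss (effective RPO) for scheduled, asynchronous replication.
# If a node fails just before a run finishes, the target only holds data up to
# the start of the previous completed run: interval + transfer time.
# ASSUMPTION: runs start on schedule and each takes `transfer_minutes`.

def worst_case_loss_minutes(interval_minutes: float, transfer_minutes: float) -> float:
    return interval_minutes + transfer_minutes


def meets_rpo(interval_minutes: float, transfer_minutes: float, rpo_minutes: float) -> bool:
    """Is the worst-case loss within the RPO the business asked for?"""
    return worst_case_loss_minutes(interval_minutes, transfer_minutes) <= rpo_minutes


if __name__ == "__main__":
    # Example: replication runs that take about 2 minutes to complete.
    for interval in (1, 15, 60):
        loss = worst_case_loss_minutes(interval, transfer_minutes=2)
        print(f"every {interval} min -> up to ~{loss} min of data lost")
```

This is why the RPO question matters: a 15-minute schedule cannot satisfy a 15-minute RPO once transfer time is counted.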
 
Since you are already looking to deploy in the cloud, why not just use VPSes for your applications? A VPS already gives you HA, snapshots, etc. from the underlying platform; why bother doing that twice?

With regard to firewalls, use local firewalls on the instance (which the VPS provider should also offer), and a WAF for your applications. Not everything needs a heavy-lift solution.
 
I made several proposals. One completely ditched virtualisation in favor of public-cloud-style instances, managed databases and private networks. It would be far easier to administer, with HA and redundancy built in.

We do have some "specific" requirements, such as IPsec, BGP and GRE.
 