Small Cloud Cluster design and strategy

mkultra · Jan 12, 2025

I'm currently in the process of redesigning our infrastructure. We currently have 2 bare-metal Linux cloud servers.

We have a very limited budget for about 3 rather small servers. I want to take advantage of virtualisation to separate our workloads, add redundancy, use snapshots, and quickly deploy environments for our dev team. This is the general design I came up with:

Two Proxmox nodes with roughly these specs: 8C/16T, 32 GB DDR4, 2x 1 TB SSD
A smaller server hosting a firewall (bare metal), probably pfSense/OPNsense, to act as a gateway for the hypervisors.

I expect about 4-5 VMs to run full time. Initially, I wanted all three servers to be Proxmox nodes, but I prefer not to expose the hypervisor itself to the internet.

Each server has two NICs: one for internet connectivity (which wouldn't be used on the two Proxmox nodes) and one connected to the cloud provider's private network. Private bandwidth is about 1 Gb/s.

With those constraints, it seems to me that using StarWind vSAN is the only reasonable option for such a small cluster. Still, vSAN requires multiple private NICs (heartbeat/replication).

As for quorum: FreeBSD has a port for corosync. It's kind of a hacky solution, but the firewall could (maybe) be used as a QDevice. If not, I would provision the smallest usable server/VPS/instance to act as the QDevice.
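To illustrate why the extra vote matters, here is a minimal sketch of strict-majority quorum arithmetic. It assumes one vote per node and one vote for the QDevice; the function names are made up for illustration and are not part of corosync or Proxmox.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for a strict majority of all expected votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the surviving members still hold a strict majority."""
    return votes_present >= quorum_threshold(total_votes)


# Two nodes, no QDevice: if one node fails, the survivor holds 1 of 2 votes.
print(has_quorum(votes_present=1, total_votes=2))

# Two nodes plus one QDevice vote: the survivor and the QDevice hold 2 of 3.
print(has_quorum(votes_present=2, total_votes=3))
```

With only two votes, a single failure leaves the survivor without a majority, which is the reason for adding a third vote in a two-node cluster.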

I have little to no experience with this sort of deployment. What would be the best way to leverage our limited resources?

NuttyNat · Jan 12, 2025

Why not run a 3-node cluster if you're getting three servers? It gives you wider options and means one less single point of failure.

You could then run pfSense/OPNsense as a VM, with the benefit that you can have it replicate to the other nodes (using ZFS, for example, since down-to-the-second updates are probably not needed) and fail over.

You may need to think about the network setup: for example, install a dual NIC in each server just for the firewall, with one port external and one internal. Assuming you have a switch, you could set up a VLAN for all the external traffic to keep it isolated on the switch from the internal network.

mkultra · Jan 12, 2025

NuttyNat said:
Why not run a 3-node cluster if you're getting three servers? It gives you wider options and means one less single point of failure.

You could then run pfSense/OPNsense as a VM, with the benefit that you can have it replicate to the other nodes (using ZFS, for example, since down-to-the-second updates are probably not needed) and fail over.

You may need to think about the network setup: for example, install a dual NIC in each server just for the firewall, with one port external and one internal. Assuming you have a switch, you could set up a VLAN for all the external traffic to keep it isolated on the switch from the internal network.

I felt that exposing the hypervisor itself was not that great of an idea.

Since those are cloud servers, I have no control over the underlying infrastructure and network layout of the datacenter. I do know, however, that each server has two NICs, one external and one internal.

Johannes S · Jan 12, 2025

Another option might be to install Proxmox VE on the third node without adding it to the cluster. You could then use it for a pfSense VM and a Proxmox Backup Server (PBS) VM. The PBS VM can then also serve as a QDevice.
I have no experience with StarWind vSAN, so I'll leave that part to others.

_gabriel · Jan 12, 2025

mkultra said:
I prefer not to expose the hypervisor itself to the internet.

What do you mean?
Of course, nobody should leave port 8006 open to the wild.
Set up firewall rules (via the Proxmox VE firewall interface or iptables).

mkultra · Jan 12, 2025

Johannes S said:
Another option might be to install Proxmox VE on the third node without adding it to the cluster. You could then use it for a pfSense VM and a Proxmox Backup Server (PBS) VM. The PBS VM can then also serve as a QDevice.
I have no experience with StarWind vSAN, so I'll leave that part to others.

Thank you for the suggestion. I had not thought of it. PBS would also be a much-needed solution for backing up the cluster. I could order a second public IP to assign directly to my *sense VM. Now I just need to think about hardening the hypervisor itself.

mkultra · Jan 12, 2025

_gabriel said:
What do you mean?
Of course, nobody should leave port 8006 open to the wild.
Set up firewall rules (via the Proxmox VE firewall interface or iptables).

I guess I'm a bit paranoid. I like my firewall to be its own device in front of the hypervisor.
Having two public IPs (one for the firewall, one for Proxmox) would probably make this much easier, without having to pass the entire NIC through to the VM.

Johannes S · Jan 12, 2025

mkultra said:
Thank you for the suggestion. I had not thought of it. PBS would also be a much-needed solution for backing up the cluster. I could order a second public IP to assign directly to my *sense VM. Now I just need to think about hardening the hypervisor itself.

You would still need a solution for an offsite backup in case your datacenter goes up in flames like the OVH one in Strasbourg.
PBS can sync between PBS instances, so that is the road I would take (e.g. via a small server in your office).

mkultra · Jan 12, 2025

Johannes S said:
You would still need a solution for an offsite backup in case your datacenter goes up in flames like the OVH one in Strasbourg.
PBS can sync between PBS instances, so that is the road I would take (e.g. via a small server in your office).

That would be ideal. But I'm already pushing it by asking for a 3rd server.
Until I can convince management, I might as well install a PBS VM on my decently powerful work computer. Not great, but it's better than nothing.

Johannes S · Jan 12, 2025

Another way would be to rent a PBS cloud service from a provider like tuxis.nl or inett for offsite backup. Of course, you would then need a monthly budget for backup costs.
Or you could repurpose an old desktop PC as an office PBS.
Management should really be taught about the 3-2-1 rule, though.

NuttyNat · Jan 12, 2025

mkultra said:
That would be ideal. But I'm already pushing it by asking for a 3rd server.
Until I can convince management, I might as well install a PBS VM on my decently powerful work computer. Not great, but it's better than nothing.

Ask the business what Recovery Point Objective (RPO) (amount of data loss) and Recovery Time Objective (RTO) (time to be back live) they want... then tell them the budget they will need for it, and hopefully get a bit of budget for something realistic.

We played with StarWind a few years back and also looked at other options. If you are only running a 1 Gb/s internal network, then that would become a bottleneck, especially if you were running synchronous replication (each write has to land on both storage nodes before it is treated as committed).
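As a rough back-of-the-envelope sketch of that bottleneck, the snippet below converts the link speed to a raw throughput ceiling and estimates how long a full resync of one 1 TB disk would take. It ignores protocol overhead and any other traffic sharing the link, so real numbers would be worse; the function names are made up for illustration.

```python
def link_ceiling_mb_per_s(link_gbit_per_s: float) -> float:
    """Raw line rate in decimal MB/s, ignoring protocol overhead."""
    return link_gbit_per_s * 1000 / 8


def seconds_to_copy(size_gb: float, link_gbit_per_s: float) -> float:
    """Best-case time to push size_gb (decimal GB) over the link."""
    return size_gb * 1000 / link_ceiling_mb_per_s(link_gbit_per_s)


ceiling = link_ceiling_mb_per_s(1.0)
resync_hours = seconds_to_copy(1000, 1.0) / 3600

print(f"write ceiling over 1 Gb/s: {ceiling} MB/s")
print(f"best-case full resync of 1 TB: {resync_hours:.1f} h")
```

With synchronous replication, every committed write also has to cross that link, so sustained write throughput cannot exceed this ceiling no matter how fast the local SSDs are.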

You could look at ZFS replication between nodes, but that comes back to the RTO/RPO objectives.

Johannes S · Jan 12, 2025

NuttyNat said:
Ask the business what Recovery Point Objective (RPO) (amount of data loss) and Recovery Time Objective (RTO) (time to be back live) they want... then tell them the budget they will need for it, and hopefully get a bit of budget for something realistic.

We played with StarWind a few years back and also looked at other options. If you are only running a 1 Gb/s internal network, then that would become a bottleneck, especially if you were running synchronous replication (each write has to land on both storage nodes before it is treated as committed).

You could look at ZFS replication between nodes, but that comes back to the RTO/RPO objectives.

ZFS replication works great, but then one should really also have a dedicated network for the data transfer. And one needs to live with the asynchronicity, meaning that you will always lose some data in case of an unplanned failover (planned maintenance is different, of course), depending on the configured replication schedule. The default is fifteen minutes, meaning that up to fifteen minutes of data will be lost. This can be extended to several hours or reduced to one minute, but depending on the application this still might be too much. For SQL databases, for example, it might be worth a shot to set up a cluster consisting of two VMs (one on each Proxmox VE node) which replicate their data between them via native database features.
Relevant docs:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvesr
https://pve.proxmox.com/wiki/Storage_Replication
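A rough way to reason about the schedule versus data loss is sketched below. It uses a simplified model that is my own assumption, not something taken from the Proxmox docs: the target only holds the last fully transferred snapshot, so the worst case is one schedule interval plus the time the transfer itself takes. The 2-minute transfer time is a made-up example value.

```python
def worst_case_loss_minutes(interval_min: float, transfer_min: float) -> float:
    """Upper bound on lost data, in minutes, under the simplified model.

    A failure just before a transfer finishes leaves the target on the
    previous snapshot, taken one interval plus one transfer time earlier.
    """
    return interval_min + transfer_min


# Compare a 1-minute, the default 15-minute, and a 2-hour schedule.
for interval in (1, 15, 120):
    loss = worst_case_loss_minutes(interval, transfer_min=2)
    print(f"every {interval} min -> up to {loss} min of data lost")
```

Comparing these numbers against the RPO the business actually asks for shows quickly whether asynchronous replication is acceptable for a given VM.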

alexskysilk · Jan 12, 2025

mkultra said:
We have a very limited budget for about 3 rather small servers.

Since you are already looking to deploy in the cloud, why not just use VPSes for your applications? A VPS already gives you HA, snapshots, etc. on the backend; why bother doing that twice?

With regard to firewalls: use local firewalls for the instance (which should also be provided by the VPS), and a WAF for your applications. Not everything needs a heavy-lift solution.

mkultra · Jan 12, 2025

alexskysilk said:
Since you are already looking to deploy in the cloud, why not just use VPSes for your applications? A VPS already gives you HA, snapshots, etc. on the backend; why bother doing that twice?

With regard to firewalls: use local firewalls for the instance (which should also be provided by the VPS), and a WAF for your applications. Not everything needs a heavy-lift solution.

I made several proposals. One completely ditched virtualisation in favor of using public-cloud-style instances, managed databases and private networks. It would be far easier to administer, with HA and redundancy built in.

We do have some "specific" requirements, such as IPsec, BGP and GRE.
