HCI Cluster: HA, Scalable, Multiple Questions

pille99

Active Member
Sep 14, 2022
hello together
It has been quite a while since I used Proxmox. Now I need to set up a new datacenter and have some questions.
goal:
Firewall cluster = datacenter 1 for OPNsense (3 nodes, HA), connected to VMs on datacenter 2
VM cluster = datacenter 2 for VMs (up to 10 nodes, starting with 5, HA, scalable), connected to VMs on datacenter 1 and datacenter 3
Backup cluster = datacenter 3 for backup servers (up to 3 nodes, maybe 5 if more storage is needed, HA, scalable), connected to VMs on datacenter 2

1. Is the issue sorted with internal communication from one px server to another px server (same datacenter)?

2. Is there finally a backup software to back up servers on VM/storage level and inside the OS itself? Restoring the whole VM and/or single files should be possible.

3. The last cluster I had was Ceph (OSD) but without CephFS on top of it. What are the advantages/disadvantages of CephFS on top of a Ceph cluster?
As I understand it, an OSD pool and CephFS are not the same. I read something about self-healing, but that feature is inherited from Ceph itself. I don't see any advantage in adding CephFS on top of Ceph. Please enlighten me.

4. Each server gets 6 NICs in datacenter 1 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic extern, 100 Gbit VM traffic intern, 10 Gbit live migration, 1 Gbit corosync, 10 Gbit Ceph).
Each server gets 5 NICs in datacenter 2 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Each server gets 5 NICs in datacenter 3 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Anything wrong with that?

5. I have seen the datacenter management. Very nice job. When will the final release be published?

6. Datacenter 2 will have around 50 small networks (and configs). I would love to use the Proxmox firewall option on each server (for instance: the database-server VM accepts only connections from VM x on port y). Is that possible, even if the VM moves to another server in datacenter 2? I would love to have only the Proxmox firewall limiting each subnet, but I think that's not possible: OPNsense has a lot of options which are needed, especially monitoring and alerting, so the Proxmox firewall cannot replace OPNsense. Planned is that each network gets an OPNsense as gateway, which manages and manipulates the traffic. Anything wrong with that?

7. Is there any professional help for the setup if needed? It's not an issue if it costs something.

That's everything for the moment. Looking forward to your input. Thx guys and have a nice Sunday.
 
1. Is the issue sorted with internal communication from one px server to another px server (same datacenter)?
By "px" you mean "pve"? Which issue exactly are you referring to?

2. Is there finally a backup software to back up servers on VM/storage level and inside the OS itself? Restoring the whole VM and/or single files should be possible.
With PBS you back up the VM and can recover both the whole VM and files/directories inside the VM's OS. However, you can't restore files/directories directly into the VM: you have to download them to an admin PC and upload them back to the VM by other means.

3. The last cluster I had was Ceph (OSD) but without CephFS on top of it. What are the advantages/disadvantages of CephFS on top of a Ceph cluster?
As I understand it, an OSD pool and CephFS are not the same. I read something about self-healing, but that feature is inherited from Ceph itself. I don't see any advantage in adding CephFS on top of Ceph. Please enlighten me.
CephFS has limited usefulness in a PVE cluster, as VM and LXC disks are stored in RBD pools. CephFS is useful e.g. to store snippets or ISO files that will be available cluster-wide. Think of it as NFS but with replicas and without a SPOF.
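For illustration, such a cluster-wide CephFS storage typically shows up as an entry in `/etc/pve/storage.cfg` roughly like this (the storage name and mount path are made-up examples):

```
# CephFS used only for shared files, not for VM/LXC disks
cephfs: cephfs-shared
        path /mnt/pve/cephfs-shared
        content iso,vztmpl,snippets
```

VM and container disks would stay on a separate `rbd:` storage entry backed by a Ceph pool.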

4. Each server gets 6 NICs in datacenter 1 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic extern, 100 Gbit VM traffic intern, 10 Gbit live migration, 1 Gbit corosync, 10 Gbit Ceph).
Each server gets 5 NICs in datacenter 2 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Each server gets 5 NICs in datacenter 3 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Anything wrong with that?
Two critical issues here:
  • You're missing at least one more redundant corosync link. Given that your main one is a dedicated NIC, the second could be a VLAN running on any of the "VM traffic" NICs, or even on both.
  • There's no redundancy at NIC level, which means each single NIC for each type of traffic is connected to a single switch. That switch is a SPOF: if it breaks/needs maintenance/whatever, it will take down the whole cluster, no matter how many nodes and how much redundancy you set at other levels.
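On the first point, a second corosync link is just an additional ring address per node in `/etc/pve/corosync.conf`. A rough sketch for one node (hostnames and addresses are made up; the real file also carries `config_version`, quorum and other settings):

```
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1   # dedicated 1 Gbit corosync NIC
    ring1_addr: 10.20.20.1   # fallback VLAN on a VM-traffic NIC
  }
}

totem {
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}
```

With two links, corosync can fail over if the dedicated NIC or its switch goes down.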

5. I have seen the datacenter management. Very nice job. When will the final release be published?
No info about that yet.

6. Datacenter 2 will have around 50 small networks (and configs). I would love to use the Proxmox firewall option on each server (for instance: the database-server VM accepts only connections from VM x on port y). Is that possible, even if the VM moves to another server in datacenter 2? I would love to have only the Proxmox firewall limiting each subnet, but I think that's not possible: OPNsense has a lot of options which are needed, especially monitoring and alerting, so the Proxmox firewall cannot replace OPNsense. Planned is that each network gets an OPNsense as gateway, which manages and manipulates the traffic. Anything wrong with that?
Just remember that the OPNsense firewalls can only filter traffic between L2 segments (i.e. VLANs/networks), while the PVE firewall can add protection between VMs of the same L2 segment, so they can complement each other, at the cost of having to deal with two firewalls.
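As a sketch of the per-VM rule described in the question (database VM only reachable from VM x on port y), the PVE firewall config for a VM lives in `/etc/pve/firewall/<vmid>.fw` (IP and port below are placeholders). Because that file is on the shared cluster filesystem, the rules follow the VM when it migrates to another node:

```
[OPTIONS]
enable: 1
policy_in: DROP

[RULES]
# example: allow only the app VM at 10.0.50.10 to reach PostgreSQL on this VM
IN ACCEPT -source 10.0.50.10 -p tcp -dport 5432
```

Everything else inbound is dropped by the `policy_in: DROP` default.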

7. Is there any professional help for the setup if needed? It's not an issue if it costs something.
There is a full list of partners that can help you out at any step of the design, deployment and operation [1].

Regards,


[1] https://www.proxmox.com/en/partners/find-partner/explore
 
By "px" you mean "pve"? Which issue exactly are you referring to?
Yes, I mean PVE (the last time I used it was so long ago that I didn't even remember the "correct" short form). As I remember, VMs hosted on different PVE nodes could not communicate with each other out of the box; I needed to configure an SDN for cluster-wide communication. Is that still the same?
With PBS you back up the VM and can recover both the whole VM and files/directories inside the VM's OS. However, you can't restore files/directories directly into the VM: you have to download them to an admin PC and upload them back to the VM by other means.
Then PBS is not an option for me. A restore of files needs to work without any hassle, and the user should be able to do it themselves (for example, Backup Exec has a scripting option). Restoring the whole VM just to extract some files is not very practical. As I read, only Bacula can back up from Ceph and from inside the VM, but as I understand it there is no version supported for Server 2025. Can you recommend some other solution?

I think PBS is a great add-on and I may go with 2 solutions: PBS for Ceph backup and Backup Exec for inside the VM.

CephFS has limited usefulness in a PVE cluster, as VM and LXC disks are stored in RBD pools. CephFS is useful e.g. to store snippets or ISO files that will be available cluster-wide. Think of it as NFS but with replicas and without a SPOF.
So CephFS is nothing that I need. Thx for the info.
Two critical issues here:
  • You're missing at least one more redundant corosync link. Given that your main one is a dedicated NIC, the second could be a VLAN running on any of the "VM traffic" NICs, or even on both.
  • There's no redundancy at NIC level, which means each single NIC for each type of traffic is connected to a single switch. That switch is a SPOF: if it breaks/needs maintenance/whatever, it will take down the whole cluster, no matter how many nodes and how much redundancy you set at other levels.
To keep the environment redundant, is it enough to add only a second corosync uplink on a separate switch?

I understand that the switch is a SPOF, but in 20 years I have not seen a single switch break down. But I'll keep it in mind. Thx.
No info about that yet.


Just remember that the OPNsense firewalls can only filter traffic between L2 segments (i.e. VLANs/networks), while the PVE firewall can add protection between VMs of the same L2 segment, so they can complement each other, at the cost of having to deal with two firewalls.
I am aware of the "cost", but for an additional security layer I think it's ok. For the firewalls on Proxmox itself (VM level) I need some help, but some partners on the list should be able to do such a job.
There is a full list of partners that can help you out at any step of the design, deployment and operation [1].

Regards,


[1] https://www.proxmox.com/en/partners/find-partner/explore
Thx for your answers so far.
 
As I remember, VMs hosted on different PVE nodes could not communicate with each other out of the box; I needed to configure an SDN for cluster-wide communication. Is that still the same?
Don't know what you are referring to. You simply connect the hosts to a set of switches, configure L2 connectivity (bonds, VLANs, trunks, etc.) and configure a bridge on each PVE host connected to the appropriate host NIC(s). VMs will be able to communicate with each other even on different hosts. You can use SDN for that, but you don't have to use SDN for that to work.
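As a minimal sketch of that bridge setup (the NIC name `eno1` is an assumption), each host's `/etc/network/interfaces` would carry something like:

```
# VLAN-aware bridge: VMs on any host attached to vmbr0
# can reach each other through the shared switches
auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
```

Each VM's virtual NIC is then attached to `vmbr0` with a VLAN tag, and the uplink port on the switch is configured as a trunk carrying the same VLANs.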

I think PBS is a great add-on and I may go with 2 solutions: PBS for Ceph backup and Backup Exec for inside the VM.
PBS is infrastructure-level backup (whole VMs); Backup Exec (or Veeam or many other backup solutions) are data-level backups. Each fulfills a different role and they can be used together.

To keep the environment redundant, is it enough to add only a second corosync uplink on a separate switch?
No, that will provide redundancy at the corosync level only.

I understand that the switch is a SPOF, but in 20 years I have not seen a single switch break down. But I'll keep it in mind. Thx.
I've seen switches break, and that's not the only thing that can happen: cables get loose, GBICs die, a switch port can take an ESD hit and die, a power socket stops working, a maintenance window to apply a firmware update to the switch has to be scheduled, etc. Having a cluster with just one switch is nonsense.
 