HCI Cluster: HA, Scalable, Multiple Questions

pille99

Active Member
Sep 14, 2022
hello together
It has been quite a while since I used Proxmox. Now I need to set up a new datacenter and have some questions.
goal:
Firewall cluster = datacenter 1 for OPNsense (3 nodes, HA), connected to VMs on datacenter 2
VM cluster = datacenter 2 for VMs (up to 10 nodes, starting with 5, HA, scalable), connected to VMs on datacenter 1 and datacenter 3
Backup cluster = datacenter 3 for backup servers (up to 3 nodes, maybe 5 if more storage is needed, HA, scalable), connected to VMs on datacenter 2

1. Is the issue sorted with internal communication from one px server to another px server (same datacenter)?

2. Is there finally a backup software to back up servers on VM/storage level and inside the OS itself? Restoring the whole VM and/or single files should be possible.

3. The last cluster I had was Ceph (OSD) but without CephFS on top of it. What are the advantages/disadvantages of CephFS on top of a Ceph cluster?
As I understand it, an OSD pool and CephFS are not the same. I read something about self-healing, but that feature is inherited from Ceph itself. I don't see any advantage in adding CephFS on top of Ceph. Please enlighten me.

4. Each server gets 6 NICs in datacenter 1 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic extern, 100 Gbit VM traffic intern, 10 Gbit live migration, 1 Gbit corosync, 10 Gbit Ceph).
Each server gets 5 NICs in datacenter 2 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Each server gets 5 NICs in datacenter 3 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Anything wrong with that?

5. I have seen the datacenter management. Very nice job. When will the final release be published?

6. Datacenter 2 will have around 50 small networks (and configs). I would love to use the Proxmox firewall option on each server (for instance: the database-server VM accepts only connections from VM x on port y). Is that possible, even if the VM moves to another server in datacenter 2? I would love to have only the Proxmox firewall limiting each subnet, but I think that's not possible: OPNsense has a lot of options which are needed, especially monitoring and alerting, so the Proxmox firewall cannot replace OPNsense. Planned is that each network gets an OPNsense as gateway, which manages and manipulates the traffic. Anything wrong with that?

7. Is there any professional help for the setup if needed? It's not an issue if it costs something.

That's everything for the moment. Looking forward to your input. Thx guys and have a nice Sunday.
 
1. Is the issue sorted with internal communication from one px server to another px server (same datacenter)?
By "px" you mean "pve"? Which issue exactly are you referring to?

2. Is there finally a backup software to back up servers on VM/storage level and inside the OS itself? Restoring the whole VM and/or single files should be possible.
With PBS you back up the VM and can recover both the whole VM and files/directories inside the VM's OS. However, you can't restore files/directories directly into the VM: you have to download them to an admin PC and upload them back to the VM by other means.

3. The last cluster I had was Ceph (OSD) but without CephFS on top of it. What are the advantages/disadvantages of CephFS on top of a Ceph cluster?
As I understand it, an OSD pool and CephFS are not the same. I read something about self-healing, but that feature is inherited from Ceph itself. I don't see any advantage in adding CephFS on top of Ceph. Please enlighten me.
CephFS has limited usefulness in a PVE cluster, as VM and LXC disks are stored in RBD pools. CephFS is useful e.g. to store snippets or ISO files that will be available cluster-wide. Think of it as NFS but with replicas and without a SPOF.
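For illustration, such a cluster-wide CephFS storage typically shows up as an entry in `/etc/pve/storage.cfg` roughly like this (the storage name and mount path are made-up examples):

```
# CephFS used only for shared files, not for VM/LXC disks
cephfs: cephfs-shared
        path /mnt/pve/cephfs-shared
        content iso,vztmpl,snippets
```

VM and container disks would stay on a separate `rbd:` storage entry backed by a Ceph pool.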

4. Each server gets 6 NICs in datacenter 1 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic extern, 100 Gbit VM traffic intern, 10 Gbit live migration, 1 Gbit corosync, 10 Gbit Ceph).
Each server gets 5 NICs in datacenter 2 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Each server gets 5 NICs in datacenter 3 (1×1 Gbit for Proxmox management and OS, 100 Gbit VM traffic, 100 Gbit live migration, 1 Gbit corosync, 100 Gbit Ceph).
Anything wrong with that?
Two critical issues here:
  • You're missing at least one more redundant corosync link. Given that your main one is a dedicated NIC, the second could be a VLAN running on any of the "VM traffic" NICs, or even on both.
  • There's no redundancy at NIC level, which means each single NIC for each type of traffic is connected to a single switch. That switch is a SPOF: if it breaks/needs maintenance/whatever, it will take down the whole cluster, no matter how many nodes and how much redundancy you set at other levels.
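On the first point, a second corosync link is just an additional ring address per node in `/etc/pve/corosync.conf`. A rough sketch for one node (hostnames and addresses are made up; the real file also carries `config_version`, quorum and other settings):

```
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1   # dedicated 1 Gbit corosync NIC
    ring1_addr: 10.20.20.1   # fallback VLAN on a VM-traffic NIC
  }
}

totem {
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}
```

With two links, corosync can fail over if the dedicated NIC or its switch goes down.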

5. I have seen the datacenter management. Very nice job. When will the final release be published?
No info about that yet.

6. Datacenter 2 will have around 50 small networks (and configs). I would love to use the Proxmox firewall option on each server (for instance: the database-server VM accepts only connections from VM x on port y). Is that possible, even if the VM moves to another server in datacenter 2? I would love to have only the Proxmox firewall limiting each subnet, but I think that's not possible: OPNsense has a lot of options which are needed, especially monitoring and alerting, so the Proxmox firewall cannot replace OPNsense. Planned is that each network gets an OPNsense as gateway, which manages and manipulates the traffic. Anything wrong with that?
Just remember that the OPNsense firewalls can only filter traffic between L2 segments (i.e. VLANs/networks), while the PVE firewall can add protection between VMs of the same L2 segment, so they can complement each other, at the cost of having to deal with two firewalls.
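As a sketch of the per-VM rule described in the question (database VM only reachable from VM x on port y), the PVE firewall config for a VM lives in `/etc/pve/firewall/<vmid>.fw` (IP and port below are placeholders). Because that file is on the shared cluster filesystem, the rules follow the VM when it migrates to another node:

```
[OPTIONS]
enable: 1
policy_in: DROP

[RULES]
# example: allow only the app VM at 10.0.50.10 to reach PostgreSQL on this VM
IN ACCEPT -source 10.0.50.10 -p tcp -dport 5432
```

Everything else inbound is dropped by the `policy_in: DROP` default.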

7. Is there any professional help for the setup if needed? It's not an issue if it costs something.
There is a full list of partners that can help you out at any step of the design, deployment and operation [1].

Regards,


[1] https://www.proxmox.com/en/partners/find-partner/explore
 
By "px" you mean "pve"? Which issue exactly are you referring to?
Yes, I mean PVE (the last time I used it was so long ago that I didn't even remember the "correct" short form). As I remember, VMs hosted on different PVE nodes could not communicate with each other out of the box; I needed to configure an SDN for cluster-wide communication. Is that still the same?
With PBS you back up the VM and can recover both the whole VM and files/directories inside the VM's OS. However, you can't restore files/directories directly into the VM: you have to download them to an admin PC and upload them back to the VM by other means.
Then PBS is not an option for me. A restore of files needs to work without any hassle, and the user should be able to do it themselves (for example, Backup Exec has a scripting option). Restoring the whole VM just to extract some files is not very practical. As I read, only Bacula can back up from Ceph and from inside the VM, but as I understand it there is no version supported for Server 2025. Can you recommend some other solution?

I think PBS is a great add-on and I may go with 2 solutions: PBS for Ceph backup and Backup Exec for inside the VM.

CephFS has limited usefulness in a PVE cluster, as VM and LXC disks are stored in RBD pools. CephFS is useful e.g. to store snippets or ISO files that will be available cluster-wide. Think of it as NFS but with replicas and without a SPOF.
So CephFS is nothing that I need. Thx for the info.
Two critical issues here:
  • You're missing at least one more redundant corosync link. Given that your main one is a dedicated NIC, the second could be a VLAN running on any of the "VM traffic" NICs, or even on both.
  • There's no redundancy at NIC level, which means each single NIC for each type of traffic is connected to a single switch. That switch is a SPOF: if it breaks/needs maintenance/whatever, it will take down the whole cluster, no matter how many nodes and how much redundancy you set at other levels.
To keep the environment redundant, is it enough to add only a second corosync uplink on a separate switch?

I understand that the switch is a SPOF, but in 20 years I have not seen a single switch break down. But I'll keep it in mind. Thx.
No info about that yet.


Just remember that the OPNsense firewalls can only filter traffic between L2 segments (i.e. VLANs/networks), while the PVE firewall can add protection between VMs of the same L2 segment, so they can complement each other, at the cost of having to deal with two firewalls.
I am aware of the "cost", but for an additional security layer I think it's ok. For the firewalls on Proxmox itself (VM level) I need some help, but some partners on the list should be able to do such a job.
There is a full list of partners that can help you out at any step of the design, deployment and operation [1].

Regards,


[1] https://www.proxmox.com/en/partners/find-partner/explore
Thx for your answers so far.
 
As I remember, VMs hosted on different PVE nodes could not communicate with each other out of the box; I needed to configure an SDN for cluster-wide communication. Is that still the same?
Don't know what you are referring to. You simply connect the hosts to a set of switches, configure L2 connectivity (bonds, VLANs, trunks, etc.) and configure a bridge on each PVE host connected to the appropriate host NIC(s). VMs will be able to communicate with each other even on different hosts. You can use SDN for that, but you don't have to use SDN for that to work.
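As a minimal sketch of that bridge setup (the NIC name `eno1` is an assumption), each host's `/etc/network/interfaces` would carry something like:

```
# VLAN-aware bridge: VMs on any host attached to vmbr0
# can reach each other through the shared switches
auto vmbr0
iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
```

Each VM's virtual NIC is then attached to `vmbr0` with a VLAN tag, and the uplink port on the switch is configured as a trunk carrying the same VLANs.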

I think PBS is a great add-on and I may go with 2 solutions: PBS for Ceph backup and Backup Exec for inside the VM.
PBS is infrastructure-level backup (whole VMs); Backup Exec (or Veeam or many other backup solutions) are data-level backups. Each fulfills a different role and they can be used together.

To keep the environment redundant, is it enough to add only a second corosync uplink on a separate switch?
No, that will provide redundancy at the corosync level only.

I understand that the switch is a SPOF, but in 20 years I have not seen a single switch break down. But I'll keep it in mind. Thx.
I've seen switches break, and that's not the only thing that can happen: cables get loose, GBICs die, a switch port can take an ESD hit and die, a power socket stops working, a maintenance window to apply a firmware update to the switch has to be scheduled, etc. Having a cluster with just one switch is nonsense.
 