Single or multiple cluster

kameleon

New Member
Oct 19, 2022
We currently have 6 servers with identical hardware. Only 3 are being used for the PVE PoC; the others are in production on VMware. Once we finish the PoC on PVE and are ready to migrate, I want to move 3 of these servers to an off-site colocation so we have some redundancy. We currently have 4x 10G links to each server but only a single 10G link to the colocation. I know I can put all 6 servers in a cluster, but it is the link between our location and the colocation that I worry about. With that in mind, would it be better to set up my servers in one cluster of 6 with 3 here and 3 there, or do two separate clusters and only use backup/replication to get the servers to the remote site?
 
If you are worried about the link, then you should not stretch your servers across both sites as a single cluster, doubly so with an equal number of nodes on each side.
If/when the link goes down you will have an equal number of nodes on each side, meaning there is no majority, and therefore no quorum, on either side.

You could split it 4/2 and hope that you never hit a double failure, i.e. the inter-site link and one of the 4 nodes failing at the same time.
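
To put rough numbers on that: with corosync's default of one vote per node and no external QDevice, a partition only stays quorate if it holds a strict majority of all votes. A minimal sketch of that arithmetic (plain Python, nothing Proxmox ships):

```python
# Majority-rule arithmetic for the splits above. This is not Proxmox code,
# just the default corosync behaviour: one vote per node, no QDevice, and a
# partition is only quorate if it holds a strict majority of all votes.

def has_quorum(votes_in_partition: int, total_votes: int) -> bool:
    """True if this partition holds more than half of all cluster votes."""
    return votes_in_partition > total_votes // 2

TOTAL = 6
for site_a, site_b in [(3, 3), (4, 2)]:
    print(f"{site_a}/{site_b} split: site A quorate={has_quorum(site_a, TOTAL)}, "
          f"site B quorate={has_quorum(site_b, TOTAL)}")

# 3/3 -> neither side is quorate, so the whole cluster stops accepting changes.
# 4/2 -> the 4-node side stays quorate; lose one of those 4 nodes as well and
#        it is down to 3 of 6 votes, which is no longer a majority.
```

If you do go the single stretched-cluster route, a QDevice at a third location is the usual way to avoid the 3/3 tie.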

What advantage do you foresee in having a single 6-node cluster vs. two separate 3-node clusters? Are you planning on moving VMs between sites often? Do you have appropriate storage backing your infrastructure?



Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
The reason I would like a 6-node cluster is so that we can move workloads around more easily. As soon as we implement the dual data center setup, I will want to move the entire workload over to the off-site location so I can do some cable cleanup and management on our local side, then move everything back once that is done. Beyond that, there are a few times a year when our building loses water, so our chillers do not work, and it would be advantageous to migrate fully to the remote location until cooling returns.

As for storage, no host has any storage to speak of, just a 500GB NVMe drive to install the base OS on. We use multiple storage arrays. On the primary side we have a Pure Storage array (iSCSI and NFS) and a Nimble array (iSCSI only). For the secondary site we will have 2x Nimble arrays (iSCSI only). I am currently trying to figure out the best way to connect to the Nimbles for storage, but the Pure is connected via NFS to the 3 test machines so I can test snapshotting and such (qcow2 images). I may use a small Linux VM per host to bridge the iSCSI/NFS gap, though. Ideally I want thin provisioning and snapshots, so I'll need NFS or ZFS over iSCSI.

I am just trying to get all my ducks in a row before we do anything. The last thing I want to do is set everything up and have to redo it because I should have done XX instead. :)
 
Speaking of storage: A VM's virtual disk (image) is located on one particular storage array at any given time. Imagine your VM1 on Host1 has storage on SAN1 in Site1. When you migrate the VM to Site2 Host4, the storage is still in Site1. It will be accessed over that single link you have. That is assuming that SAN1 is fully available to all hosts on both sites.

You can treat SAN storage as "local", in which case a controlled failover will involve a full data copy from SAN1 to SAN2 over the single link.

You can create a "metro-cluster" on your Pure or Nimble to address storage locality. Whether the single link is sufficient - your vendor would need to advise you.
You can keep NFS storage in Site1, available for all Nodes/VMs. Again, your cross-site link becomes a single point of failure.
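
To give a feel for what that single link means in practice, here is a back-of-the-envelope sketch; the dataset sizes and the 80% usable-throughput factor are assumptions for illustration, not numbers from your environment:

```python
# Back-of-the-envelope timing for a full dataset copy across the single
# 10 Gbit/s inter-site link. The dataset sizes and the 80% usable-throughput
# factor below are assumptions for illustration, not measurements.

def copy_hours(dataset_tb: float, link_gbit: float = 10.0, efficiency: float = 0.8) -> float:
    usable_bytes_per_sec = link_gbit * 1e9 / 8 * efficiency  # link speed -> usable bytes/s
    return dataset_tb * 1e12 / usable_bytes_per_sec / 3600

for tb in (5, 20, 50):
    print(f"{tb} TB over one 10G link: ~{copy_hours(tb):.1f} hours")

# ~1.4 h, ~5.6 h and ~13.9 h respectively, and that assumes the link is
# otherwise idle and carries no VM, cluster or backup traffic at the time.
```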

I don't see how running a VM on each node for iSCSI/ZFS translation will help.

Sounds like you have some serious hardware involved, and the best route may be to sit down with a Proxmox Partner/Architect for a bulletproof design.

Good luck!


 
Why do you need thin provisioning on the PVE side? Pure and Nimble already do thin provisioning in the storage pool.

The single 10 Gbit link between the sites will be the bottleneck, and all storage vendors recommend redundant links, so a metro cluster will not be supported.

If the storage for the second data center has not yet been procured, it would be better to invest the money in local disks for the hosts and work with ZFS replication.
This allows you to set the replication interval to 1 minute and live migrations are also possible with ZFS replicas.

You can of course also use asynchronous replication between the storage arrays, but this increases the complexity enormously.

I have customers with stretched storage configurations, but always with transparent failover and redundant links. At least 2x 25 / 40 / 100 GBit and often an additional 2x 10 GBit+ for VM traffic.
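
To illustrate what a 1-minute interval means across a single 10 Gbit link, here is a quick sketch; the change rates are invented examples, the point is only that each incremental send must fit inside the interval and that the interval roughly bounds the data you lose on an unplanned failover:

```python
# What a 1-minute replication interval implies over a single 10 Gbit/s link.
# The change rates below are made-up examples; the point is that each
# incremental send has to fit inside the interval, with headroom left for
# the VM and cluster traffic sharing the same link.

def sync_seconds(changed_mb_per_min: float, link_gbit: float = 10.0, efficiency: float = 0.8) -> float:
    usable_mb_per_sec = link_gbit * 1000 / 8 * efficiency  # ~1000 MB/s on an idle 10G link
    return changed_mb_per_min / usable_mb_per_sec

for rate in (100, 1000, 10000):  # MB of changed blocks per minute (assumed)
    s = sync_seconds(rate)
    print(f"{rate} MB changed/min -> send takes ~{s:.1f} s, "
          f"worst-case loss on unplanned failover ~{60 + s:.0f} s")
```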
 
Without proper redundancy on that link, you will bring yourself more headache than benefit. See e.g. https://forum.proxmox.com/threads/6-node-ha-cluster-split-brain.152081/#post-689280
 
It is not so much the thin provisioning as it is the snapshot feature. I absolutely cannot have a VM stunned for hours on end while a backup is being made. On my personal PVE servers with local storage, backups run much more smoothly with snapshot mode enabled, only stunning the VM momentarily, whereas when I run a backup without snapshot mode it freezes the VM until the backup is complete. In this environment I cannot have that happen. Am I missing something about backups?

I am aware I need multiple links, but I am being bottlenecked by the governing body that provides them. That is why I am asking the hive mind, to make sure I am not missing something.

Storage, unfortunately, will have to be Pure arrays going forward. I love my Nimbles, bought them pre-HP acquisition, and they have been bulletproof and just work. Internal storage has gone the way of the Dodo here.

I could find a way to do ZFS over iSCSI and get the benefits of both. I am just not sure if any of our current arrays support that. If any do, the Pure would probably be the best bet as they are pretty configurable.
 
...as it is the snapshot feature. I absolutely can not have a VM stunned for hours on end while a backup is being made.

A backup of a PVE VM is always made "in the background"; the VM does not have to stop for it. Even in "stop mode" the VM is just shut down, a KVM/QEMU snapshot is created, and then the VM is started again immediately.

Note that this is NOT a storage snapshot but a KVM/QEMU-internal feature which is completely file-system agnostic.

Best regards
 
I could find a way to do the ZFS over iSCSI and have benefits of both. I am just not sure if any of our current arrays support that. If any do, the Pure probably would be the best bet as they are pretty configurable.
You can be confident that none of your current arrays supports the ZFS-over-iSCSI scheme natively.


 