Best practice for a Proxmox cluster

pelip · Sep 24, 2025

Hello, colleagues. I'm building a 3-node cluster. Each node has two 10G ports and two 1G ports. I assume the 10G ports will be used for replication. The diagram looks like this. However, with this configuration, the cluster only ran for 5 minutes, and then nodes 2 and 3 became unavailable. What am I doing wrong?

bbgeek17 · Sep 24, 2025

Hi @pelip, welcome to the forum.

To answer your question directly: it is impossible for anyone to say, as you have not provided enough data to analyze.

Note that 10G is overkill for cluster communication. Are you running Ceph as well?

Beyond this, you need to carefully analyze the logs of the servers: journalctl -b0
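
A full boot log is long, so it can help to narrow it to the cluster services first. Below is a minimal sketch that builds such a journalctl call; the unit names corosync and pve-cluster are my assumption about which services matter here, and the helper names are made up for illustration.

```python
import subprocess

# Assumption: these are the units most relevant to cluster membership on a PVE node.
CLUSTER_UNITS = ["corosync", "pve-cluster"]


def build_journalctl_cmd(units, boot=0):
    """Build a journalctl argument list limited to one boot and the given units."""
    cmd = ["journalctl", "--no-pager", "-b", str(boot)]
    for unit in units:
        # journalctl accepts -u more than once and shows entries from all listed units.
        cmd += ["-u", unit]
    return cmd


def show_cluster_logs():
    """Run the command on the node itself; returns journalctl's exit code."""
    return subprocess.run(build_journalctl_cmd(CLUSTER_UNITS), check=False).returncode


# Print the command so it can also be pasted into a shell.
print(" ".join(build_journalctl_cmd(CLUSTER_UNITS)))
```

Run show_cluster_logs() on each node and compare the entries around the time nodes 2 and 3 dropped out.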

PS: Looking a bit closer at your diagram, you have no predictability of ARP/MAC advertisement with your current config. Packets could be sent via the wrong port and never reach the intended target. You should put a switch on the back end and reduce the complexity.

Blockbridge: Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

pelip · Sep 24, 2025

By cluster communication, I mean two situations:
1. Backup
2. If one node fails, quorum must be enabled in the cluster, and the virtual machines will start on another node.

bbgeek17 · Sep 24, 2025

pelip said:
1. Backup

There is no "backup" traffic in a basic PVE cluster. It is something you can add with an external node, i.e. PBS. Do you mean ZFS replication?

pelip said:
2. If one node fails, quorum must be enabled in the cluster, and the virtual machines will start on another node.

If you do not have shared storage, then the VMs cannot fail over when a node fails. Are you planning to use ZFS replication?

Your bridge config is wrong. Linux will pick one (active) port and you will have asymmetric routing and broken cluster communication.
Place a basic L2 switch on the back and it should solve some of your issues.
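
To put the quorum point in numbers, here is a rough sketch of the majority rule. It assumes one vote per node and a simple strict-majority threshold; the function names are only for illustration.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def is_quorate(reachable_votes: int, total_votes: int) -> bool:
    """True if the partition that can still talk to each other holds a majority."""
    return reachable_votes >= quorum_threshold(total_votes)


TOTAL_VOTES = 3  # assumption: three nodes with one vote each

for reachable in range(TOTAL_VOTES, 0, -1):
    print(f"{reachable} of {TOTAL_VOTES} nodes reachable -> quorate: "
          f"{is_quorate(reachable, TOTAL_VOTES)}")
```

Under these assumptions a 3-node cluster tolerates the loss of one node. If broken cluster communication leaves a node seeing only itself, that node is no longer quorate.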

pelip · Sep 24, 2025

Yes, ZFS replication.
How would you assemble this circuit to make it reliable?

bbgeek17 · Sep 24, 2025

pelip said:
How would you assemble this circuit to make it reliable?

You should place a switch on the back-end network.

bbgeek17 · Sep 24, 2025

Also, note that 192.90 is not in the private IP address space.

https://datatracker.ietf.org/doc/html/rfc1918#:~:text=3. Private Address Space
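
A quick way to check an address against the three RFC 1918 blocks is Python's standard ipaddress module. This is only a sketch: the thread gives just the "192.90" prefix, so the full addresses below are hypothetical examples.

```python
import ipaddress

# The three private IPv4 blocks listed in RFC 1918, section 3.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


# Hypothetical addresses: one under 192.90, the rest inside private ranges.
for candidate in ("192.90.0.1", "192.168.90.1", "10.10.10.1", "172.16.0.1"):
    print(candidate, "private" if is_rfc1918(candidate) else "NOT private")
```

Anything under 192.90 falls outside all three blocks, while 192.168.x.x, 10.x.x.x and 172.16.x.x through 172.31.x.x are inside them.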

LnxBil · Sep 25, 2025

pelip said:
Yes, ZFS replication.
How would you assemble this circuit to make it reliable?

With ZFS? You cannot. PVE HA is not built to fail over on a local storage failure, so if your local ZFS fails, the VMs will not be started automatically on the other nodes. You will also have data loss in the HA case of a node failure and the subsequent failover to another node. Please reconsider and build a proper cluster with a cluster filesystem. You will not have fun with this setup, your experience with PVE will not be good, and it's not PVE's fault.

bbgeek17 said:
You should place a switch on the back-end network.

That'll work, or you can switch to full-mesh routing (FRR), as in the 3-node Ceph setup, to get a proper multi-link ring.
