Clustering

casalicomputers

Renowned Member
Mar 14, 2015
Hello everybody,
Currently I've got a Proxmox node with Fibre Channel SAN storage (I'm using LVM on it) and NFS on a Synology NAS.

Since I've got another two smaller spare servers available, I would like to put an FC HBA card in each of them and create a 3-node cluster, but I don't want to keep these two nodes up all the time.
I'd like to use them only when I need to perform maintenance on the primary node without any downtime, in particular:
  • Power on the additional nodes
  • Live migrate the VMs to those nodes
  • Perform maintenance on the primary node, which may involve reboots/shutdowns (e.g. adding RAM)
  • Live migrate the VMs back to the primary node
  • Power down the additional nodes

I already read the Proxmox VE 2.0 Cluster wiki article, but I don't know if it applies to PVE 3.x as well.

Are there any drawbacks to such a configuration?
Do I need to tweak stuff like "cluster votes" to keep the cluster running with just one node?

Thanks,
Michele
 
A Proxmox VE cluster is designed to run 24/7. So yes, there are drawbacks, and this setup is not recommended, as you will never have quorum in a three-node cluster with two nodes turned off.

The behaviour in 2.x and 3.x is the same.
 


just spitballing here:

https://pve.proxmox.com/wiki/Manual:_pvecm


Proxmox1.MyCluster.local:
pvecm add <IP to ProxmoxQuorumProvider.MyCluster.local> -votes 12


ProxmoxQuorumProvider.MyCluster.local on a VM on Proxmox1:
pvecm create MyClusterName -votes 14

26 Votes if Proxmox1 online


Proxmox2.MyCluster.local:
pvecm add <IP to ProxmoxQuorumProvider.MyCluster.local> -votes 12

12 Votes if Proxmox 2 online

Proxmox3.MyCluster.local:
pvecm add <ProxmoxQuorumProvider.MyCluster.local> -votes 12

12 Votes if Proxmox 3 online



AFAIK quorum in Proxmox works as follows:
a partition has quorum if it holds a strict majority of the total expected votes. That is, its vote count n out of m total votes must satisfy n > INT(m/2).
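
To sanity-check the weighting once all nodes have joined, the vote counters can be read from the cluster status on any node (on the cman-based 3.x stack, pvecm status prints the underlying cman status; the grep is just one way to pick out the relevant lines):

Code:
# pvecm status | grep -iE 'expected votes|total votes|quorum'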


That would basically provide Quorum in the following cases:

Running Proxmox1 ONLY

While Proxmox1 is online and before you turn off Proxmox2 and 3, you migrate ProxmoxQuorumProvider to Proxmox1, then shut down Proxmox2 and 3 as needed.

Votes Expected: 50
Quorum at > 25 Votes
Actual Votes: 26
Quorum reached

Running Proxmox1, Proxmox2 and Proxmox3 at the same time

If you have Proxmox1, 2 and 3 running, you migrate ProxmoxQuorumProvider to whichever node has the most CPU cycles and RAM to spare.

Votes Expected: 50
Quorum at > 25 Votes
Actual Votes: 50
Quorum reached

Running Proxmox2 and Proxmox3 only

Before you turn off Proxmox1 you migrate ProxmoxQuorumProvider to Proxmox2 or Proxmox3. Then you shut down Proxmox1

Votes Expected: 50
Quorum at >25 Votes
Actual Votes: 38
Quorum reached


Running Proxmox1 and (Proxmox2 OR Proxmox3) at the same time

If you have Proxmox1, 2 and 3 running, you migrate ProxmoxQuorumProvider to whichever node has the most CPU cycles and RAM to spare, then shut down the Proxmox node you don't need (either 2 or 3).

Votes Expected: 50
Quorum at > 25 Votes
Actual Votes: 38
Quorum reached



Let's hope I didn't make a math mistake here.



Edit:
With a second ProxmoxQuorumProvider2 and scaled-up vote numbers, you might even be able to automate the migration of ProxmoxQuorumProvider1 using the Proxmox HA feature.
I'm still toying with that on my 3-node Proxmox 4.x/Ceph cluster, so no guarantees.

Edit 2: I am pretty sure that a ProxmoxQuorumProvider can be run with minimal resource impact.
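
On PVE 4.x, putting the quorum VM under HA control would roughly come down to registering it as an HA resource. A minimal sketch, assuming the quorum VM has VMID 100 (the VMID is just a placeholder):

Code:
# ha-manager add vm:100     # let the HA stack keep the quorum VM running
# ha-manager status         # check which node it is currently running on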

 
Hi,
thanks for your reply.

So, basically, your idea is to use another Proxmox VM inside the main node, acting as a fake node to increase the vote count. Am I right?
What about using quorum disks instead?
 
Yes, that's my idea.
I tested it last night on my VirtualBox 3-node cluster. It worked fine, went pretty smoothly and had only minimal overhead in CPU/RAM usage.



I've never used a quorum disk before, so I can't speak to that.


Just an fyi:

  • I used the Open vSwitch network model

  • I created the cluster on the ProxmoxQuorumProvider VM, then attached the main Proxmox node and the "cold-standby nodes" to that ProxmoxQuorumProvider VM. Reason: I have yet to figure out how to access the web interface from another Proxmox node if the node you created the cluster on is down.
  • Migration of the quorum provider VM from one node to another took 9 seconds using a three-node Ceph cluster (separate from the Proxmox nodes, 15 OSDs) as shared storage (roughly 120 MB/s read speed). This is all running on the same VirtualBox host, sharing a single 2 TB HDD, so performance is likely better on your physical setup.
 
I have obtained the same result using qdisk, but with less effort :D

I installed Proxmox on 3 VMs (the nodes) and attached a small 1 GB iSCSI disk from my NAS to each of them.
Then I created a partition on it and formatted it as a qdisk:
Code:
# mkqdisk -c /dev/sda1 -l qdisk01
Writing new quorum disk label 'qdisk01' to /dev/sda1.
WARNING: About to destroy all data on /dev/sda1; proceed [N/y] ? y
Warning: Initializing previously initialized partition
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
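
As a quick sanity check that every node sees the labelled device, the quorum disk labels can be listed on each node (mkqdisk ships with the cman packages; -L should list all visible qdisks):

Code:
# mkqdisk -L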

I edited /etc/pve/cluster.conf, added the following lines and incremented the config version:
Code:
<cluster>
 ......
 <quorumd votes="2" allow_kill="0" interval="1" label="qdisk01" tko="10"/>
 <totem token="54000"/>
 ......
</cluster>
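
The "incremented the config version" part above refers to the config_version attribute on the <cluster> element itself; purely as an illustration (cluster name and number here mirror the status output further down):

Code:
<cluster name="testcluster" config_version="5">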

At this point I made sure that each node got the same version of the cluster config file, and then restarted the services on each node, one at a time:
Code:
# service rgmanager stop
# service cman reload
Stopping cluster: 
   Stopping dlm_controld... [  OK  ]
   Stopping fenced... [  OK  ]
   Stopping qdiskd... [  OK  ]
   Stopping cman... [  OK  ]
   Waiting for corosync to shutdown:[  OK  ]
   Unloading kernel modules... [  OK  ]
   Unmounting configfs... [  OK  ]
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Starting qdiskd... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Tuning DLM kernel config... [  OK  ]
   Unfencing self... [  OK  ]
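
Once the services are back up, the quorum disk should show up as an extra cluster member; if clustat (shipped with rgmanager) is available, it is a quick way to confirm that alongside pvecm status:

Code:
# clustat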

I had a few issues with not being able to write to /etc/pve, but nothing that couldn't be solved with a reboot or just a restart of some services (usually pve-cluster, pvedaemon, pvestatd and pveproxy).

FYI, I set votes=2 for the qdisk in cluster.conf.
This way the expected votes are always >= 3 (node=1, qdisk=2, expected=3), even with only one active node.
Code:
Version: 6.2.0
Config Version: 5
Cluster Name: testcluster
Cluster Id: 31540
Cluster Member: Yes
Cluster Generation: 192
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Quorum device votes: 3
Total votes: 6
Node votes: 1
Quorum: 4
Active subsystems: 7
Flags:
Ports Bound: 0 178
Node name: pve-nodo3
Node ID: 3
Multicast addresses: 239.192.123.175
Node addresses: 192.168.120.3

Since I won't use any HA features (I don't need them) and will perform every action manually, I believe there's no risk that one node tries to start a VM which is already running on another node.
What can be the worst case?

In my real setup, I'm going to replace the iSCSI disk with a LUN from my SAN.
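
The qdisk creation itself should be identical, just pointed at the SAN LUN; the device path below is purely illustrative (with multipathing it would be the corresponding /dev/mapper device):

Code:
# mkqdisk -c /dev/mapper/qdisk_lun -l qdisk01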
 
Have you tested what happens when you have the following conditions:

State 1:
UP: Proxmox1 + Quorum Disk
Down: Proxmox2 + Proxmox3

State 2:
UP: Proxmox2 + Proxmox3 + Quorum Disk
Down: Proxmox1


State 3:
UP: Proxmox2 + Quorum Disk
Down: Proxmox1 + Proxmox3



Can you still access the web interface to migrate VMs when you go from State 1 to State 2 and finally to e.g. State 3, or is that only possible once Proxmox1 is available again?
Basically, this is what would happen if e.g. the mainboard in Proxmox1 gets fried, the replacement takes 48 hours and you need to use your cluster/VMs in the meantime.
 
Yes, I tested all the possible combinations and I've been able to migrate and create new VMs every time.

UP: Proxmox1 + Quorum Disk
Down: Proxmox2 + Proxmox 3
Votes: 3/3 - Quorum reached

UP: Proxmox2 + Proxmox3 + Quorum Disk
Down: Proxmox1
Votes: 5/3 - Quorum reached
 
Basically, this is what would happen if e.g. the mainboard in Proxmox1 gets fried, the replacement takes 48 hours and you need to use your cluster/VMs in the meantime.


If the Proxmox node1 suddenly dies (a few months after my other two nodes have been powered off)... I won't have /etc/pve replicated.... Could that be an issue?
Maybe I could schedule an rsync...?
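
A nightly rsync of /etc/pve to some external box would at least preserve the cluster and VM configs, since /etc/pve stays readable on a quorate node. A minimal sketch with a hypothetical NAS target:

Code:
# rsync -a /etc/pve/ root@nas.example.local:/backup/pve-config/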
 
Yes, I tested all the possible combinations and I've been able to migrate and create new VMs every time.

How do you access the Proxmox web interface when node 1 is offline/dies?

e.g. my nodes are
Proxmox1 10.1.0.1
Proxmox2 10.1.0.2
Proxmox3 10.1.0.3
Proxmox4 10.1.0.4 <-- created the Cluster on

If Proxmox4 is up, I can only access the web interface via 10.1.0.4:8006.
If Proxmox4 is down, 10.1.0.4:8006 is obviously not available, but 10.1.0.1:8006, 10.1.0.2:8006 and 10.1.0.3:8006 are not available either. They were usable before adding the nodes to the cluster, though.
 
Honestly, I had no issues when I accessed the web interface on the other nodes while the primary (where I created the cluster) was down.

(Screenshot attached: Selezione_038.png)
 