Really weird proxmox issue

nitaish

We have a number of 2-node clusters connected to SAN storage. Of late we have started facing a really weird issue. The L2 switch to which all the servers are connected suddenly reboots, and at the same time one of the nodes in one of the clusters reboots. Every time, a different node reboots along with the switch. I have tested both the switch and the power supply to the rack, and everything seems normal. Also, no log is found on the node that reboots. We are unable to find out what is causing the node and the switch to reboot together. We also tried rebooting the switch manually once, and all the nodes rebooted, which is beyond our understanding. One thing in common is that the primary node in the cluster reboots and the secondary node does not when the switch reboots on its own. What could be the reason for this weird behavior?
 
What you are seeing is 100% expected, imo. This is a quorum issue. When the switch reboots, the nodes lose cluster connectivity and think something is wrong, so one of the two gets fenced. This is one of the very reasons two-node clusters are a bad idea (they aren't even supported at all). It's also the reason why the cluster network should be redundant.

Proxmox is behaving as expected; you have to figure out why that switch is rebooting.
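
The quorum arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, assuming one vote per node and a simple majority rule; the function names are made up for the example and are not part of any Proxmox tool.

```python
def votes_needed(total_nodes: int) -> int:
    """Votes required for quorum under a simple majority rule (one vote per node)."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, reachable_nodes: int) -> bool:
    """True if a node that can see `reachable_nodes` members (itself included) is quorate."""
    return reachable_nodes >= votes_needed(total_nodes)


# Cluster switch goes down: every node can only see itself.
for size in (2, 3, 5):
    print(size, votes_needed(size), has_quorum(size, 1))

# A single node fails while the network stays up.
print(has_quorum(2, 1))  # 2-node cluster, one survivor
print(has_quorum(3, 2))  # 3-node cluster, two survivors
```

Under these assumptions, a lone survivor in a 2-node cluster never holds a majority, while two survivors in a 3-node cluster do. When the only switch goes down, no node can reach a majority regardless of cluster size, which is why the redundant cluster network matters as well.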
 
Hi,
Thanks for your reply. I also need to know whether we can share a single node among multiple clusters in an HA setup. Basically, we have multiple clusters. If I keep servers A, B and C in cluster 1, can I also keep server C in clusters 2, 3, etc.?
 
To be precise and make sure we are on the same page, let me explain my question in detail. I have to create 200 clusters for hosting my applications. Keeping in mind that Proxmox wants at least 3 nodes in each cluster, should I add 600 physical nodes in all? Or can I have 200 clusters of 2 nodes each and a few extra nodes that I share among all the clusters? In that case, if Cluster A has nodes A1 and A2, Cluster B has nodes B1 and B2, and I have a single node C1, can I add C1 as the third node in both clusters? I hope I have explained the requirement. Is such a setup possible?
 
No, this isn't possible. Like Udo wrote: a node can only be a member of one cluster. But of course you can make clusters bigger than 3 nodes (that is only the minimum requirement). Always keep an odd number of nodes in a cluster (3, 5, 7, 9, 11, etc.).
 
I agree with the post above. You should be looking to create a cluster with more nodes.
 
But doesn't that look commercially unviable? Imagine each of my nodes has 128 GB RAM and I add 3 nodes to each cluster. I get to use only 128 GB of RAM from the 3 nodes. So, when I multiply by 200 clusters, the RAM available to me is about 25 TB, against about 75 TB of RAM in total across all nodes. In that case, why would anyone use Proxmox instead of the more illustrious cloud platforms?
 
Bear with me, I'm not quite understanding, but I will try.

If you have 3 nodes with 128 GB each, that would be 384 GB for use across the three nodes. Are you saying you will only be running a single VM on this cluster? If you are running multiple VMs, you can spread them over the 3 nodes, or over however many nodes you want in a single cluster.
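
As a rough worked example of that arithmetic, here is a small Python sketch. The helper name and the option to keep one node's worth of RAM free for failover are assumptions made for illustration, not a Proxmox rule.

```python
def cluster_ram_gb(nodes: int, ram_per_node_gb: int, reserve_nodes: int = 0) -> int:
    """Total RAM across the cluster, minus the RAM of `reserve_nodes` kept free for failover."""
    if not 0 <= reserve_nodes <= nodes:
        raise ValueError("reserve_nodes must be between 0 and nodes")
    return (nodes - reserve_nodes) * ram_per_node_gb


print(cluster_ram_gb(3, 128))       # all three nodes running VMs
print(cluster_ram_gb(3, 128, 1))    # headroom to absorb one failed node
print(cluster_ram_gb(10, 128, 1))   # larger cluster, same one-node headroom
```

The point of the comparison is that the share of RAM held back for failover shrinks as the cluster grows: one node out of three versus one node out of ten.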
 
So, you mean to say I can have all nodes in a single cluster and spread my VMs across all the nodes? In that case, how will failover work? What I have understood so far from the Proxmox documentation is that one of the nodes will be the primary node and the other nodes will have no VMs, and if the primary node shuts down, the VMs will be moved to the secondary node.
 
One node is the "Master", but that has nothing to do with being able to run VMs on all the nodes. If it's a 3-node cluster and 1 node fails, the VMs will get started on the remaining nodes. You can take it a step further and set up HA groups to ensure VMs fail over to specific nodes, or get moved back to the original node once it comes back online.

Keep in mind the above is based on the idea that all 3 nodes have shared storage of some sort.

Check out the HA groups section.

https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#HA_Groups
 
The Proxmox documentation is not clear. How does HA work, and how much resource do I get if I add 10 nodes to a single cluster?

HA works by moving the VM from a node deemed failed to a node deemed good. You can set up groups to ensure VMs are spread evenly or return to their original host once it comes back online. Your resources depend on the nodes you use to set up the cluster. You can run VMs on all nodes within the cluster.
 
So I can probably create a 10-node cluster and then, within that cluster, create HA group 1, HA group 2, etc. If I keep 5 nodes in HA group 1 and 5 nodes in HA group 2, then if any one node within a group fails, all VMs of that node will be distributed to the other 4 nodes.
If my understanding is correct, please clear up the following doubts:

1. Once a node fails, can the distribution policy be set, or does it happen randomly? I know we can set priority, but I want the VMs to be distributed equally among the remaining 4 nodes. For example, suppose HA group 1 has 5 nodes (node 1, 2, 3, etc.) and each node has 12 VMs. If node 2 fails, its 12 VMs should be distributed equally among nodes 1, 3, 4 and 5, i.e. 3 VMs on each node. Is it possible to achieve this, and how?

2. Can the same node be part of multiple HA groups? Can nodes overlap across multiple HA groups?
 
1. Yes, that can be done. Look at the wiki; it's about as straightforward as it gets.

2. No
 
2. I believe this is possible; we have overlapping HA groups. We separate VMs from SW service clusters into non-overlapping groups each:

[QUOTE]
group: AnyOne
comment For VMs which could be run on any node
nodes n3,n6,n1,n5,n2,n7,n4
restricted

group: HN12
comment Hypervisor Node 1,2
nodes n1,n2
restricted

group: HN34
comment Hypervisor Node 3,4
nodes n3,n4
restricted

group: HN567
comment Hypervisor Node 5,6,7
nodes n5,n6,n7
restricted

group: HN123
comment Hypervisor Node 1,2,3
nodes n2,n1,n3
restricted

group: HN456
comment Hypervisor Node 4,5,6
nodes n4,n5,n6
restricted

[/QUOTE]
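
To see which nodes sit in more than one group, a listing like the one above can be inverted into a node-to-groups map. The Python sketch below assumes the plain `group:` / `nodes` layout shown in this post and embeds a shortened copy of it; the function name is hypothetical and this is not a Proxmox tool.

```python
from collections import defaultdict

# Shortened copy of the listing from the post, used as sample input.
LISTING = """\
group: AnyOne
nodes n3,n6,n1,n5,n2,n7,n4

group: HN12
nodes n1,n2

group: HN34
nodes n3,n4
"""


def groups_by_node(listing: str) -> dict[str, list[str]]:
    """Map each node name to the HA groups that list it."""
    membership = defaultdict(list)
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("group:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("nodes ") and current is not None:
            for node in line.split(None, 1)[1].split(","):
                membership[node.strip()].append(current)
    return dict(membership)


membership = groups_by_node(LISTING)
for node in sorted(membership):
    print(node, membership[node])
```

Any node that maps to more than one group is an overlap; in the sample input, n1 belongs to both AnyOne and HN12.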
 
Yep, you are correct; I misinterpreted his question. I thought he was asking if 1 node could be part of multiple clusters. Need more coffee.
 
Thanks for the replies. A few more queries:
1. Do I need to set up fencing in Proxmox version 4.2, or is it already configured?
2. How does Proxmox distribute VMs when one node fails? Does it distribute the load randomly or equally?
 
1. Watchdog fencing should be all set

2. I believe it is random, but honestly I'm not too sure. There is definitely no load-distribution logic.
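
For reference, the even spread asked about earlier in the thread (12 VMs from a failed node split across the 4 remaining nodes) would look like the round-robin sketch below. This only illustrates the desired outcome; it is not how the Proxmox HA manager picks target nodes, and all names are hypothetical.

```python
from itertools import cycle


def spread_evenly(vms, nodes):
    """Assign VMs to nodes in round-robin order and return {node: [vms]}."""
    placement = {node: [] for node in nodes}
    for vm, node in zip(vms, cycle(nodes)):
        placement[node].append(vm)
    return placement


failed_vms = [f"vm{i}" for i in range(1, 13)]        # the 12 VMs of the failed node 2
survivors = ["node1", "node3", "node4", "node5"]     # remaining nodes in the group

placement = spread_evenly(failed_vms, survivors)
for node, vms in placement.items():
    print(node, len(vms), vms)
```

Each surviving node ends up with 3 of the 12 VMs, which is the split described in the question.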
 
