Large proxmox installations (clusters!). I need some developer's attention

RRJ

Member
Apr 14, 2010
Estonia, Tallinn
Hello,

I've been using Proxmox for a while (3 years in total), but only in single-node mode.
Today we've got about 300 000 EUR for hardware, which will be used for a cluster virtualization platform.
All we want is two performance clusters joined in a high-availability cluster. In other words, we want two racks full of hardware: one rack is a high-performance cluster, and the other, with the same type of hardware, is used for the high-availability cluster. Can Proxmox handle this?

Looking for an answer as soon as possible.


Thank you.
 
Thank you for the answer.
Could you give us some basic installation tips, so we can start testing? The hardware tender will be published next week, but the software tender will be published at the beginning of summer if we are not able to configure Proxmox by then.
So basically the configuration should look like this:
1. install Proxmox in the first rack as a high-performance cluster
2. install Proxmox in the second rack as a high-performance cluster
3. what would this step be? Should we put only those two DCs in HA, or each physical server?

Will blade servers give us some flexibility? They are typically hot-swappable.
We are not a commercial organization. We run a library, and this cluster would be used for internal needs, so that we won't have to install new hardware each time we need a server.
 
What's your problem with the Proxmox wiki? All the information you need is on the wiki. Just RTFM! :)
 
Any hints, or at least a link to some manual?

Hi,
1. You should play with a PVE cluster to see how live migration works (some people don't have the right idea about shared storage and so on). You can use simple PCs for that (they need hardware virtualisation, but that is normal today).

2. Be sure that the shared storage for your HA is itself HA!

3. Test HA. Testing is good for everything, but for HA it's mandatory.

Udo
 
"What's your problem with the Proxmox wiki? All the information you need is on the wiki. Just RTFM! :)"
This is the moment when people decide to use commercial software like VMware/Hyper-V. Those vendors never tell people to RTFM (they even send their staff, for free, to demonstrate why their solution is better).
I hope that you are not from the Proxmox team and that this was your own opinion.
 
OK, so we should start by creating the HA shared storage. When we get the hardware, I'll play with it.
But I'm interested: how can I join two performance clusters (they work out of the box, if I'm right) into an HA cluster? :) Is there any documentation on this type of installation?
 
Hi,
what do you mean by a "performance cluster"? How many nodes? Why would this cluster perform better than others?
What is the reason for splitting into several clusters?
HA, and also normal migration, work only inside one cluster, which means you must create a single cluster if you want to use HA.

The normal way to build a cluster is to use enough servers for all workloads (CPU/IO/RAM), with 3 or more nodes.
If one node fails, the VMs can run on the other nodes in the cluster (moved manually, or with HA).

About the testing: what kind of shared storage do you want to use (iSCSI, FC, NFS, DRBD, RBD)?


Udo
 
By performance cluster I mean the basic Proxmox cluster system that was available before the HA cluster came into the game (version 1.x): one runs a few Proxmox instances and can move any virtual host between nodes inside this cluster. Maybe my term is wrong and it is not a performance cluster.

On the other hand, maybe my understanding of the new HA cluster is also wrong and we don't actually need the performance cluster.

I understand it this way:
We install everything redundantly: network, shared HA storage, power supply, UPS, etc. Let's say we've got 3 nodes in total.
What will happen if two of the three nodes go down? To be more specific, let's speak of 7 nodes, for example. What will happen if 3 of the 7 go down and the 4 remaining nodes do not have enough resources to handle all the virtual hosts that were running in this cluster?
It is easy to count:
Let's say we have 140 GB of RAM on 7 nodes (20 GB each).
Let's say 130 GB is used.
When 3 nodes go down, we lose 60 GB and only 80 GB remains.

We won't be able to serve the virtual machines that get moved to the remaining hardware.

OR
Does it work the other way? Does the HA cluster offered by Proxmox allow only 1 node to be down at a time?
So is it always like n+1? If we've got a cluster of 7 nodes, is 1 actually a backup for any one (and only one) node in the cluster? Then this explains everything to me :) and I don't have any questions about the installation process. We only have to install an HA cluster with Proxmox, which does not seem hard.
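
The RAM arithmetic in the post above can be checked with a short sketch. It uses the poster's numbers (7 nodes, 20 GB each, 130 GB in use); the function names are made up for illustration, and the model ignores host overhead and KSM, which is an assumption.

```python
def remaining_capacity_gb(total_nodes, ram_per_node_gb, failed_nodes):
    """RAM still available in the cluster after some nodes have failed."""
    return (total_nodes - failed_nodes) * ram_per_node_gb


def shortfall_gb(total_nodes, ram_per_node_gb, used_gb, failed_nodes):
    """VM RAM that no longer fits on the survivors; 0 means everything fits."""
    remaining = remaining_capacity_gb(total_nodes, ram_per_node_gb, failed_nodes)
    return max(0, used_gb - remaining)


if __name__ == "__main__":
    # The poster's example: 7 nodes x 20 GB, 130 GB used by VMs.
    for failed in range(4):
        left = remaining_capacity_gb(7, 20, failed)
        missing = shortfall_gb(7, 20, 130, failed)
        print(f"{failed} node(s) down: {left} GB left, {missing} GB short")
```

With these numbers even a single failed node leaves the cluster 10 GB short, so the question is less about how many nodes may fail and more about how much spare RAM is kept free.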
 
So if you want advanced support, buy it. If not, read the wiki. And I'm not from the Proxmox team.

I've just been a user for a few years. I try things myself, and if I don't succeed I ask in the forum.

But for me, the first step is to read the wiki.
 
willy1009, like me, is just another user who tries to help people in his spare time, or seeks answers when he cannot find a solution.
Many of your questions have been answered in these forums and in the wiki; I think that is all anyone was trying to point out to you.

Anyhow, back to your questions.
First you need to understand what the cluster in Proxmox cluster means and what HA means in Proxmox.

Proxmox cluster simply allows synchronisation of configuration across all nodes in a cluster.
The big benefit here is every Proxmox node is a "management" node.
You can use the web GUI on any of the nodes to manage all of the resources.
So Proxmox Cluster provides a web GUI with no SPOF (Single Point of Failure)

When using HA VMs, if the physical node running the HA VM fails, Proxmox will automatically start that VM on another node.
You can do this manually without HA; all HA does is automate it, so recovery from a node failure happens without operator action.

Now let's talk some nuts and bolts.
If you want a reliable cluster you need at least three nodes so you can have quorum (aka majority). Think of the cluster as a democracy.
The majority of the nodes must agree on what the current state of all the nodes is.
This is extremely important when dealing with shared storage. Imagine if incorrect assumptions were made and the same VM was started on two nodes at the same time.
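
To illustrate why a majority rule prevents the "same VM started twice" case, here is a minimal sketch. It assumes one vote per node and a hypothetical network split into groups of nodes that can still see each other; the function names are invented for this example and are not part of Proxmox.

```python
def quorum_threshold(total_votes):
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def quorate_partitions(partition_sizes):
    """Return the partition sizes that still hold a majority.

    partition_sizes lists how many nodes ended up in each group
    after a (hypothetical) network split.
    """
    needed = quorum_threshold(sum(partition_sizes))
    return [size for size in partition_sizes if size >= needed]


if __name__ == "__main__":
    print(quorate_partitions([2, 1]))     # 3 nodes split 2 + 1
    print(quorate_partitions([4, 3]))     # 7 nodes split 4 + 3
    print(quorate_partitions([2, 2]))     # 4 nodes split evenly
    print(quorate_partitions([1, 1, 1]))  # 3 nodes, all isolated
```

Because two groups cannot both hold more than half of the votes, at most one side of any split is allowed to act. An even split leaves no side with a majority, which is one reason odd node counts are commonly recommended.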

If you want HA VMs, you MUST have proper fencing set up. It is needed to ensure that a node assumed dead is actually dead.
I prefer PDUs for fencing, but IPMI can work in most cases just as well.
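
The order of operations described here (fence first, restart afterwards) can be sketched as a toy planner. Everything in it is an assumption made for illustration: the function name, the data shapes, and the "most free RAM" placement rule are not how Proxmox's HA stack chooses a target node.

```python
def plan_failover(failed_node, fenced, vms_on_failed, free_ram_gb):
    """Return {vm_name: target_node} for VMs that can be restarted.

    fenced:        True only once the failed node is confirmed powered off.
    vms_on_failed: {vm_name: ram_gb} for HA VMs that ran on the failed node.
    free_ram_gb:   {node_name: free RAM in GB} for the cluster nodes.
    """
    if not fenced:
        # Without confirmed fencing the old node might still be running the
        # VMs against shared storage, so nothing is restarted.
        return {}

    free = dict(free_ram_gb)      # work on a copy
    free.pop(failed_node, None)   # the failed node is not a target
    plan = {}
    # Place the largest VMs first; each goes to the node with the most free RAM.
    for vm, ram in sorted(vms_on_failed.items(), key=lambda kv: kv[1], reverse=True):
        if not free:
            break
        target = max(free, key=free.get)
        if free[target] >= ram:
            plan[vm] = target
            free[target] -= ram
    return plan


if __name__ == "__main__":
    vms = {"web": 4, "db": 8}
    free = {"node1": 8, "node2": 6, "node3": 0}
    print(plan_failover("node3", False, vms, free))  # not fenced yet
    print(plan_failover("node3", True, vms, free))   # fenced, safe to restart
```

VMs that do not fit anywhere are simply left out of the plan, which mirrors the point that HA cannot help if the surviving nodes lack resources.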


You were asking about a three-node cluster:
Lose one node: you have two and still have quorum, so assuming all your VMs can fit on two nodes, no problem.
Lose two nodes: you no longer have quorum, you are dead and cannot even manage things.

By default you need more than half of the nodes running to have quorum, that is floor(n/2) + 1.
It should be obvious that you must have enough resources available to run all the VMs you need to run.
If all your VMs use 100 GB of RAM, and you only have 7 nodes with 20 GB each, then you need at least 5 of those 7 nodes online (5 x 20 GB = 100 GB).
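
The two conditions above (quorum and enough RAM) can be combined in a small calculation. This is a sketch under simple assumptions: one vote per node, identical nodes, and no allowance for host overhead or KSM savings. The function names are made up for illustration.

```python
import math


def quorum_threshold(total_nodes):
    """Nodes needed for a strict majority: floor(n/2) + 1."""
    return total_nodes // 2 + 1


def min_nodes_online(total_nodes, ram_per_node_gb, vm_ram_gb):
    """Nodes that must stay up to keep quorum AND fit all VM RAM."""
    for_quorum = quorum_threshold(total_nodes)
    for_ram = math.ceil(vm_ram_gb / ram_per_node_gb)
    return max(for_quorum, for_ram)


if __name__ == "__main__":
    # Example from the post: 7 nodes x 20 GB, VMs using 100 GB.
    needed = min_nodes_online(7, 20, 100)
    print("quorum needs:", quorum_threshold(7))
    print("nodes that must stay online:", needed)
    print("node failures tolerated:", 7 - needed)
```

For the 7-node example quorum alone would tolerate three failures, but RAM is the tighter limit and only two failures can be absorbed. If the result is larger than the number of nodes, the VMs do not fit even with every node online.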

Proxmox does include KSM, which will help reduce RAM usage if you run a lot of VMs with the same OS.
So you can overcommit a little bit; how much is very specific to your environment and workloads.
 
e100,
That is just an awesome answer. You can't even imagine how grateful I am for it. It's worth a thousand thanks. Simple and clear. It should be added to "Proxmox HA cluster for dummies" for sure!
 
So I was just downloading 2.2, thinking I would sign up to the forum next week when I have more time, but I had to do so now to say this is a great post!

THANKS

 
Thanks for the post; please put this clear info in the wiki! I've never worked with clusters or HA, and I think a clear high-level picture of all this is important for newbies (I've pieced it together so far from the wiki, posts in this forum, googling, etc.).
I still don't understand a couple of things though:
"You were asking about a three-node cluster:
Lose one node: you have two and still have quorum, so assuming all your VMs can fit on two nodes, no problem.
Lose two nodes: you no longer have quorum, you are dead and cannot even manage things."

a) Why, if I have 3 nodes and lose one, do I still have quorum? With 2 surviving nodes I don't have a majority (let's say node1 thinks node3 is still alive and node2 thinks it is dead).
b) Why am I doomed with 2 dead nodes out of 3? Can't all the VMs (if resources permit) be started automatically on the only surviving node? (Of course fencing must work reliably in this situation.) Can I do it manually (with ease), since you write "cannot even manage things", which sounds like everything is lost?
Thanks a lot
 
a) This is where fencing kicks in. Fencing, or STONITH (shoot the other node in the head), shuts down a failing node and keeps it down until the operator steps in. So all remaining active nodes know that the failing node is down. Votes are counted against all three configured nodes, not just the survivors, so two "yes" votes out of three are a majority.
b) With two dead nodes out of three you can no longer have a majority, since 1 "yes" out of three is a minority. If there is no majority, the whole system goes into what you could call survival mode. The easiest way for a distributed service to survive is to prevent any change, under the assumption: I know what I have, but I don't know what I will get.
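
The "survival mode" idea can be pictured with a toy model: reads of the known state keep working, while changes are refused once the majority is gone. The class below is purely illustrative and assumes one vote per node; it is not Proxmox code and its names are invented.

```python
class ToyCluster:
    """Toy model of a cluster that refuses changes without quorum."""

    def __init__(self, total_nodes, online_nodes):
        self.total_nodes = total_nodes
        self.online_nodes = online_nodes
        self.config = {}

    def is_quorate(self):
        # Votes are counted against all configured nodes, not only survivors.
        return self.online_nodes >= self.total_nodes // 2 + 1

    def read(self, key):
        # Reading the last known state is always allowed.
        return self.config.get(key)

    def write(self, key, value):
        # Changes are refused without a majority ("I know what I have").
        if not self.is_quorate():
            raise PermissionError("no quorum: cluster state is read-only")
        self.config[key] = value


if __name__ == "__main__":
    cluster = ToyCluster(total_nodes=3, online_nodes=2)
    cluster.write("vm100", "node1")      # 2 of 3 online: allowed
    cluster.online_nodes = 1             # a second node dies
    print(cluster.read("vm100"))         # still readable
    try:
        cluster.write("vm100", "node2")  # 1 of 3 online: refused
    except PermissionError as err:
        print("refused:", err)
```

The design choice is deliberate: a lone node cannot tell whether the others are dead or merely unreachable, so freezing the configuration is the safe default until an operator intervenes.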
 
I understand a): 2 nodes survive and they both voted to shut down node3. But what puzzles me is "you still have quorum", when IMHO with 2 nodes you are now in a situation where quorum is lost.
b) So? I mean, is there a way to restore the functionality of the surviving node while waiting for the other 2 to be replaced? Or is ALL your infrastructure stopped forever?
Maybe it's because I've never tried HA, but it seems to be more trouble than benefit, since it is easy to find yourself in a very risky situation, or in one so complex that you make some fatal mistake when pressed to get it working again.
Maybe a cluster with shared storage and without HA is much, much safer?
 
