datacenter cluster

PetrolDriver

Nov 2, 2023
Hi,

I have two Proxmox servers and want to join them into a cluster.

I have some questions:

1)
Does it matter which server is the master and which one the slave?

2)
I read that the slave server must be free of virtual machines before it joins the master server.

3)
What happens when one of the servers breaks down? Does the other server keep running without problems (except when the virtual machines' storage was on the server that failed)?

4)
I made the empty server (newer than the other server) the master server. How can I remove this cluster so that the other server, which contains virtual machines, becomes the master server?

I only clicked "Create Cluster", but these commands do not remove the master functionality.
Code:
systemctl stop pve-cluster.service
/etc/pve/corosync/corosync.conf

VMware's concept of joining multiple servers is better than Proxmox's: there, servers can be merged even though they contain virtual machines. Renaming virtual machines is also possible there; in Proxmox it does not seem to be possible.
 
1)
Does it matter which server is the master and which one the slave?
It does not work that way. A cluster works with a quorum of equal nodes, which is why you should not have only two nodes. Please read the manual: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum
2)
I read that the slave server must be free of virtual machines before it joins the master server.
New nodes cannot have guests. It's explicitly in the manual: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_join_node_to_cluster
3)
What happens when one of the servers breaks down? Does the other server keep running without problems (except when the virtual machines' storage was on the server that failed)?
On a two-node cluster, the cluster loses quorum and is effectively broken as soon as one node fails (see point 1). There is more than one thread about problems with this on this forum.
4)
I made the empty server (newer than the other server) the master server. How can I remove this cluster so that the other server, which contains virtual machines, becomes the master server?
There is no such thing as a master, see point 1. The manual describes how to remove nodes from a cluster: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
VMware's concept of joining multiple servers is better than Proxmox's: there, servers can be merged even though they contain virtual machines. Renaming virtual machines is also possible there; in Proxmox it does not seem to be possible.
Maybe VMware is a better fit for your situation and you might want to use that instead. Otherwise, you might want to read the manual a bit and experiment with all features for free.
If you have specific questions (which are not based on a false premise), feel free to ask them on this community forum.
 
Thanks for your answers.

It seems that cluster functionality is not what I need.

But it's not possible to remove the "created cluster" on one of my servers.
Code:
root@pve-2:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve-2 (local)
root@pve-2:~# pvecm delnode pve-2
Cannot delete myself from cluster!
root@pve-2:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve-2 (local)
root@pve-2:~#

Does Proxmox support simply merging two servers (storage, shared storage) so that I can clone/migrate virtual machines from one Proxmox server to the other?
 
Thanks for your answers.

It seems that cluster functionality is not what I need.

But it's not possible to remove the "created cluster" on one of my servers.
Just reinstall Proxmox?
Does Proxmox support simply merging two servers (storage, shared storage) so that I can clone/migrate virtual machines from one Proxmox server to the other?
Proxmox does support migration between two separate nodes, but that is still command line only: search the manual or this forum for qm remote-migrate.
 
Just reinstall Proxmox?
Really reinstalling? :confused:

I only clicked on the button "create cluster".
What happens in the background (starting pve-cluster.service, writing /etc/pve/corosync/corosync.conf and generating a key)?
Is there no other way to undo this "click"?
 
It seems that cluster functionality is not what I need.
Maybe you won't need all of its functionality, but it is extremely handy (as long as it's not broken).

Instead of having a main server and satellites, it's a system of peers in which each system can manage the whole datacenter.

With your nodes in a cluster, moving between nodes can be done via the GUI.

To get quorum, you could run a small machine as a third leg.
 
This is what should be emphasized to everyone who tries to simply create a cluster in PVE with two nodes. He is asking whether one node can be the master and the other the slave, so we should recommend either not having a cluster at all, giving his "master" two votes, or adding a QDevice. Without one of these he will have more trouble with the cluster.

I also have yet to find a way to have a "cluster" without implied HA in PVE. Lots of people just want literally two nodes with replication, so that they can manually spin up the same VM on the other node at any point. They do not need corosync and all the complexity.
 
I also have yet to find a way to have a "cluster" without implied HA in PVE.
That's the default case.

HA is often responsible for rebooting nodes (fencing) by surprise. But this mechanism is only active if one or more VMs have HA enabled. So simply do not use HA, and this mechanism stays disabled. Without active HA you should not get rebooting machines.

BUT: without Quorum the management of a cluster is disabled. This is where workarounds like a Quorum-Device or "pvecm expected 1" or "more votes for a 'master' node" enter the field.
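
To make the vote arithmetic behind these workarounds concrete, here is a minimal Python sketch. It assumes the usual majority rule (quorum means more than half of all configured votes); the function names and the node layouts are made up for illustration and are not part of any Proxmox tool.

```python
def quorum_threshold(total_votes):
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def is_quorate(votes, alive):
    """True if the members still alive hold a majority of all configured votes.

    votes: dict mapping member name -> number of votes
    alive: set of member names that can still talk to each other
    """
    total = sum(votes.values())
    present = sum(v for name, v in votes.items() if name in alive)
    return present >= quorum_threshold(total)


# Hypothetical layouts (names are illustrative only).
two_nodes = {"pve-1": 1, "pve-2": 1}
with_qdevice = {"pve-1": 1, "pve-2": 1, "qdevice": 1}
weighted = {"pve-1": 2, "pve-2": 1}  # extra vote for a "master" node

print(is_quorate(two_nodes, {"pve-1"}))                # plain two-node cluster, one node down
print(is_quorate(with_qdevice, {"pve-1", "qdevice"}))  # tie-breaker vote still reachable
print(is_quorate(weighted, {"pve-1"}))                 # the node with two votes survives
print(is_quorate(weighted, {"pve-2"}))                 # only the node with one vote survives
```

In this model a plain two-node cluster needs both votes, so a single failure costs quorum, while a third vote (from a Quorum-Device or a weighted node) lets one side keep a majority. "pvecm expected 1" is a different kind of workaround: it lowers the expected vote count by hand instead of adding a vote.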

Best regards
 
Right, but for all practical purposes: someone logs into the GUI and cannot do anything because one of the two nodes is down, even though he expected to be able to, e.g., manually spin up a replica of a VM that was on the dead node. He must be wondering why this doesn't work and what the quorum is needed for when there is not a single HA machine.

I am pretty positive my nodes were rebooting just as you described without a single VM being set as HA. The way corosync operates makes it so. Either I have yet to learn something, or I misunderstood you, or something changed over time in how PVE clusters work (without HA).
 
fencing is only active with HA enabled, and HA is optional on top of clustering. quorum is a requirement for clusters (else there is no way to ensure consistency when using shared resources, which means data loss/corruption).
 
Thank you, Fabian! But then I struggle to understand how it's done in PVE. When I look at my datacenter settings, it says HA is set to default (=conditional). I left everything at default when forming the cluster, and I believe I understood how pmxcfs works, etc., but what about those reboots? I do not have any VM set as HA. When I check:
Code:
systemctl status pve-ha-lrm
● pve-ha-lrm.service - PVE Local HA Resource Manager Daemon
     Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled; preset: e>
     Active: active (running) since Sun 2023-11-05 02:52:52 UTC; 2 days ago

Is it supposed to be running in my case?

EDIT: Not sure if relevant, but I also see log entries like these:
Code:
-- Boot 6f4da149534d4340b34aa1b7c045ac64 --
Nov 04 21:23:24 pve7 systemd[1]: Starting pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
Nov 04 21:23:25 pve7 pve-ha-lrm[1184]: starting server
Nov 04 21:23:25 pve7 pve-ha-lrm[1184]: status change startup => wait_for_agent_lock
Nov 04 21:23:25 pve7 systemd[1]: Started pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
Nov 04 23:00:56 pve7 pve-ha-lrm[1184]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve7/lrm_status.tmp.1184' - Permission denied
Nov 06 13:35:01 pve7 pve-ha-lrm[1184]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve7/lrm_status.tmp.1184' - Permission denied
 
that's just the HA policy (that tells you what happens *if HA is active and you reboot a node*). HA is active if you opt into it by making a guest HA-managed. the services will always be running, even with HA inactive; ha-manager status -verbose will give you more details.

why your node rebooted is hard to tell without logs.
 
Alright, I do not want to hijack someone else's thread. I will look into the logs (if they still exist) from some days ago, when most of it was happening, and create a new thread. (It later appeared that it all happened while the network connectivity between the nodes was disrupted in some way; I had simply assumed it was because of the cluster setup.)
 
It worked. :)

I only want to move or clone a VM from one PVE server to the other via the GUI, as I am used to on VMware (I do not need the HA feature). Shared storage works the same way as on VMware. If Proxmox does not offer this feature, I will use the CLI for this.

What happens if one of the three PVE servers fails?
Does the datacenter keep running with two servers until the third one is fixed?

The next thing I do not understand is why the joining server must not contain any virtual machines.
In my case the IDs on pve-1 start from 100 (100-199) and the virtual machines on pve-2 start from 200 (200-299), so the IDs cannot cause any conflict.
 
It worked. :)

I only want to move or clone a VM from one PVE server to the other via the GUI, as I am used to on VMware (I do not need the HA feature). Shared storage works the same way as on VMware. If Proxmox does not offer this feature, I will use the CLI for this.
VMware "cheats" in that case and uses the shared storage as arbitrator. you can do the same with PVE by using a qdevice instead of a full third node.
What happens if one of the three PVE servers fails?
Does the datacenter keep running with two servers until the third one is fixed?
yes. the quorate partition (2/3 in this example) would continue to work, and the third node would sync up again once it gets back.
The next thing I do not understand is why the joining server must not contain any virtual machines.
In my case the IDs on pve-1 start from 100 (100-199) and the virtual machines on pve-2 start from 200 (200-299), so the IDs cannot cause any conflict.
because /etc/pve where all the "cluster-wide" config lives gets overwritten using the cluster version. that check is a safeguard to avoid you losing your configs (the existence of configured guests is used as a proxy for "this node is not freshly installed"). even if no guest ID collision exists, other stuff will also be overwritten/lost:
- firewall configs
- SDN config
- jobs
- storage.cfg
- users/tokens/acls/..
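
Since a join replaces the node's own /etc/pve with the cluster version, one cautious habit is to keep a copy of that directory for reference beforehand. The following Python sketch is only an illustration of that idea: the function names are hypothetical, and treating a plain archive of /etc/pve as a sufficient reference copy is an assumption, not a tested restore procedure.

```python
import tarfile
import time
from pathlib import Path


def backup_config_dir(src, dest_dir):
    """Pack the whole directory `src` into a timestamped .tar.gz inside `dest_dir`."""
    source = Path(src)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / f"{source.name}-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps the paths in the archive relative to the directory name
        tar.add(source, arcname=source.name)
    return archive


def list_archive(archive):
    """Return the sorted member names of the archive, to check what was saved."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())
```

On a node this could be called as backup_config_dir("/etc/pve", "/root/pve-backup") before joining (the destination path is just an example). The archive is meant for looking up old settings afterwards, not for copying files back over the cluster configuration.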
 
Thx, now the things make sense. :)

What happens when two out of three servers fail?
The more servers a cluster contains, the more servers can fail (I compare it to RAID/ZFS).
Is the Proxmox cluster concept more scalable than, for example, RAID?
 
without HA, a node that is not part of the quorate partition will become read-only - it cannot change any configs in /etc/pve, it cannot start/stop/.. guests or do other actions that would potentially clash with other nodes (since it cannot synchronize with them). any guests that are already running should continue to run.
with HA, a non-quorate node will fence itself (lack of quorum == lack of watchdog updates == fence), if there are other quorate nodes they will recover configured HA services.
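
The two cases above can be summed up as a small decision table. This Python sketch is only a simplified model of the behaviour described in this post; the names are made up for illustration and it does not query a real node.

```python
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal operation"
    READ_ONLY = "read-only: no config changes or guest start/stop, running guests keep running"
    FENCED = "fences itself: watchdog is no longer updated"


def node_state(quorate, ha_active):
    """Model of what a node does depending on quorum and HA.

    quorate:   the node is part of the quorate partition
    ha_active: at least one guest is HA-managed
    """
    if quorate:
        return NodeState.NORMAL
    return NodeState.FENCED if ha_active else NodeState.READ_ONLY


# Print the full table for all four combinations.
for quorate in (True, False):
    for ha_active in (True, False):
        state = node_state(quorate, ha_active)
        print(f"quorate={quorate!s:5} ha_active={ha_active!s:5} -> {state.value}")
```

The model only looks at a single node; recovery of HA services on the remaining quorate nodes is left out.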
 
What happens in a cluster when none of the virtual machines are added to the HA resources list?
In that case, virtual machines that fail on one node should not be started by any other node, correct?

HA.png

So, if I do not need HA or the virtual machines are running on local storage, then it should be no problem to run a cluster with only two nodes.
Then the only advantage in a cluster with 2 nodes would be the feature of cloning/migrating virtual machines via GUI.

Is it also possible to migrate a virtual machine stored on local storage from one node to another node's local storage?
Or does a migration only move the configuration of a virtual machine, so that it only works when the virtual machine is stored on shared storage?
 
What happens in a cluster when none of the virtual machines are added to the HA resources list?
if you don't have any HA resources, HA will not be active and no fencing or automatic recovery will occur.
In that case, virtual machines that fail on one node should not be started by any other node, correct?
exactly - if a node cannot talk to the rest of the cluster, it will go read-only like I described earlier, any guests running there continue to run. if it fully crashes they won't run anymore of course ;)

So, if I do not need HA or the virtual machines are running on local storage, then it should be no problem to run a cluster with only two nodes.
you still need three nodes (or two + a tie breaker vote) to tolerate the failure of one and still have quorum in the rest.
Then the only advantage in a cluster with 2 nodes would be the feature of cloning/migrating virtual machines via GUI.
shared configuration, migration, cloning, .. basically all the cluster features
Is it also possible to migrate a virtual machine stored on local storage from one node to another node's local storage?
yes, with some restrictions depending on whether the migration happens offline or online. if the two storages are the same type (e.g. both LVM-thin) it should always work.
Or does a migration only move the configuration of a virtual machine, so that it only works when the virtual machine is stored on shared storage?
no. a migration with volumes on shared storage just skips the volume/disk migration part, if there are local volumes they will be transferred as well (as will the state including memory of the VM, in case of a live migration).
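
As a rough illustration of that answer, the following Python sketch lists what a migration has to move depending on where the volumes live. It is a simplified model with hypothetical names (Volume, migration_steps) and example volume names, not the logic Proxmox actually runs.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    shared: bool  # True if the volume is on storage that both nodes can reach


def migration_steps(volumes, online):
    """Return a list of the steps a migration would involve in this model."""
    steps = []
    for vol in volumes:
        if vol.shared:
            # Shared storage: the disk stays where it is.
            steps.append(f"skip {vol.name} (shared storage)")
        else:
            # Local storage: the disk has to be transferred.
            steps.append(f"transfer {vol.name} to the target node")
    if online:
        # Live migration also carries the running state over.
        steps.append("transfer VM state including memory")
    steps.append("move VM configuration to the target node")
    return steps


# Example: one local disk, one disk on shared storage, live migration.
disks = [Volume("vm-100-disk-0", shared=False), Volume("vm-100-disk-1", shared=True)]
for step in migration_steps(disks, online=True):
    print(step)
```

The storage-type restrictions mentioned above for offline and online migration are not modelled here.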
 
