Cluster Outage when rebooting a Node

Hello folks,

The situation is the following:
Existing cluster with 3 nodes - has been running for years - recently upgraded to the latest Debian/Proxmox release.

Now the following:
We added 4 new nodes at another geographical location, connected via a dark fibre VPN. Everything still works like a charm. Every node has a dedicated management interface, a dedicated SAN interface and one for VM traffic. Everything is fine here; every server can reach every other server via ping, ssh etc.
I then migrated some VMs from one of the "old" nodes to the new nodes to spread them out - everything still works.
And here is the question:
Then I shut down one of the older nodes because we need to swap network cards there. Of course I migrated all VMs before doing that. But suddenly the whole cluster became unresponsive, the VMs were shut down and could not be started again until we powered the shut-down node back on.
The only real error I got:

Code:
2023-11-09T14:06:57.070785+01:00 dus1 corosync[2182]:   [KNET  ] link: host: 1 link: 0 is down
2023-11-09T14:06:57.071127+01:00 dus1 corosync[2182]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
2023-11-09T14:06:57.071198+01:00 dus1 corosync[2182]:   [KNET  ] host: host: 1 has no active links
2023-11-09T14:06:58.287294+01:00 dus1 pve-ha-crm[2238]: status change slave => wait_for_quorum
2023-11-09T14:07:01.450967+01:00 dus1 pve-ha-lrm[2249]: lost lock 'ha_agent_dus1_lock - cfs lock update failed - Permission denied
2023-11-09T14:07:06.454437+01:00 dus1 pve-ha-lrm[2249]: status change active => lost_agent_lock
2023-11-09T14:07:09.548265+01:00 dus1 pvescheduler[300289]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
2023-11-09T14:07:09.548806+01:00 dus1 pvescheduler[300288]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
2023-11-09T14:07:52.296775+01:00 dus1 watchdog-mux[939]: client watchdog expired - disable watchdog updates

What am I missing in my config? I mean, what is the sense of a cluster if I can't shut down one node for maintenance - or in an emergency?
Did I miss something regarding quorum?

Thanks for the clarification.
 
Hi,
what is the status of the cluster at the moment? The host seems to reboot because it loses quorum, which triggers the watchdog timeout and fences the host. pvecm status should give you a list of all the nodes within the same network partition; check it on all of them. Also check the journal on the other nodes for further information.
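
For example, something along these lines (the timestamps are just the window around the outage from your log, adjust as needed):

Code:
# run on every node and compare membership and quorum state
pvecm status
# corosync and pmxcfs messages around the time of the outage
journalctl -u corosync -u pve-cluster --since "2023-11-09 14:00" --until "2023-11-09 14:15"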

The setup as you currently have it is not recommended at all, as corosync requires a low-latency network to keep the distributed cluster information up to date, which might not be the case if you tunnel your traffic over a VPN to another location.
 
Hello,

First off: it is a 40G dark fibre between the two locations. Ping times are around 0.4 ms between the locations. But thanks, I will take a closer look at this.
The output of pvecm status:

Code:
root@dus1:/var/log# pvecm status
Cluster information
-------------------
Name:             LinuxCluster
Config Version:   14
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Nov 10 11:22:21 2023
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000002
Ring ID:          1.3aa
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      4
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.195.24
0x00000002          1 192.168.195.20 (local)
0x00000003          1 192.168.195.21
0x00000004          1 192.168.195.26

And yes - at the moment 2 nodes are shut down because of maintenance work.

And just to complete it, here is my corosync.conf:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: dus1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.195.20
  }
  node {
    name: dus3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.195.24
  }
  node {
    name: dus4
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.195.26
  }
  node {
    name: dus5
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.195.21
  }
  node {
    name: dus6
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 192.168.195.23
  }
  node {
    name: dus7
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.195.25
  }
}

quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: LinuxCluster
  config_version: 14
  interface {
    bindnetaddr: 192.168.195.23
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Not sure about the bindnetaddr since it isn't really necessary any more - at least that is what I read here in the forum.
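
From what I read, the totem section on newer setups is supposed to look more like the sketch below, without any bindnetaddr (the addresses then only come from the ring0_addr entries in the nodelist) - not sure if that applies to my version though:

Code:
totem {
  cluster_name: LinuxCluster
  config_version: 14
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}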

Greetings
 
And yes - at the moment 2 nodes are shut down because of maintenance work.
That is not good and is critical for the cluster health.

Expected votes: 6
Highest expected: 6
Total votes: 4
Quorum: 4
Flags: Quorate
If one more node goes down, you only have 3 out of 6 votes, but you need 4 to be quorate, so everything breaks down. Therefore you always need a majority, and exactly 50% is not a majority.
 
That is not good and is critical for the cluster health.


If one more node goes down, you only have 3 out of 6 votes, but you need 4 to be quorate, so everything breaks down. Therefore you always need a majority, and exactly 50% is not a majority.
Is there a way to configure this quorum number? I have 6 nodes in my cluster because I want to be able to shut down 2 for maintenance or in emergency cases.
 
Is there a way to configure this quorum number?
Please read about the cluster manager in general.

I have 6 nodes in my cluster because I want to be able to shut down 2 for maintenance or in emergency cases.
This is exactly what is configured at the moment. Your problem is that there are 3 nodes down, and you won't have quorum with 3 down and 3 running (out of the configured 6). You need 4 nodes running - always more than 50%, not exactly 50%. That is why you normally run an odd number of nodes, where exactly 50% can never be down.
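
As a quick sanity check, with one vote per node the quorum is floor(N/2) + 1, for example:

Code:
# quorum = floor(N / 2) + 1, assuming one vote per node
for n in 3 4 5 6 7; do
  echo "$n nodes -> quorum needs $(( n / 2 + 1 )) votes -> at most $(( n - (n / 2 + 1) )) may be down"
done

So with your 6 nodes, quorum needs 4 votes and only 2 nodes may be down at any time.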
 
Is there a way to configure this quorum number? I have 6 nodes in my cluster because I want to be able to shut down 2 for maintenance or in emergency cases.
First off, since a majority of the nodes is required, it is recommended to run an odd number of nodes or alternatively add an external voter as a tie breaker [0].
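
If you want to keep an even node count, the external voter would be a QDevice. Roughly, from memory (check the docs in [0] for the exact steps):

Code:
# on the external voter (a small machine outside the cluster)
apt install corosync-qnetd
# on all cluster nodes
apt install corosync-qdevice
# then register the external voter from one cluster node
pvecm qdevice setup <QDEVICE-IP>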

In your 6-node cluster you can lose up to 2 nodes, so shutting down 2 nodes for maintenance is possible; but if 2 nodes are already offline, you cannot shut down another one.
In that case you should bring up another node before shutting down the next one.

While there is the possibility to reduce the number of expected votes temporarily, that should only be used for maintenance and not during normal cluster operation, see for example [1].
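
For completeness, temporarily lowering the expected votes during such a maintenance window would look something like this (only as long as you know the missing nodes will come back):

Code:
# e.g. with only 3 of the 6 nodes up during maintenance
pvecm expected 3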

Regarding your setup with 2 locations: if the latency is low enough all the time, then that should not be a problem. The recommendation is < 5 ms [2].
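
A simple way to check that over a longer period is a summarized ping between two nodes on the corosync network, e.g.:

Code:
# 300 probes, summary only; look at the avg/max round-trip times
ping -c 300 -q 192.168.195.24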

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_separate_node_without_reinstall
[2] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_cluster_network_requirements
 
Thanks for all the answers. I get it now. I've added a 7th node to have an odd number, and I'll shut down only one node at a time for maintenance/updates. So no need to worry there.
Regarding latency: since we are under 1 ms between the locations 99% of the time, we should be good here too.

Anyway, thanks for all your help!
 
