Hi,
tonight all nodes of our three-node cluster restarted automatically at the same time after a crash ... very strange.
We have never seen this before; it looks like a clustering issue.
Any ideas what happened here?
Our Version:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-1-pve)
pve-manager: 5.2-6 (running version: 5.2-6/bcd5f008)
pve-kernel-4.15: 5.2-4
pve-kernel-4.15.18-1-pve: 4.15.18-17
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.4.117-1-pve: 4.4.117-109
pve-kernel-4.4.98-6-pve: 4.4.98-107
pve-kernel-4.4.98-3-pve: 4.4.98-103
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.6-1-pve: 4.4.6-48
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-37
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-29
pve-container: 2.0-24
pve-docs: 5.2-5
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-30
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
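(This should be the output of pveversion -v; I assume the package versions are identical on all three nodes.)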
We have some HA-managed KVM VMs and LXC containers, plus a local Ceph cluster running on the Proxmox hosts.
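In case it is relevant, the cluster, HA and Ceph state on these hosts can be checked with the standard commands (nothing cluster-specific assumed here):

pvecm status        # corosync membership and quorum
ha-manager status   # state of the HA-managed VMs/CTs
ceph -s             # health of the local Ceph cluster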
Here are the logs:
After the "^@^@^@^" the reboot takes place on all nodes...
pxhost1
=======
Nov 15 03:39:22 pxhost1 corosync[1617]: notice [TOTEM ] A processor failed, forming new configuration.
Nov 15 03:39:22 pxhost1 corosync[1617]: [TOTEM ] A processor failed, forming new configuration.
Nov 15 03:39:29 pxhost1 corosync[1617]: notice [TOTEM ] A new membership (111.222.333.119:10200) was formed. Members left: 1
Nov 15 03:39:29 pxhost1 corosync[1617]: notice [TOTEM ] Failed to receive the leave message. failed: 1
Nov 15 03:39:29 pxhost1 corosync[1617]: [TOTEM ] A new membership (111.222.333.119:10200) was formed. Members left: 1
Nov 15 03:39:29 pxhost1 corosync[1617]: [TOTEM ] Failed to receive the leave message. failed: 1
Nov 15 03:39:29 pxhost1 corosync[1617]: warning [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost1 corosync[1617]: [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost1 corosync[1617]: warning [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost1 corosync[1617]: [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost1 pmxcfs[1548]: [dcdb] notice: members: 2/2123, 3/1548
Nov 15 03:39:29 pxhost1 pmxcfs[1548]: [dcdb] notice: starting data syncronisation
Nov 15 03:39:29 pxhost1 pmxcfs[1548]: [status] notice: members: 2/2123, 3/1548
Nov 15 03:39:29 pxhost1 pmxcfs[1548]: [status] notice: starting data syncronisation
Nov 15 03:39:29 pxhost1 corosync[1617]: [QUORUM] Members[2]: 2 3
Nov 15 03:39:29 pxhost1 corosync[1617]: notice [QUORUM] Members[2]: 2 3
Nov 15 03:39:29 pxhost1 corosync[1617]: notice [MAIN ] Completed service synchronization, ready to provide service.
Nov 15 03:39:29 pxhost1 corosync[1617]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 15 03:40:00 pxhost1 systemd[1]: Starting Proxmox VE replication runner...
Nov 15 03:40:01 pxhost1 CRON[1237891]: (root) CMD (/usr/bin/ceph health > /tmp/ceph.health)
Nov 15 03:40:01 pxhost1 CRON[1237892]: (root) CMD (/usr/bin/ceph status > /tmp/ceph.status)
Nov 15 03:40:05 pxhost1 corosync[1617]: notice [TOTEM ] A new membership (111.222.333.119:10308) was formed. Members joined: 1
Nov 15 03:40:05 pxhost1 corosync[1617]: [TOTEM ] A new membership (111.222.333.119:10308) was formed. Members joined: 1
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
pxhost2
=======
Nov 15 03:39:29 pxhost2 corosync[2284]: notice [TOTEM ] Failed to receive the leave message. failed: 2 3
Nov 15 03:39:29 pxhost2 corosync[2284]: [TOTEM ] A new membership (111.222.333.121:10200) was formed. Members left: 2 3
Nov 15 03:39:29 pxhost2 corosync[2284]: warning [CPG ] downlist left_list: 2 received
Nov 15 03:39:29 pxhost2 corosync[2284]: notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov 15 03:39:29 pxhost2 corosync[2284]: notice [QUORUM] Members[1]: 1
Nov 15 03:39:29 pxhost2 corosync[2284]: notice [MAIN ] Completed service synchronization, ready to provide service.
Nov 15 03:39:29 pxhost2 corosync[2284]: [TOTEM ] Failed to receive the leave message. failed: 2 3
Nov 15 03:39:29 pxhost2 corosync[2284]: [CPG ] downlist left_list: 2 received
Nov 15 03:39:29 pxhost2 pmxcfs[2258]: [dcdb] notice: members: 1/2258
Nov 15 03:39:29 pxhost2 pmxcfs[2258]: [status] notice: members: 1/2258
Nov 15 03:39:29 pxhost2 corosync[2284]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov 15 03:39:29 pxhost2 corosync[2284]: [QUORUM] Members[1]: 1
Nov 15 03:39:29 pxhost2 corosync[2284]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 15 03:39:29 pxhost2 pmxcfs[2258]: [status] notice: node lost quorum
Nov 15 03:39:29 pxhost2 pmxcfs[2258]: [dcdb] crit: received write while not quorate - trigger resync
Nov 15 03:39:29 pxhost2 pmxcfs[2258]: [dcdb] crit: leaving CPG group
Nov 15 03:39:30 pxhost2 pve-ha-lrm[2800]: lost lock 'ha_agent_pxhost2_lock - cfs lock update failed - Operation not permitted
Nov 15 03:39:31 pxhost2 pve-ha-lrm[2800]: status change active => lost_agent_lock
Nov 15 03:39:31 pxhost2 pvestatd[2432]: storage 'BACKUP_TAPE_LAN2' is not online
Nov 15 03:39:33 pxhost2 pvestatd[2432]: storage 'BACKUP_NAS' is not online
Nov 15 03:39:41 pxhost2 pvestatd[2432]: storage 'BACKUP_NAS' is not online
Nov 15 03:39:43 pxhost2 pvestatd[2432]: storage 'BACKUP_TAPE_LAN2' is not online
Nov 15 03:39:51 pxhost2 pvestatd[2432]: storage 'BACKUP_TAPE_LAN2' is not online
Nov 15 03:39:53 pxhost2 pvestatd[2432]: storage 'BACKUP_NAS' is not online
Nov 15 03:39:54 pxhost2 rrdcached[2232]: flushing old values
Nov 15 03:39:54 pxhost2 rrdcached[2232]: rotating journals
Nov 15 03:39:54 pxhost2 rrdcached[2232]: started new journal /var/lib/rrdcached/journal/rrd.journal.1542249594.630429
Nov 15 03:39:54 pxhost2 rrdcached[2232]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1542242394.630404
Nov 15 03:40:00 pxhost2 corosync[2284]: notice [TOTEM ] A new membership (111.222.333.119:10300) was formed. Members joined: 2 3
Nov 15 03:40:00 pxhost2 corosync[2284]: [TOTEM ] A new membership (111.222.333.119:10300) was formed. Members joined: 2 3
Nov 15 03:40:00 pxhost2 systemd[1]: Starting Proxmox VE replication runner...
Nov 15 03:40:01 pxhost2 pvesr[459624]: trying to aquire cfs lock 'file-replication_cfg' ...
Nov 15 03:40:01 pxhost2 CRON[459651]: (root) CMD (/usr/bin/ceph status > /tmp/ceph.status)
Nov 15 03:40:01 pxhost2 CRON[459652]: (root) CMD (/usr/bin/ceph health > /tmp/ceph.health)
Nov 15 03:40:01 pxhost2 pvestatd[2432]: storage 'BACKUP_TAPE_LAN2' is not online
Nov 15 03:40:02 pxhost2 pvesr[459624]: trying to aquire cfs lock 'file-replication_cfg' ...
Nov 15 03:40:03 pxhost2 pvesr[459624]: trying to aquire cfs lock 'file-replication_cfg' ...
Nov 15 03:40:03 pxhost2 pvestatd[2432]: storage 'BACKUP_NAS' is not online
Nov 15 03:40:04 pxhost2 pvesr[459624]: trying to aquire cfs lock 'file-replication_cfg' ...
Nov 15 03:40:04 pxhost2 pmxcfs[2258]: [status] notice: cpg_send_message retry 10
Nov 15 03:40:05 pxhost2 pvesr[459624]: trying to aquire cfs lock 'file-replication_cfg' ...
Nov 15 03:40:05 pxhost2 corosync[2284]: notice [TOTEM ] A new membership (111.222.333.119:10308) was formed. Members joined: 2 3 left: 2 3
Nov 15 03:40:05 pxhost2 corosync[2284]: notice [TOTEM ] Failed to receive the leave message. failed: 2 3
Nov 15 03:40:05 pxhost2 corosync[2284]: [TOTEM ] A new membership (111.222.333.119:10308) was formed. Members joined: 2 3 left: 2 3
Nov 15 03:40:05 pxhost2 corosync[2284]: [TOTEM ] Failed to receive the leave message. failed: 2 3
Nov 15 03:40:05 pxhost2 pmxcfs[2258]: [status] notice: cpg_send_message retry 20
Nov 15 03:40:06 pxhost2 pvesr[459624]: trying to aquire cfs lock 'file-replication_cfg' ...
Nov 15 03:40:06 pxhost2 pmxcfs[2258]: [status] notice: cpg_send_message retry 30
Nov 15 03:40:07 pxhost2 pvesr[459624]: trying to aquire cfs lock 'file-replication_cfg' ...
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
pxhost5
=======
Nov 15 03:39:22 pxhost5 corosync[2183]: notice [TOTEM ] A processor failed, forming new configuration.
Nov 15 03:39:22 pxhost5 corosync[2183]: [TOTEM ] A processor failed, forming new configuration.
Nov 15 03:39:29 pxhost5 corosync[2183]: notice [TOTEM ] A new membership (111.222.333.119:10200) was formed. Members left: 1
Nov 15 03:39:29 pxhost5 corosync[2183]: notice [TOTEM ] Failed to receive the leave message. failed: 1
Nov 15 03:39:29 pxhost5 corosync[2183]: [TOTEM ] A new membership (111.222.333.119:10200) was formed. Members left: 1
Nov 15 03:39:29 pxhost5 corosync[2183]: [TOTEM ] Failed to receive the leave message. failed: 1
Nov 15 03:39:29 pxhost5 corosync[2183]: warning [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost5 corosync[2183]: warning [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost5 corosync[2183]: [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost5 corosync[2183]: [CPG ] downlist left_list: 1 received
Nov 15 03:39:29 pxhost5 pmxcfs[2123]: [dcdb] notice: members: 2/2123, 3/1548
Nov 15 03:39:29 pxhost5 pmxcfs[2123]: [dcdb] notice: starting data syncronisation
Nov 15 03:39:29 pxhost5 corosync[2183]: notice [QUORUM] Members[2]: 2 3
Nov 15 03:39:29 pxhost5 corosync[2183]: notice [MAIN ] Completed service synchronization, ready to provide service.
Nov 15 03:39:29 pxhost5 corosync[2183]: [QUORUM] Members[2]: 2 3
Nov 15 03:39:29 pxhost5 corosync[2183]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 15 03:39:30 pxhost5 pmxcfs[2123]: [dcdb] notice: cpg_send_message retried 1 times
Nov 15 03:39:30 pxhost5 pmxcfs[2123]: [status] notice: members: 2/2123, 3/1548
Nov 15 03:39:30 pxhost5 pmxcfs[2123]: [status] notice: starting data syncronisation
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
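If more detail is needed, I can pull the journal for the relevant services around that time window, for example (assuming the standard systemd units on PVE 5.x; the time range is just the window from the logs above):

journalctl -u corosync -u pve-cluster -u pve-ha-crm -u pve-ha-lrm -u watchdog-mux --since "2018-11-15 03:35:00" --until "2018-11-15 03:45:00"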