pve 4.4: Cluster nodes flapping in GUI from green to red and vice versa

Hello Wolfgang,
attached is the output you requested. Let us pause here, because I have to check whether this is interference from the IPMI controller on the mainboard. To me it looks like a duplicate IP on the network, but that is not the case (I checked the ARP table twice). Next, since the IPMI may be taking over the server's network interface, I want to disable it. I will come back to you.

# corosync-quorumtool
Quorum information
------------------
Date: Wed Sep 27 08:33:07 2017
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 5
Ring ID: 1/552
Quorate: Yes

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 5
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
1 1 node1708vm-1
2 1 node1708vm-2
3 1 node1708vm-3
4 1 node1708vm-4
5 1 node1708vm-5 (local)
 


I have set the mode from "failover" to "dedicated" in the IPMI NIC interface settings, hoping that this might be the cause of the phenomenon... but it was not. The nodes still keep flapping.

On the console I saw the following message on all nodes, about 20-30 times:
...systemd-sysv-generator ignoring creation of an alias umountiscsi.service for itself
iSCSI is not in use.

I am running out of ideas, because I can't find any error message.
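For completeness, this is how I am watching the cluster service logs while the GUI flaps (a minimal sketch, assuming the standard PVE 4.4 systemd units):

Code:
# follow the cluster-related services on one node while the GUI flaps
journalctl -f -u corosync -u pve-cluster -u pvestatd -u pveproxy

# check whether any of these services have been restarting
systemctl status corosync pve-cluster pvestatd pveproxy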
 
Please send me your storage.cfg.
Your ceph config shows you are using cephx auth, but your rados command shows auth none.

Code:
cat /etc/pve/storage.cfg 
ls -hal /etc/pve/priv/ceph
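For comparison, a quick way to see which auth mode your cluster actually expects (assuming the default /etc/ceph/ceph.conf path) would be:

Code:
# list the cephx auth requirements from the ceph config
grep -E 'auth.*(required|supported)' /etc/ceph/ceph.conf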
 
I have to mention that at an early stage of the installation (after creating HA) I needed to switch the corosync network from the previous admin net 192.168.0.0/24 to 10.0.21.0/24 (a dedicated network). This changed the hostnames from node1708-X to node1708vm-X.
It would be nice to be able to define the networks according to their purposes (admin, corosync, storage) in the installation wizard.
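For reference, the nodelist in /etc/pve/corosync.conf now looks roughly like this (a sketch; only the first node is shown, and the hostnames resolve to addresses in 10.0.21.0/24):

Code:
nodelist {
  node {
    nodeid: 1
    quorum_votes: 1
    ring0_addr: node1708vm-1   # resolves into the dedicated 10.0.21.0/24 corosync net
  }
  # nodes 2-5 are analogous
}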

# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

rbd: VM-OS
        monhost 10.0.21.10,10.0.21.20,10.0.21.30
        content images
        krbd 0
        pool TIER0
        username admin

rbd: VM-DATA
        monhost 10.0.21.10,10.0.21.20,10.0.21.30
        content images
        krbd 0
        pool TIER1
        username admin

#ls -hal /etc/pve/priv/ceph
-rw------- 1 root www-data 137 Sep 19 17:09 TIER0.keyring
-rw------- 1 root www-data 137 Sep 19 17:09 TIER1.keyring
 
With the move of the corosync network, the "old" hostnames were left behind in /etc/pve/nodes:
drwxr-xr-x 2 root www-data 0 Sep 13 19:01 node1708-1
drwxr-xr-x 2 root www-data 0 Sep 17 17:40 node1708-2
drwxr-xr-x 2 root www-data 0 Sep 17 17:41 node1708-3
drwxr-xr-x 2 root www-data 0 Sep 17 17:41 node1708-4
drwxr-xr-x 2 root www-data 0 Sep 17 17:42 node1708-5
drwxr-xr-x 2 root www-data 0 Sep 20 10:52 node1708vm-1
drwxr-xr-x 2 root www-data 0 Sep 20 10:52 node1708vm-2
drwxr-xr-x 2 root www-data 0 Sep 20 10:52 node1708vm-3
drwxr-xr-x 2 root www-data 0 Sep 20 10:52 node1708vm-4
drwxr-xr-x 2 root www-data 0 Sep 20 10:52 node1708vm-5
Should I delete the directories with the old hostnames?
 
Please rename your keyrings.
You have to use the storage name, not the pool name:

VM-OS.keyring
VM-DATA.keyring

Then restart the pvestatd.service.
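A minimal sketch of those two steps, assuming the keyrings stay in /etc/pve/priv/ceph/:

Code:
cd /etc/pve/priv/ceph
# the keyring file name must match the storage name from storage.cfg
mv TIER0.keyring VM-OS.keyring
mv TIER1.keyring VM-DATA.keyring
systemctl restart pvestatd.service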

Should I delete the directories with the old hostnames?
If you have no settings in them, you can erase the old node directories.
But I would back them up first.
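For example something like this (the backup location is just a suggestion):

Code:
# keep a copy of the old node directories instead of deleting them
mkdir -p /root/nodes
mv /etc/pve/nodes/node1708-{1,2,3,4,5} /root/nodes/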
 
I moved the old directories to /root/nodes/ and renamed the two keyring files according to the storage names.
Afterwards I rebooted each node. Nothing changed.
 

Attachments

  • flapping_nodes_20170928.PNG
The monitor address in your storage.cfg is not correct.
 
I do not quite understand.
10.0.21.0/24 = corosync network
10.0.20.0/24 = ceph network
So I added the three nodes which are quorum monitors: 10.0.21.10, .20 and .30.
Do I have to define three ceph nodes (randomly?) as monitors?
 
You also have to update the storage.cfg with the correct mon addresses.
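A sketch of how to look up the monitor addresses and what the storage entries should then contain (the 10.0.20.x addresses below are only an assumption based on your ceph network):

Code:
# show where the monitors actually listen
ceph mon dump
grep -E 'mon[ _]host' /etc/ceph/ceph.conf

# then use those addresses in /etc/pve/storage.cfg, e.g.:
# rbd: VM-OS
#         monhost 10.0.20.10,10.0.20.20,10.0.20.30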