Need help - tricky, as a node in my cluster keeps switching between a red X and a green tick

damonpillinger

Sep 27, 2019
My PROX4 node keeps going from a green tick to a red X in the GUI.

[Attached screenshot: 1571402047883.png]

Ceph has a drive on PROX4, and Ceph always stays all green.
If I turn off the PROX4 node, leave it off for 30 minutes and then turn it back on, Ceph goes all red, but after about 10 minutes it goes all green again EVEN THOUGH PROX4 MIGHT STILL HAVE A RED X.

When PROX4 is green, I can migrate a VM (on a Ceph disk) to it from PROX2 or PROX3 with no problem.
But after 10-15 minutes I notice that the VMs have been migrated back to PROX2 or PROX3.
When I look at the syslog for PROX4, I see this weird pattern:

[Attached screenshot: 1571402294163.png]

I am not sure what it means.

I have turned the node off and on a dozen times, but a simple reboot isn't fixing it.

Ceph works fine and all VMs are running fine.

thanks
Concerned ;(
 
Hi Wolfgang,

Thanks for the links; I will give it a go tonight.
Corosync looks to be running now, but since all the nodes have dual NICs I will set up a separate network.

systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-10-22 11:33:36 AEDT; 32min ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 1161 (corosync)
    Tasks: 9 (limit: 4915)
   Memory: 180.7M
   CGroup: /system.slice/corosync.service
           └─1161 /usr/sbin/corosync -f

Oct 22 11:44:44 prox4 corosync[1161]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 22 11:44:48 prox4 corosync[1161]: [TOTEM ] A new membership (1:142068) was formed. Members joined: 4
Oct 22 11:44:48 prox4 corosync[1161]: [KNET ] rx: host: 4 link: 0 is up
Oct 22 11:44:48 prox4 corosync[1161]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 22 11:44:48 prox4 corosync[1161]: [CPG ] downlist left_list: 0 received
Oct 22 11:44:48 prox4 corosync[1161]: [CPG ] downlist left_list: 0 received
Oct 22 11:44:48 prox4 corosync[1161]: [CPG ] downlist left_list: 0 received
Oct 22 11:44:48 prox4 corosync[1161]: [CPG ] downlist left_list: 0 received
Oct 22 11:44:48 prox4 corosync[1161]: [QUORUM] Members[4]: 1 2 3 4
Oct 22 11:44:48 prox4 corosync[1161]: [MAIN ] Completed service synchronization, ready to provide service.
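
For anyone wanting to check whether a node keeps dropping out of the membership, here is a minimal Python sketch that scans corosync log lines for `[QUORUM] Members[...]` entries like the one above and reports when an expected node ID is missing. The function name `membership_changes`, the regex and the second sample line are my own assumptions for illustration (that second line is hypothetical, not from my logs), and the line format is simply the one shown in the output above.

```python
import re

# Matches corosync quorum lines in the format seen above, e.g.:
#   Oct 22 11:44:48 prox4 corosync[1161]: [QUORUM] Members[4]: 1 2 3 4
QUORUM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[QUORUM\s*\]\s+Members\[(?P<count>\d+)\]:\s*(?P<ids>[\d\s]*)$"
)


def membership_changes(lines, expected_nodes):
    """Return (timestamp, missing_node_ids) for each quorum line lacking a node."""
    events = []
    for line in lines:
        match = QUORUM_RE.match(line.strip())
        if not match:
            continue  # not a quorum membership line
        members = {int(n) for n in match.group("ids").split()}
        missing = sorted(set(expected_nodes) - members)
        if missing:
            events.append((match.group("stamp"), missing))
    return events


sample = [
    # Real line from the status output above: all four nodes present.
    "Oct 22 11:44:48 prox4 corosync[1161]: [QUORUM] Members[4]: 1 2 3 4",
    # Hypothetical line, invented to show what a dropped node would look like.
    "Oct 22 11:50:02 prox4 corosync[1161]: [QUORUM] Members[3]: 1 2 3",
]

print(membership_changes(sample, expected_nodes={1, 2, 3, 4}))
```

In practice the lines could be fed in from the syslog instead of the `sample` list; a node that shows up as missing repeatedly would fit the red X / green tick flapping.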


many thanks
damon