Nodes going red

Requires that you have a switch which can act as IGMP querier since this is not implemented in OpenVSwitch yet.
[...]
Quick questions: what kind of network switch do you use for OVS, and what settings are used on the ports OVS uses (same VLAN, etc.)? I'll start another thread on OVS questions later, if any.

We use:
Zyxel XGS1910-24 -> for clients
ZyXEL XGS1910-48 -> for old servers with 1G Links
Netgear ProSAFE XS728T -> 2x per Storage-Tower
Mellanox MSX1024B-1BFS -> 2x per campus (only in service since last week, so I can't tell you much other than that they do their job)




we use this:
https://pve.proxmox.com/wiki/Open_vSwitch#Example_2:_Bond_.2B_Bridge_.2B_Internal_Ports Example 2: Bond + Bridge + Internal Ports


we have:
Vlan100: tag=101-106 : 10.100.[1-6].y/24 Proxmox (currently 6 Clusters)
Vlan20: tag=20 : 10.20.y.y/16 ceph1 (30 Machines)
Vlan21: tag=21 : 10.21.y.y/16 ceph2 (20 Machines)
Vlan22: tag=22 : 10.22.y.y/16 ceph3 (5 Machines) Old Office Cluster (here we got our feet wet)
Vlan30: tag=30 : 10.30.y.y/16 kerberos/FreeIPA
Vlan41: tag=41 : 10.41.y.y/16 Monitoring1
Vlan42: tag=42 : 10.42.y.y/16 Monitoring2
Vlan43: tag=43 : 10.43.y.y/16 Monitoring3
Vlan70: tag=70 : 10.70.y.y/16 NTP
Vlan80: tag=80 : 10.80.y.y/16 NFS-Images-C1
Vlan81: tag=81 : 10.81.y.y/16 NFS-Images-C2
Vlan90: tag=90 : 10.90.y.y/16 NFS-Backups-C1
Vlan91: tag=91 : 10.91.y.y/16 NFS-Backups-C2
Vlan92: tag=92 : 10.92.y.y/16 NFS-Backups-C3
...
..
.
Vlan200: untagged : 10.200.y.y Clients

We use jumbo frames (except for the client VLANs) and balance-slb,
and that's about all the magic.


Hope that helps.
 
That is an interesting setup, thank you for the detail!

Regarding the NFS-Backups networks - I assume a separate NIC is used at each client?

It is interesting that you have a separate NTP network. I assume it can be reached from any NIC [i.e. a dedicated NIC is not needed at each PVE machine]? I'll give the NTP VLAN a high priority...
 
We bond all our NICs (network interface cards, i.e. eth0, eth1, ...) into a single bond, then attach a bridge to it. To that bridge we add OVS IntPorts.
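For reference, a minimal /etc/network/interfaces sketch of that layout, following the wiki example linked earlier in the thread (interface names, the VLAN tag and the IP are illustrative, not our real values):
Code:
# bond of all physical NICs, balance-slb, jumbo frames
allow-vmbr0 bond0
iface bond0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSBond
    ovs_bonds eth0 eth1
    ovs_options bond_mode=balance-slb
    mtu 9000

# the single OVS bridge everything hangs off
auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0 vlan20
    mtu 9000

# one OVSIntPort per VLAN the host itself needs (here: ceph1, tag 20)
allow-vmbr0 vlan20
iface vlan20 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=20
    address 10.20.101.1
    netmask 255.255.0.0
    mtu 9000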

We have different "sets" of servers; depending on how old they are and what software they run, they have different connectivity. As an example:

  • 1G onboard + quad 1G via PCIe = 5G connectivity
  • 1x dual 1G onboard + dual 10G via PCIe = 22G connectivity
  • 1x dual 10G onboard + dual 10G via PCIe = 40G connectivity
  • 1x dual 10G onboard + 2x dual 40G via PCIe = 100G connectivity


The reason being that if we, e.g., have dual 10G NICs and dedicate one NIC to Corosync/Proxmox, we'd be wasting tons of potential bandwidth for Ceph / NFS / VMs / whatever. It also makes things a bit more redundant for us, since the decision of which NIC/port a VLAN uses is up to Open vSwitch and the physical switch, and it changes regularly.



For our backups, things start to get a bit crazy:
We have 3 Ceph clusters (each with erasure-coded pools).
Ceph cluster 1 is backing 24 HA VMs (OpenMediaVaults connected via NFS).
If our use case requires it, the Proxmox VMs are set up to make snapshots every X hours on the corresponding NFS OpenMediaVault, up to 24x a day. We keep 2 copies (so a maximum of 48 hours of hourly backups).
  • NFS-Servers are on 10.90.0.[1-24]
  • NFS-Proxmox-Clients are on 10.90.[1-6].[1-10]
Ceph cluster 2 is backing 7 HA VMs (OpenMediaVaults connected via NFS).
This backup is done for every VM; it creates a backup once per day (staggered between 23:00 and 05:00). We again keep 2 copies, for up to 14 days.
  • NFS-Servers are on 10.91.0.[1-7]
  • NFS-Proxmox-Clients are on 10.91.[1-6].[1-10]
Ceph cluster 3 is backing 4 HA VMs (OpenMediaVaults connected via NFS).
This type of backup is only done for really important stuff, e.g. our internal wiki (where most of our configs and instructions are housed), our mail store and our document-management software. Full backups are done once per week on Sundays. We keep 18 copies, which is basically 18 months' worth (only to safeguard against documents going missing during the yearly revision of the company).

  • NFS-Servers are on 10.92.0.[1-4]
  • NFS-Proxmox-Clients are on 10.92.[1-6].[1-10]
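For illustration, this is roughly how one of those NFS backup targets could be defined in /etc/pve/storage.cfg on a node (storage name, export path and mount options are assumptions, not our actual config; only the server IP and the 2-copy limit come from the description above):
Code:
# /etc/pve/storage.cfg -- one NFS-Backups-C1 target (OpenMediaVault at 10.90.0.1)
nfs: nfs-backups-c1
        server 10.90.0.1
        export /export/backups
        path /mnt/pve/nfs-backups-c1
        content backup
        maxfiles 2
        options vers=3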



For NTP this works similarly.
We currently have 7 local NTP servers; if I had to guess, somewhere around 2k-4k potential NTP users. Even some security cameras query NTP.

  • NTP-Servers are on 10.70.0.[1-7]
  • NTP-Proxmox-Clients are on 10.70.[1-6].[1-10]
  • NTP-VM-Clients are on 10.70.[150-200].x
  • NTP-Office-Clients are on 10.70.[240-253].x
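As an illustration of the client side (a hypothetical /etc/ntp.conf fragment; which and how many of the local servers a client points at is up to you):
Code:
# /etc/ntp.conf (fragment) -- use the local servers on the NTP VLAN
server 10.70.0.1 iburst
server 10.70.0.2 iburst
server 10.70.0.3 iburst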



Perhaps it comes together if we look at it the other way around.

Example: C1 Proxmox node 1 OVSIntPorts:

  • Vlan1 Tag=101 IP=10.100.101.1
  • Vlan20 Tag=20 IP=10.20.101.1
  • Vlan21 Tag=21 IP=10.21.101.1
  • Vlan22 Tag=22 IP=10.22.101.1
  • Vlan30 Tag=30 IP=10.30.101.1
  • Vlan41 Tag=41 IP=10.41.101.1
  • Vlan42 Tag=42 IP=10.42.101.1
  • Vlan43 Tag=43 IP=10.43.101.1
  • Vlan70 Tag=70 IP=10.70.101.1
  • Vlan80 Tag=80 IP=10.80.101.1
  • Vlan81 Tag=81 IP=10.81.101.1
  • Vlan90 Tag=90 IP=10.90.101.1
  • Vlan91 Tag=91 IP=10.91.101.1
  • Vlan92 Tag=92 IP=10.92.101.1
  • Vlan201 untagged IP=10.200.101.1 <-- that's where e.g. admin computers (e.g. 10.200.0.1) sit



If this were e.g. a VM (let's say a Zimbra mail server) that had to receive client traffic from e.g. an office computer, then it would have a VLAN like this:

Vlan300 untagged IP=192.168.130.1 <-- we have office-clients sitting on 192.168.0.0/16


I want to point out that those IPs and VLAN tag numbers are examples; we use different ones. It's entirely possible I made typos while changing them.




Prioritisation of our VLANs is done on the switches, e.g. FreeIPA > NTP > Proxmox > Ceph > NFS-Backups-C1 = NFS-Backups-C2 = NFS-Backups-C3 ...
 
Still have issues, looking for the target to fix

Hello
In the last week I've done a few things to try to fix the issue:

- Set up a separate NFS storage network. A 10G switch and NICs were added to all 3 nodes. Backups use this network.

- Moved a high-CPU-usage video VM off of the cluster.

- Rechecked settings and checked logs.

- Kept a log of when a node went red and what was going on at the time.


I'm still getting nodes turning red on the web page, and /etc/pve becoming non-writable.

The issue usually happens during a backup; however, it also occurs overnight for unknown reasons.

Right now I've got one node showing all green, and the other 2 showing only the local node green.

I'm not sure where the issue is coming from and would like to eliminate some possibilities.

Starting with multicast: it looks OK to me. Can someone confirm that this test shows there is NO multicast issue?

node dell1 :
Code:
dell1  ~ # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
sys3-corosync : waiting for response msg
sys5-corosync : waiting for response msg
sys3-corosync : joined (S,G) = (*, 232.43.211.234), pinging
sys5-corosync : joined (S,G) = (*, 232.43.211.234), pinging
sys3-corosync : given amount of query messages was sent
sys5-corosync : given amount of query messages was sent

sys3-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.097/18.667/3876.433/214.750
sys3-corosync : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.103/18.710/3876.453/214.929
sys5-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.111/0.195/0.800/0.050
sys5-corosync : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.114/0.204/0.804/0.052

sys3:
Code:
sys3  ~ # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
sys5-corosync  : waiting for response msg
dell1-corosync : waiting for response msg
sys5-corosync  : joined (S,G) = (*, 232.43.211.234), pinging
dell1-corosync : waiting for response msg
dell1-corosync : joined (S,G) = (*, 232.43.211.234), pinging
dell1-corosync : waiting for response msg
dell1-corosync : server told us to stop
sys5-corosync  : waiting for response msg
sys5-corosync  : server told us to stop

sys5-corosync  :   unicast, xmt/rcv/%loss = 598/598/0%, min/avg/max/std-dev = 0.086/0.168/0.291/0.032
sys5-corosync  : multicast, xmt/rcv/%loss = 598/597/0% (seq>=2 0%), min/avg/max/std-dev = 0.091/0.201/0.338/0.039
dell1-corosync :   unicast, xmt/rcv/%loss = 596/596/0%, min/avg/max/std-dev = 0.105/0.201/0.327/0.031
dell1-corosync : multicast, xmt/rcv/%loss = 596/596/0%, min/avg/max/std-dev = 0.113/0.213/0.339/0.033

sys5:
Code:
sys5  ~ # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
sys3-corosync  : waiting for response msg
dell1-corosync : waiting for response msg
sys3-corosync  : waiting for response msg
dell1-corosync : waiting for response msg
sys3-corosync  : joined (S,G) = (*, 232.43.211.234), pinging
dell1-corosync : waiting for response msg
dell1-corosync : joined (S,G) = (*, 232.43.211.234), pinging
sys3-corosync  : given amount of query messages was sent
dell1-corosync : given amount of query messages was sent

sys3-corosync  :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.060/19.146/3776.038/210.777
sys3-corosync  : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.075/19.163/3776.043/210.776
dell1-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.089/0.216/0.658/0.047
dell1-corosync : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.112/0.229/0.665/0.046

So is multicast OK?

PS: I've got a red issue right now; sys3 shows all green, sys5 and dell1 show only localhost green.
 
Today:
All nodes were green. Then:
08:00 - migrating KVMs from sys3 to other nodes
09:00 - dell1 and sys5 show the other nodes red, sys3 all green

The strange thing is that only one node has /etc/pve writable:
Check if /etc/pve is writable: sys3: Y, sys5: N, dell1: N
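(The check itself is just a one-liner; the filename is arbitrary:)
Code:
# run on each node; succeeds only if the cluster filesystem is writable
touch /etc/pve/writetest.tmp && rm /etc/pve/writetest.tmp && echo writable || echo read-only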

I've done all the tests shown at this wiki page:
https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues

and multicast looks like it is working perfectly.

Should I just reinstall the cluster?
 
more info:

I've got backports enabled, and I installed some other utilities. Those could be interfering somehow.

Making a new cluster and keeping it standard - not adding any packages besides the software referenced on the wiki (like for testing multicast, etc.) - is probably what I'll do next.
 
I doubt a cluster reinstall will fix your issue (especially if you set it up the same way)

To me it still sounds like multicast is unstable at specific times on your network.
It also seems to happen every time you push large amounts of data over your network (vzdump / VM migration).

I'd look into / verify how you ...

  • prioritise your traffic
  • whether your multicast is stable when doing large data pushes (see the omping sketch after this list)
  • limit the vzdump config globally: https://pve.proxmox.com/wiki/VZDump
  • separated your corosync network (we do not do it, no issues at all)
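For the second point, one way to check it (my suggestion, using the same omping invocation as in your posts, just longer): leave it running on all three nodes while a vzdump backup is in progress and watch for loss or latency spikes.
Code:
# start on all three nodes at (roughly) the same time, then kick off a backup
omping -c 3600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync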



Edit: look at the omping results for "sys3" ... that looks unstable to me. How far away is that machine from the other cluster machines? (There is a ~20 ms average latency present [the max is around 3.8 seconds]), especially if you compare it with the results for the other nodes.


See if that happens on normal ping latencies as well.

From "sys5", run the following commands in 2 separate terminal windows:
ping -t -i 0.1 IPsys3
ping -t -i 0.1 IPdell

Compare the latencies. My guess is that even on normal pings "sys3" has a considerably higher latency than "dell1".
 
I doubt a cluster reinstall will fix your issue (especially if you set it up the same way)

To me it still sounds like multicast is unstable at specific times on your network.
It also seems to happen every time you push large amounts of data over your network (vzdump / VM migration).

I'd look into / verify how you ...

  • prioritise your traffic
  • whether your multicast is stable when doing large data pushes
  • limit the vzdump config globally: https://pve.proxmox.com/wiki/VZDump
  • separated your corosync network (we do not do it, no issues at all)

Traffic prioritization is set up. Corosync goes to ports on the switch that have the highest priority; no other ports are set that high.

I could limit vzdump bandwidth; however, I have already set up a dedicated 10G NIC to a 10G switch on each node.
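(If I do cap it, my understanding is that a single line in /etc/vzdump.conf sets a global limit; the value is in KB/s, and 200000 is just an example:)
Code:
# /etc/vzdump.conf
bwlimit: 200000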

Migration - I saw a thread regarding setting the NIC used for migration... I could do that. However, I do not migrate often, so I'd be making changes that would probably get overwritten at the next PVE deb upgrade.

Corosync - I followed this a few weeks ago; since then, from the CLI, pvecm nodes always shows a working cluster:
https://pve.proxmox.com/wiki/Separate_Cluster_Network
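(For reference, besides pvecm nodes there are a couple of other stock commands for the same check; nothing custom here:)
Code:
pvecm status            # quorum state and expected votes
pvecm nodes             # membership as seen by this node
corosync-cfgtool -s     # ring status / the address corosync actually bound to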



I've probably done something wrong with the network setup, or the switch is not sufficient for our needs. I'll try to get some help with the switch; it is a Cisco SG-300.

Any other suggestions?
 
I assume you did not see my edit in my previous post. Here it is as a standalone post:

look at the omping results for "sys3" ... that looks unstable to me. How far away is that machine from the other cluster machines? (There is a ~20 ms average latency present [the max is around 3.8 seconds]), especially if you compare it with the results for the other nodes.


See if that happens on normal ping latencies as well.

From "sys5", run the following commands in 2 separate terminal windows (at the same time):
ping -t -i 0.1 IPsys3
ping -t -i 0.1 IPdell

Compare the latencies. My guess is that even on normal pings "sys3" has a considerably higher latency than "dell1".
 
I assume you did not see my edit in my previous post. Here it is as a standalone post:

Thanks for the repost...

Code:
ping -t -i 0.1 10.2.8.42
connect: Invalid argument

I'm checking man ping but can't figure it out yet... can you correct what I'm trying to do?
 
That's for Windows, sorry - I typed it from memory.
-t means "keep pinging" on Windows; everywhere else you do not have to specify it.

ping -i 0.1 10.2.8.42
will suffice.

With -i 0.1 you basically ping every 0.1 seconds (you can decrease it to put more strain on the link).


Make sure you run both tests concurrently. There should be a clear and steady latency increase on sys3.
 
sys3 is in the same rack [our only rack].

Ping tests: these ran at the same time, both from sys5.

Hosts:
Code:
10.2.8.42  sys3-corosync.fantinibakery.com  sys3-corosync
10.2.8.41  sys4-corosync.fantinibakery.com  sys4-corosync
10.2.8.19  sys5-corosync.fantinibakery.com  sys5-corosync
10.2.8.181 dell1-corosync.fantinibakery.com dell1-corosync

Code:
ping -c 1000 -i 0.1 10.2.8.181
64 bytes from 10.2.8.181: icmp_seq=1 ttl=64 time=0.239 ms
64 bytes from 10.2.8.181: icmp_seq=2 ttl=64 time=0.224 ms
..
64 bytes from 10.2.8.181: icmp_seq=100 ttl=64 time=0.207 ms
..
64 bytes from 10.2.8.181: icmp_seq=200 ttl=64 time=0.185 ms
..
64 bytes from 10.2.8.181: icmp_seq=300 ttl=64 time=0.164 ms
..
64 bytes from 10.2.8.181: icmp_seq=400 ttl=64 time=0.222 ms

64 bytes from 10.2.8.181: icmp_seq=500 ttl=64 time=0.228 ms

64 bytes from 10.2.8.181: icmp_seq=600 ttl=64 time=0.150 ms

64 bytes from 10.2.8.181: icmp_seq=700 ttl=64 time=0.263 ms

64 bytes from 10.2.8.181: icmp_seq=800 ttl=64 time=0.153 ms

64 bytes from 10.2.8.181: icmp_seq=900 ttl=64 time=0.161 ms

64 bytes from 10.2.8.181: icmp_seq=999 ttl=64 time=0.140 ms
64 bytes from 10.2.8.181: icmp_seq=1000 ttl=64 time=0.184 ms

--- 10.2.8.181 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 99897ms
rtt min/avg/max/mdev = 0.079/0.192/0.473/0.036 ms

and
Code:
ping -c 1000 -i 0.1 10.2.8.42
..
64 bytes from 10.2.8.42: icmp_seq=15 ttl=64 time=0.280 ms
64 bytes from 10.2.8.42: icmp_seq=16 ttl=64 time=0.133 ms
..
64 bytes from 10.2.8.42: icmp_seq=396 ttl=64 time=0.155 ms
64 bytes from 10.2.8.42: icmp_seq=397 ttl=64 time=0.121 ms
64 bytes from 10.2.8.42: icmp_seq=398 ttl=64 time=0.114 ms
..
64 bytes from 10.2.8.42: icmp_seq=448 ttl=64 time=0.171 ms
64 bytes from 10.2.8.42: icmp_seq=449 ttl=64 time=0.157 ms
..
64 bytes from 10.2.8.42: icmp_seq=529 ttl=64 time=0.166 ms
64 bytes from 10.2.8.42: icmp_seq=530 ttl=64 time=0.172 ms
..
64 bytes from 10.2.8.42: icmp_seq=613 ttl=64 time=0.165 ms
..
64 bytes from 10.2.8.42: icmp_seq=700 ttl=64 time=0.133 ms
..
64 bytes from 10.2.8.42: icmp_seq=800 ttl=64 time=0.139 ms
..
64 bytes from 10.2.8.42: icmp_seq=900 ttl=64 time=0.199 ms

..
64 bytes from 10.2.8.42: icmp_seq=999 ttl=64 time=0.150 ms
64 bytes from 10.2.8.42: icmp_seq=1000 ttl=64 time=0.170 ms

--- 10.2.8.42 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 99896ms
rtt min/avg/max/mdev = 0.073/0.158/0.304/0.034 ms

If you have another test, please post it.
 
Okay, so normal pings look fine, even though the ompings you provided for node "sys3" look weird.


Just to verify:
Can you run the ompings again?
 
OK, running ompings...

Also - I was checking some log emails, and two KVMs on sys3 had disk issues. So maybe sys3 has a hardware issue. Here is a report from one of the KVMs:
Code:
nodejs /var/log/syslog
Nov 21 04:34:41 nodejs kernel: [233074.401566]  [<ffffffffa00c7dcc>] ? ata_sff_error_handler+0x7c/0xe0 [libata]
Nov 21 04:34:41 nodejs kernel: [233074.401574]  [<ffffffffa00c5908>] ? ata_scsi_port_error_handler+0x518/0x8f0 [libata]
Nov 21 04:34:41 nodejs kernel: [233074.401578]  [<ffffffffa00c5d6c>] ? ata_scsi_error+0x8c/0xc0 [libata]
Nov 21 04:34:41 nodejs kernel: [233074.401591]  [<ffffffffa00553ab>] ? scsi_error_handler+0x11b/0x870 [scsi_mod]

entire log next:
..
Nov 21 04:25:01 nodejs CRON[29068]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 21 04:27:53 nodejs kernel: [232665.591977] BUG: soft lockup - CPU#0 stuck for 27s! [kworker/0:1:29032]
Nov 21 04:27:53 nodejs kernel: [232665.592221] Modules linked in: ppdev joydev ttm drm_kms_helper drm i2c_piix4 i2c_core psmouse evdev parport_pc parport serio_raw pcspkr shpchp processor virtio_balloon thermal_sys button sch_fq_codel fuse autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2 dm_mod sg sr_mod cdrom ata_generic virtio_blk virtio_net floppy uhci_hcd ata_piix ehci_hcd libata virtio_pci virtio_ring virtio scsi_mod usbcore usb_common
Nov 21 04:27:53 nodejs kernel: [232665.592368] CPU: 0 PID: 29032 Comm: kworker/0:1 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u5
Nov 21 04:27:53 nodejs kernel: [232665.592370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Nov 21 04:27:53 nodejs kernel: [232665.592406] Workqueue: events_freezable_power_ disk_events_workfn
Nov 21 04:27:53 nodejs kernel: [232665.592417] task: ffff88003a5874b0 ti: ffff88003da90000 task.ti: ffff88003da90000
Nov 21 04:27:53 nodejs kernel: [232665.592421] RIP: 0010:[<ffffffff81510e2e>]  [<ffffffff81510e2e>] _raw_spin_unlock_irqrestore+0xe/0x20
Nov 21 04:27:53 nodejs kernel: [232665.592438] RSP: 0018:ffff88003da93b20  EFLAGS: 00000296
Nov 21 04:27:53 nodejs kernel: [232665.592440] RAX: 0000000000000000 RBX: ffff880037168158 RCX: ffff8800371680c0
Nov 21 04:27:53 nodejs kernel: [232665.592441] RDX: 0000000000000001 RSI: 0000000000000296 RDI: 0000000000000296
Nov 21 04:27:53 nodejs kernel: [232665.592442] RBP: ffff880037168158 R08: 000000100000014a R09: 0008000000100000
Nov 21 04:27:53 nodejs kernel: [232665.592443] R10: ffff880037168198 R11: 0000000000000008 R12: ffff880037168158
Nov 21 04:27:53 nodejs kernel: [232665.592444] R13: ffffffffa00bb6f0 R14: ffff880037168178 R15: ffff880037169e70
Nov 21 04:27:53 nodejs kernel: [232665.592446] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
Nov 21 04:27:53 nodejs kernel: [232665.592447] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 21 04:27:53 nodejs kernel: [232665.592448] CR2: 000036b3dcc20000 CR3: 0000000036e71000 CR4: 00000000000006f0
Nov 21 04:27:53 nodejs kernel: [232665.592455] Stack:
Nov 21 04:27:53 nodejs kernel: [232665.592456]  ffffffffa00bf2ca ffff880037168000 ffff880037149080 ffff88003a7d3000
Nov 21 04:27:53 nodejs kernel: [232665.592458]  ffff88003a7d3000 ffff880037149080 ffff8800370ce048 ffff880000154010
Nov 21 04:27:53 nodejs kernel: [232665.592459]  ffffffffa004f394 ffff8800370ce000 ffff88003a7d3000 ffff880037106008
Nov 21 04:27:53 nodejs kernel: [232665.592461] Call Trace:
Nov 21 04:27:53 nodejs kernel: [232665.592487]  [<ffffffffa00bf2ca>] ? ata_scsi_queuecmd+0x13a/0x410 [libata]
Nov 21 04:27:53 nodejs kernel: [232665.592498]  [<ffffffffa004f394>] ? scsi_dispatch_cmd+0xb4/0x270 [scsi_mod]
Nov 21 04:27:53 nodejs kernel: [232665.592503]  [<ffffffffa0057b7d>] ? scsi_request_fn+0x2fd/0x500 [scsi_mod]
Nov 21 04:27:53 nodejs kernel: [232665.592508]  [<ffffffff8127cb1f>] ? __blk_run_queue+0x2f/0x40

..
There is a lot more to the log.

At the same time, another KVM on sys3 had a message about the root file system being read-only; I had to reboot it.

So maybe sys3 has bad hardware.
zpool status shows no issue...

I'll post omping when it finishes.
 
omping Sat Nov 21 11:26:37 EST 2015

*dell1
Code:
dell1  /etc # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
sys3-corosync : waiting for response msg
sys5-corosync : waiting for response msg
sys3-corosync : joined (S,G) = (*, 232.43.211.234), pinging
sys5-corosync : joined (S,G) = (*, 232.43.211.234), pinging
sys3-corosync : given amount of query messages was sent
sys5-corosync : given amount of query messages was sent                                                                  
                                                                                                                         
sys3-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.091/0.205/0.320/0.044                     
sys3-corosync : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.095/0.216/0.324/0.046         
sys5-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.121/0.212/0.319/0.037
sys5-corosync : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.124/0.221/0.336/0.036

*sys3:
Code:
sys3  ~ # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
sys5-corosync  : waiting for response msg
dell1-corosync : waiting for response msg
sys5-corosync  : waiting for response msg
dell1-corosync : waiting for response msg
sys5-corosync  : joined (S,G) = (*, 232.43.211.234), pinging
dell1-corosync : joined (S,G) = (*, 232.43.211.234), pinging
sys5-corosync  : given amount of query messages was sent
dell1-corosync : given amount of query messages was sent                                                                 
                                                                                                                         
sys5-corosync  :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.099/0.195/0.344/0.039                    
sys5-corosync  : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.104/0.213/0.364/0.041                    
dell1-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.080/0.230/0.354/0.050
dell1-corosync : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.101/0.244/0.365/0.048

*sys5:
Code:
sys5  ~ # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
sys3-corosync  : waiting for response msg
dell1-corosync : waiting for response msg
sys3-corosync  : joined (S,G) = (*, 232.43.211.234), pinging
dell1-corosync : waiting for response msg
dell1-corosync : joined (S,G) = (*, 232.43.211.234), pinging
sys3-corosync  : given amount of query messages was sent
dell1-corosync : given amount of query messages was sent                                                                 
                                                                                                                         
sys3-corosync  :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.098/0.202/0.318/0.042                    
sys3-corosync  : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.113/0.223/0.336/0.040        
dell1-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.137/0.251/0.369/0.037
dell1-corosync : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.145/0.262/0.377/0.036
 
The issue (the high latency) is not present here anymore.
I am assuming your cluster has been restarted and does not have the issue (red nodes) currently.
If so, next time the issue arises you should run omping again and see if it shows high latencies again.
If it does, run the ping commands from above again.

To make sure we are on the same page - I am referring to these:

node dell1 :
Code:
dell1  ~ # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
[...]
sys3-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.097/18.667/3876.433/214.750   <-- high avg/max
sys3-corosync : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.103/18.710/3876.453/214.929   <-- high avg/max
sys5-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.111/0.195/0.800/0.050
sys5-corosync : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.114/0.204/0.804/0.052

sys3:
Code:
sys3  ~ # omping -c 600 -i 1 -q  sys3-corosync sys5-corosync dell1-corosync
[...]
sys5-corosync  :   unicast, xmt/rcv/%loss = 598/598/0%, min/avg/max/std-dev = 0.086/0.168/0.291/0.032
sys5-corosync  : multicast, xmt/rcv/%loss = 598/597/0% (seq>=2 0%), min/avg/max/std-dev = 0.091/0.201/0.338/0.039
dell1-corosync :   unicast, xmt/rcv/%loss = 596/596/0%, min/avg/max/std-dev = 0.105/0.201/0.327/0.031
dell1-corosync : multicast, xmt/rcv/%loss = 596/596/0%, min/avg/max/std-dev = 0.113/0.213/0.339/0.033

sys5:
Code:
[...]
sys3-corosync  :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.060/19.146/3776.038/210.777   <-- high avg/max
sys3-corosync  : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.075/19.163/3776.043/210.776   <-- high avg/max
dell1-corosync :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.089/0.216/0.658/0.047
dell1-corosync : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.112/0.229/0.665/0.046
 
The issue (the high latency) is not present here anymore.
I am assuming your cluster has been restarted and does not have the issue (red nodes) currently.
If so, next time the issue arises you should run omping again and see if it shows high latencies again.
If it does, run the ping commands from above again.

To make sure we are on the same page - I am referring to these:

Yes, the cluster was restarted and all nodes were green on the web page.

So I'll re-run the ping and omping tests when we have the red issue.

Thank you for the help!
 
Yes, the cluster was restarted and all nodes were green on the web page.

So I'll re-run the ping and omping tests when we have the red issue.

Thank you for the help!


My guess is that high multicast latency on node sys3 is a symptom whenever your issue arises. Not sure what the problem is, though; what seems clear is that it is happening on sys3 (as it's the common denominator).

Possible causes off the top of my head:
  • Hardware defect:

  1. HDDs: check the SMART values of the disks on sys3 (smartmontools or your monitoring solution; see the command sketch below) --> This seems the likely source, as you previously indicated the issues arise after backups / migration. Maybe even run a long self-test.
  2. RAM: run a memtest on node sys3 (when you restart a node there is a boot option to do so).
  3. A network-hardware defect I'd put way down at the bottom of the list (since you installed new network hardware).

  • Software-related issues:

  1. Check if there is high system load on sys3 when your nodes go red. If there is, see if you can narrow it down to a CT/VM; maybe an issue there is locking up node sys3.
  2. Your corosync / network setup on sys3 is "wrong".
  3. Your ZFS has issues (I do not use ZFS in any meaningful way, so I can't help you there).
  4. Check your shared storage (or wherever you back up to / migrate from), if you actually use it.

That's where I'd go in the meantime.
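For point 1, a quick SMART sweep could look like this (device names are examples; adjust to whatever disks sys3 actually has):
Code:
# overall health verdict and the raw SMART attributes
smartctl -H /dev/sda
smartctl -A /dev/sda
# optionally start a long self-test and check the result later with smartctl -a
smartctl -t long /dev/sda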
 
