Proxmox 3.4 corosync memory usage

RaidoR

New Member
Aug 14, 2015
Hello,

I have a problem with my Proxmox cluster.
Occasionally corosync memory usage grows so high that the VMs are no longer accessible. In the top output below, corosync is using around 22% of memory (5.3 GB resident).

Code:
Proxmox2 top:

top - 08:30:00 up 39 days, 16:00, 1 user, load average: 0.18, 0.15, 0.10
Tasks: 316 total, 1 running, 315 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.9 us, 0.6 sy, 0.0 ni, 96.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 24670084 total, 9298576 used, 15371508 free, 38368 buffers
KiB Swap: 0 total, 0 used, 0 free, 181284 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4865 root 20 0 1488m 1.0g 1552 S 18.2 4.4 1141:25 kvm
4727 root 20 0 1486m 1.0g 1644 S 2.0 4.3 1183:13 kvm
4497 root 20 0 203m 38m 3724 S 0.7 0.2 322:56.59 pvestatd
825502 www-data 20 0 286m 61m 4200 S 0.7 0.3 0:01.98 pveproxy worker
3141 root rt 0 98.1m 4312 2956 S 0.3 0.0 5:52.00 multipathd
4114 root 20 0 135m 4272 1468 S 0.3 0.0 12:54.46 dsm_sa_snmpd
4750 root 20 0 0 0 0 S 0.3 0.0 41:19.60 vhost-4727
4775 root 20 0 1475m 546m 1560 S 0.3 2.3 185:39.77 kvm
645583 root 0 -20 5512m 5.3g 42m S 0.3 22.4 40:02.47 corosync
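In the meantime I've been thinking about sampling corosync's memory once a minute, so next time I can see when the growth starts. Something like this cron.d entry (the log path is my own choice; the % signs are escaped because cron would otherwise treat them as newlines):

```crontab
# /etc/cron.d/corosync-rss -- sample corosync RSS (KiB) and %MEM every minute
* * * * * root echo "$(date +\%s) $(ps -C corosync -o rss=,pmem=)" >> /var/log/corosync-rss.log
```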

Restarting the cman and pve-cluster services works around it:

Code:
# service cman stop
# service cman start
# service pve-cluster restart
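Until I find the root cause, I'm considering a small watchdog run from cron that restarts the services automatically when corosync balloons. This is only a sketch: the 4 GiB threshold is an arbitrary guess, and rss_kb is my own helper, not something shipped with Proxmox.

```shell
#!/bin/sh
# Watchdog sketch: restart the cluster services if corosync's resident
# size exceeds a threshold. The threshold is an arbitrary guess (4 GiB).
THRESHOLD_KB=$((4 * 1024 * 1024))

# Resident set size in KiB of the first process with the given name, 0 if none.
rss_kb() {
    ps -C "$1" -o rss= 2>/dev/null | awk 'NR==1 {print $1+0} END {if (NR==0) print 0}'
}

if [ "$(rss_kb corosync)" -gt "$THRESHOLD_KB" ]; then
    service cman stop
    service cman start
    service pve-cluster restart
fi
```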

This has happened at least three times during the last two months. I have a four-node cluster.

I ran omping tests, and they show some packet loss between proxmox1 and proxmox4. Is this something to worry about?

Code:
# omping -c 600 -i 1 -q proxmox1 proxmox2 proxmox3 proxmox4
proxmox2 : waiting for response msg
proxmox3 : waiting for response msg
proxmox4 : waiting for response msg
proxmox2 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox3 : waiting for response msg
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox2 : given amount of query messages was sent
proxmox4 : given amount of query messages was sent
proxmox3 : given amount of query messages was sent

proxmox2 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.069/0.192/0.514/0.047
proxmox2 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.080/0.205/0.536/0.048
proxmox3 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.083/0.171/0.422/0.038
proxmox3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.081/0.176/0.432/0.039
proxmox4 :   unicast, xmt/rcv/%loss = 600/580/3%, min/avg/max/std-dev = 0.092/0.134/0.348/0.028
proxmox4 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.102/0.148/0.363/0.027

# omping -c 600 -i 1 -q proxmox1 proxmox2 proxmox3 proxmox4
proxmox1 : waiting for response msg
proxmox2 : waiting for response msg
proxmox3 : waiting for response msg
proxmox1 : waiting for response msg
proxmox2 : waiting for response msg
proxmox3 : waiting for response msg
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox2 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : given amount of query messages was sent
proxmox2 : given amount of query messages was sent
proxmox3 : given amount of query messages was sent

proxmox1 :   unicast, xmt/rcv/%loss = 600/580/3%, min/avg/max/std-dev = 0.101/0.163/0.383/0.044
proxmox1 : multicast, xmt/rcv/%loss = 600/580/3%, min/avg/max/std-dev = 0.110/0.179/0.395/0.045
proxmox2 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.086/0.196/0.363/0.052
proxmox2 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.098/0.208/0.385/0.051
proxmox3 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.093/0.179/0.503/0.044
proxmox3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.099/0.189/0.516/0.045
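To sift the summaries, I used a small awk helper (my own, not part of omping; the field positions assume the summary format shown above) that prints only the hosts reporting nonzero loss:

```shell
# Print omping summary lines that report nonzero packet loss.
# Splitting on '/' and ',' puts the loss percentage in field 6.
flag_loss() {
    awk -F'[/,]' '/%loss/ {
        loss = $6
        sub(/%/, "", loss)            # "3%" -> "3"
        if (loss + 0 > 0) print $1, "loss:", loss "%"
    }' "$@"
}

# Example: flag_loss omping.log   (or pipe omping output straight in)
```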

The corosync log on node2 stopped logging before I restarted the services on all nodes.
Here is the node2 log: https://pastebin.com/ZzMQNfrk