cman running - will not stop or restart


Chris Rivera

Guest
I started a post but cannot get back to it; it has shown a permanent internal server error (500) for the last 2 days:

http://forum.proxmox.com/threads/13015-cman-running-will-not-stop-or-restart

[screenshot: internal server error 500]

#####

All nodes show as offline. After much review, we found that the switch had detected a multicast storm and activated filtering, which effectively stopped all multicast traffic. The filter has since been cleared, and my multicast tests now show communication.

All nodes still show offline.


#####

root@proxmox4:~# clustat
Cluster Status for FL-Cluster @ Mon Mar 4 09:57:35 2013
Member Status: Inquorate


Member Name   ID   Status
------ ----   ---- ------
proxmox11        1 Offline
proxmox2         2 Offline
proxmox3a        3 Offline
proxmox4         4 Online, Local
poxmox5          5 Offline
proxmox6         6 Offline
proxmox7         7 Offline
proxmox8         8 Offline
proxmox9         9 Offline
Proxmox10       10 Offline

#####

root@proxmox4:~# pvecm nodes
Node  Sts  Inc      Joined               Name
   1   X   1076912                       proxmox11
   2   X   1076456                       proxmox2
   3   X   1076904                       proxmox3a
   4   M   971024   2013-02-25 13:46:12  proxmox4
   5   X   1076456                       poxmox5
   6   X   0                             proxmox6
   7   X   1076904                       proxmox7
   8   X   1076904                       proxmox8
   9   X   1076912                       proxmox9
  10   X   1076912                       Proxmox10


######


root@proxmox4:~# pvecm status
Version: 6.2.0
Config Version: 49
Cluster Name: FL-Cluster
Cluster Id: 6836
Cluster Member: Yes
Cluster Generation: 1082596
Membership state: Cluster-Member
Nodes: 1
Expected votes: 10
Total votes: 1
Node votes: 1
Quorum: 6 Activity blocked
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: proxmox4
Node ID: 4
Multicast addresses: 239.192.26.206
Node addresses: *.*.*.*
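The numbers above explain the "Activity blocked" state: CMAN needs a simple majority of the expected votes, and one vote out of ten cannot reach it. A minimal sketch of the majority-quorum arithmetic (my own illustration, not Proxmox code):

```shell
# Simple-majority quorum: strictly more than half of the expected
# votes are required before the cluster allows activity.
expected_votes=10
quorum=$(( expected_votes / 2 + 1 ))
echo "$quorum"   # matches "Quorum: 6" in the status output above
```

With only 1 total vote present (this node), 6 required votes cannot be reached, so pmxcfs stays read-only and cman reports the activity block.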

######



cisco3750-1.**.**.com#show storm-control mu
Interface  Filter State  Upper   Lower   Current
---------  ------------  ------  ------  -------
Gi1/0/1    Forwarding     5.00%   5.00%    0.00%
Gi1/0/2    Forwarding     5.00%   5.00%    0.00%
Gi1/0/3    Blocking       5.00%   5.00%   51.97%
Gi1/0/4    Link Down      5.00%   5.00%    0.00%
Gi1/0/5    Link Down      5.00%   5.00%    0.00%
Gi1/0/6    Forwarding     5.00%   5.00%    4.05%
Gi1/0/7    Blocking       5.00%   5.00%    5.18%
Gi1/0/8    Link Down      5.00%   5.00%    0.00%
Gi1/0/9    Forwarding     5.00%   5.00%    0.00%
Gi1/0/10   Link Down      5.00%   5.00%    0.00%
Gi1/0/11   Blocking       5.00%   5.00%    5.58%
Gi1/0/12   Forwarding     5.00%   5.00%    0.00%
Gi1/0/13   Forwarding     5.00%   5.00%    0.00%
Gi1/0/14   Forwarding     5.00%   5.00%    3.25%
Gi1/0/15   Forwarding     5.00%   5.00%    4.50%
Gi1/0/16   Forwarding     5.00%   5.00%    0.00%
Gi1/0/17   Blocking       5.00%   5.00%    6.85%
Gi1/0/18   Forwarding     5.00%   5.00%    0.00%
Gi1/0/19   Forwarding     5.00%   5.00%    0.00%
Gi1/0/20   Forwarding     5.00%   5.00%    0.00%
Gi1/0/21   Forwarding     5.00%   5.00%    0.00%
Gi1/0/22   Forwarding     5.00%   5.00%    0.00%
Gi1/0/23   Forwarding     5.00%   5.00%    0.00%
Gi1/0/24   Forwarding     5.00%   5.00%    0.00%
Gi1/0/25   Link Down      5.00%   5.00%    0.00%
Gi1/0/26   Forwarding     5.00%   5.00%    3.88%
Gi1/0/27   Forwarding     5.00%   5.00%    0.00%
Gi1/0/28   Forwarding     5.00%   5.00%    0.00%
Gi1/0/29   Forwarding     5.00%   5.00%    3.70%
Gi1/0/30   Link Down      5.00%   5.00%    0.00%
Gi1/0/31   Forwarding    70.00%  70.00%    0.00%
Gi1/0/48   Forwarding    70.00%  70.00%    0.00%
Gi2/0/7    Forwarding     5.00%   5.00%    0.00%
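Ports Gi1/0/31 and Gi1/0/48 above already run with a 70% threshold. If legitimate corosync traffic can burst past 5% of a gigabit port, one option is raising the level the same way on the cluster-facing ports. A sketch in IOS syntax (the interface name comes from the output above; tune the percentage to your own traffic baseline):

```
! Sketch: raise the multicast storm-control threshold on a
! cluster-facing port, matching the 70% level already configured
! on Gi1/0/31 and Gi1/0/48.
interface GigabitEthernet1/0/3
 storm-control multicast level 70.00
```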

#####

1w5d: %STORM_CONTROL-3-FILTERED: A Multicast storm detected on Gi1/0/3. A packet filter action has been applied on the interface. (cisco3750-1.*.*.com-1)
1w5d: %STORM_CONTROL-3-FILTERED: A Multicast storm detected on Gi1/0/6. A packet filter action has been applied on the interface. (cisco3750-1.*.*.com-1)
1w5d: %STORM_CONTROL-3-FILTERED: A Multicast storm detected on Gi1/0/3. A packet filter action has been applied on the interface. (cisco3750-1.*.*.com-1)
1w5d: %STORM_CONTROL-3-FILTERED: A Multicast storm detected on Gi1/0/6. A packet filter action has been applied on the interface. (cisco3750-1.*.*.com-1)
(the same STORM_CONTROL-3-FILTERED message repeats, alternating between Gi1/0/3 and Gi1/0/6)
 
This cluster is not HA; we are just running a normal cluster with no additional hardware.
 
...

> All nodes show as offline... After much review we found that the switch detected a multicast storm that activated filtering which effectively stopped all multicast from working. This has been cleared. I ran multicast tests and see communication.
>
> All nodes still show offline.

...

The Proxmox VE cluster communicates via IP multicast. If your network switches block that traffic, the cluster cannot work.

Test with omping; see https://pve.proxmox.com/wiki/Multicast_notes
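For reference, an omping run looks like the sketch below (hostnames are the ones from this thread; add the remaining nodes, and run the command on every listed node at the same time). The sample line mimics omping's reply format, and the filter shows one way to check that multicast replies are actually arriving:

```shell
# Run simultaneously on each node, listing all cluster members:
#   omping -c 20 -i 1 proxmox2 proxmox4 proxmox11

# Each reply line looks roughly like the sample below; the third
# field says whether the reply came back via unicast or multicast.
sample='proxmox2 : multicast, seq=20, size=69 bytes, dist=0, time=0.254ms'
kind=$(printf '%s\n' "$sample" | awk -F'[ ,]+' '{print $3}')
echo "$kind"
```

If every node only ever prints `unicast` lines, multicast is still being dropped somewhere between the nodes.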
 
Tom,

I have tested it and multicast works now. This was cleared as soon as we found the issue.

> After much review we found that the switch detected a multicast storm that activated filtering which effectively stopped all multicast from working. This has been cleared. I ran multicast tests and see communication.

#####

root@proxmox4:~# asmping 224.0.2.1 *.*.*.*
asmping joined (S,G) = (*,224.0.2.234)
pinging *.*.*.* from 63.217.249.161
unicast from *.*.*.*, seq=1 dist=0 time=1.154 ms
multicast from *.*.*.*, seq=1 dist=0 time=1.183 ms
unicast from *.*.*.*, seq=2 dist=0 time=0.348 ms
multicast from *.*.*.*, seq=2 dist=0 time=0.359 ms
unicast from *.*.*.*, seq=3 dist=0 time=0.219 ms
multicast from *.*.*.*, seq=3 dist=0 time=0.260 ms
(seq 4-17 omitted: every unicast and multicast reply arrived, all under 0.4 ms)
unicast from *.*.*.*, seq=18 dist=0 time=0.203 ms
multicast from *.*.*.*, seq=18 dist=0 time=0.224 ms



######

Yet all nodes still show as offline, and all have the same message in syslog.

######

root@proxmox2:~# cat /var/log/syslog | tail
Mar 4 10:30:33 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 60
Mar 4 10:30:34 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 70
Mar 4 10:30:35 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 80
Mar 4 10:30:36 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 90
Mar 4 10:30:37 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 100
Mar 4 10:30:37 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retried 100 times
Mar 4 10:30:37 proxmox2 pmxcfs[1289]: [status] crit: cpg_send_message failed: 6
Mar 4 10:30:38 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 10
Mar 4 10:30:39 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 20
Mar 4 10:30:40 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 30


######

root@proxmox4:~# cat /var/log/syslog | tail
Mar 4 10:30:21 proxmox4 dlm_controld[755535]: daemon cpg_leave error retrying
Mar 4 10:30:21 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retry 40
Mar 4 10:30:22 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retry 50
Mar 4 10:30:23 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retry 60
Mar 4 10:30:24 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retry 70
Mar 4 10:30:25 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retry 80
Mar 4 10:30:26 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retry 90
Mar 4 10:30:27 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retry 100
Mar 4 10:30:27 proxmox4 pmxcfs[38472]: [status] notice: cpg_send_message retried 100 times
Mar 4 10:30:27 proxmox4 pmxcfs[38472]: [status] crit: cpg_send_message failed: 6
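The trailing number in "cpg_send_message failed: N" is a corosync cs_error_t value. To my knowledge the two codes seen in this thread decode as follows (assumed from corosync's corotypes.h; verify against the headers of your installed version):

```shell
# Map the cs_error_t numbers seen in these logs to their names
# (values assumed from corosync's corotypes.h).
cs_error_name() {
    case "$1" in
        6) echo "CS_ERR_TRY_AGAIN"  ;;  # cluster congested or inquorate; pmxcfs retries
        9) echo "CS_ERR_BAD_HANDLE" ;;  # stale CPG handle, e.g. after corosync restarted
        *) echo "unknown ($1)"      ;;
    esac
}
cs_error_name 6
cs_error_name 9
```

So the `failed: 6` storm here is consistent with a cluster that is stuck inquorate, rather than with a local daemon fault.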




....

######

root@proxmox4:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1



Are there any logs specifically for cman to help me better troubleshoot the issue?
 
Test all nodes with omping; it is a very useful tool.

Restart the services (cman and pve-cluster), e.g.

> service cman stop/start or restart

You will find all logs in the syslog (/var/log/syslog).
 
Services restarted:

- pvedaemon
- pvestatd
- pve-cluster
- cman (fails)

Output of syslog on the nodes:



Mar 4 11:11:30 proxmox4 pmxcfs[604018]: [status] crit: cpg_send_message failed: 9
Mar 4 11:11:30 proxmox4 pmxcfs[604018]: [status] crit: cpg_send_message failed: 9
Mar 4 11:11:30 proxmox4 pmxcfs[604018]: [status] crit: cpg_send_message failed: 9
(the same "cpg_send_message failed: 9" line repeats)
Mar 4 11:11:31 proxmox4 pmxcfs[604018]: [dcdb] notice: cpg_join retry 20190
Mar 4 11:11:32 proxmox4 pmxcfs[604018]: [dcdb] notice: cpg_join retry 20200
Mar 4 11:11:33 proxmox4 pmxcfs[604018]: [dcdb] notice: cpg_join retry 20210




Mar 4 11:11:53 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 20
Mar 4 11:11:54 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 30
Mar 4 11:11:55 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 40
Mar 4 11:11:56 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 50
Mar 4 11:11:57 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 60
Mar 4 11:11:58 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 70
Mar 4 11:11:59 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 80
Mar 4 11:12:00 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 90
Mar 4 11:12:01 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 100
Mar 4 11:12:01 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retried 100 times
Mar 4 11:12:01 proxmox2 pmxcfs[1289]: [status] crit: cpg_send_message failed: 6
Mar 4 11:12:01 proxmox2 pvestatd[2385]: status update time (330.538 seconds)
Mar 4 11:12:02 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 10
Mar 4 11:12:03 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 20
Mar 4 11:12:04 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 30
Mar 4 11:12:05 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 40
Mar 4 11:12:06 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 50
Mar 4 11:12:07 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 60
Mar 4 11:12:08 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 70
Mar 4 11:12:09 proxmox2 pmxcfs[1289]: [status] notice: cpg_send_message retry 80
 
How do I install omping on Debian Squeeze?

I don't find it with apt-get install, and the omping website only shows CentOS / Red Hat distros.

Thanks in advance.
 
I tried that and it did not work either:


root@proxmox4:~# aptitude install omping
Couldn't find any package whose name or description matched "omping"
Couldn't find any package whose name or description matched "omping"
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.

root@proxmox4:~# apt-get install omping
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package omping


root@proxmox1a:/var/downloads# aptitude install omping
Couldn't find any package whose name or description matched "omping"
Couldn't find any package whose name or description matched "omping"
The following partially installed packages will be configured:
fence-agents-pve libpve-access-control proxmox-ve-2.6.32 pve-cluster
pve-manager qemu-server redhat-cluster-pve vzctl
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.
 
