pve-manager/2.3/7946f1f1 PVETest two node cluster fail

ebiss

root@proxmox20151:~# uname -a
Linux proxmox20151 2.6.32-19-pve #1 SMP Tue Mar 12 10:32:32 CET 2013 x86_64 GNU/Linux
root@proxmox20151:~# pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-91
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-19-pve: 2.6.32-91
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-18
pve-firmware: 1.0-21
libpve-common-perl: 1.0-48
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-8
ksm-control-daemon: 1.1-1

proxmoxcluster2.png


/etc/init.d/pve-cluster restart
syslog shows:
Mar 13 10:44:11 proxmox20151 pmxcfs[2174]: [main] notice: teardown filesystem
Mar 13 10:44:12 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Mar 13 10:44:12 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:12 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:12 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Mar 13 10:44:12 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:12 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:12 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:12 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:12 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1667]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pvedaemon[1665]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 10:44:13 proxmox20151 pmxcfs[2207]: [status] notice: update cluster info (cluster name pcluster, version = 2)
Mar 13 10:44:13 proxmox20151 pmxcfs[2207]: [dcdb] notice: members: 2/2207
Mar 13 10:44:13 proxmox20151 pmxcfs[2207]: [dcdb] notice: all data is up to date
Mar 13 10:44:13 proxmox20151 pmxcfs[2207]: [dcdb] notice: members: 2/2207
Mar 13 10:44:13 proxmox20151 pmxcfs[2207]: [dcdb] notice: all data is up to date
Mar 13 10:44:15 proxmox20151 pvedaemon[1666]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Mar 13 10:44:16 proxmox20151 pvestatd[1717]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
 
pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: pcluster
Cluster Id: 27700
Cluster Member: Yes
Cluster Generation: 668
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 1
Flags:
Ports Bound: 0
Node name: proxmox20151
Node ID: 2
Multicast addresses: 239.192.108.160
Node addresses: 172.18.20.151


/etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]
 
First, if you restart pve-cluster, you also need to restart pvedaemon and pvestatd.
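For example, something like this on each node (a minimal sketch using the same init scripts you already used):
# /etc/init.d/pve-cluster restart
# /etc/init.d/pvedaemon restart
# /etc/init.d/pvestatd restart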

Seems your cluster communication is broken - please check if multicast is working.
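A simple way to check is omping, started at roughly the same time on both nodes (using the node addresses from your pvecm output; count and interval are just examples):
# omping -c 600 -i 1 -q 172.18.20.150 172.18.20.151
If the switch filters multicast, the multicast loss will be close to 100% while unicast still gets through.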
 
omping.png
Yes, both nodes are running.
Above is my omping output.

pvecm status is as below:
pvecm.png
 
One more thing:
Before the aptitude full-upgrade to the newest pvetest kernel, everything was OK and all nodes were green.
After rebooting into the new kernel, the problem is: in the web UI of node 150, node 151 shows red, and in the web UI of node 151, node 150 shows red.
 
Can I roll back Proxmox from "pvetest" to "pve"?
I just want to see whether it is a network problem or a new kernel problem.
 
try:

# wget ftp://download.proxmox.com/debian/d.../pve-kernel-2.6.32-18-pve_2.6.32-88_amd64.deb
# dpkg -i pve-kernel-2.6.32-18-pve_2.6.32-88_amd64.deb

Then reboot and select this kernel manually.

- - - Updated - - -

What switch do you use (vendor/model)?

Using kernel 2.6.32-18, the cluster is OK now.

root@proxmox20150:~# uname -a
Linux proxmox20150 2.6.32-18-pve #1 SMP Mon Jan 21 12:09:05 CET 2013 x86_64 GNU/Linux
root@proxmox20150:~# pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: pcluster
Cluster Id: 27700
Cluster Member: Yes
Cluster Generation: 696
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: proxmox20150
Node ID: 1
Multicast addresses: 239.192.108.160
Node addresses: 172.18.20.150

The switch vendor is probably Cisco; I will confirm tomorrow.

Using the 2.6.32-18 kernel, when I run "service pve-cluster restart",
syslog shows:
Mar 13 23:13:50 proxmox20150 pmxcfs[2050]: [main] notice: teardown filesystem
Mar 13 23:13:51 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Mar 13 23:13:51 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:51 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:51 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:51 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:51 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:51 proxmox20150 pvedaemon[1766]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Mar 13 23:13:51 proxmox20150 pvedaemon[1766]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:51 proxmox20150 pvedaemon[1766]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:52 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:52 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:52 proxmox20150 pvedaemon[1764]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:52 proxmox20150 pvedaemon[1768]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Mar 13 23:13:52 proxmox20150 pvedaemon[1768]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:52 proxmox20150 pvedaemon[1768]: WARNING: ipcc_send_rec failed: Connection refused
Mar 13 23:13:52 proxmox20150 pmxcfs[2050]: [main] notice: exit proxmox configuration filesystem (0)
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [status] notice: update cluster info (cluster name pcluster, version = 2)
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [status] notice: node has quorum
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: members: 1/2202, 2/1370
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: starting data syncronisation
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: members: 1/2202, 2/1370
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: starting data syncronisation
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: received sync request (epoch 1/2202/00000001)
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: received sync request (epoch 1/2202/00000001)
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: received all states
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: leader is 1/2202
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: synced members: 1/2202, 2/1370
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: start sending inode updates
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: sent all (0) updates
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: all data is up to date
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: received all states
Mar 13 23:13:52 proxmox20150 pmxcfs[2202]: [dcdb] notice: all data is up to date
Mar 13 23:13:55 proxmox20150 pvestatd[1790]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
There are still WARNINGs, but all nodes in the cluster are green.


With kernel 2.6.32-18, omping shows:
omping2.png
 
Can I roll back Proxmox from "pvetest" to "pve"?
I just want to see whether it is a network problem or a new kernel problem.

You should have the older kernel .deb in /var/cache/apt/archives.
Just do a dpkg -i pve-kernel-xxx.deb
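A minimal sketch, assuming the .deb from before the upgrade is still in the cache (the exact version string may differ on your system):
# ls /var/cache/apt/archives/pve-kernel-*.deb
# dpkg -i /var/cache/apt/archives/pve-kernel-2.6.32-18-pve_2.6.32-88_amd64.deb
Then reboot and select that kernel in the boot menu.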


The last kernel has a change in multicast handling, to try to correct a problem with Cisco switches.
Before downgrading, just some questions:

What hardware switches do you use?
Have you upgraded all nodes to the last kernel, or only one?
 
The two nodes are IBM HS22 blades,
the switch is a Cisco 3012.
View attachment 1344


@spirit Yes, all nodes are fully upgraded.

OK, I'm reading your Cisco doc:
http://www.cisco.com/en/US/docs/swi...ex2/configuration/guide/swigmp.html#wp1193337

IGMP snooping is enabled by default (this is multicast filtering), but the IGMP querier is disabled by default (it is required for multicast filtering to work).

So:
1) Can you try to disable filtering:
#conf t
#no ip igmp snooping

multicast should work again

2) Re-enable filtering, but activate the IGMP querier:

#conf t
#ip igmp snooping
#ip igmp snooping querier
#ip igmp snooping querier query-interval 10

You can see multicast group filtering with:

#sh ip igmp snooping groups


I think it should work.
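To verify the querier is actually active afterwards, you could also check (assuming a recent IOS on the 3012; availability may vary by version):
#sh ip igmp snooping querier
It should list the querier address and IGMP version for the VLAN used by the cluster.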

 
Thanks spirit, but the switch is out of my control.
Why does the new kernel change this?
 
 
