Problems with Clustering in 2.0.18

Tallaril

New Member
Feb 18, 2011
Hi there,

I'm running Proxmox in the above-mentioned version. I created a cluster and everything looked good until I restored a VM to the newly added node.

All I got was: Error "Verbindungsfehler" (connection error) 500: Can't connect to 109.234.106.16:8006 (connect: Connection refused)
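
A note on reading this error: "Connection refused" generally means the host answered but nothing accepted the connection on that port, which is different from a timeout (host or route unreachable). The following is only a minimal sketch for telling these cases apart from another machine; the function name `probe_tcp` and the 3-second timeout are illustrative assumptions, not part of Proxmox.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No reply at all within the timeout.
        return "timeout"
    except OSError:
        # Any other network error (no route, name resolution, ...).
        return "error"
```

Calling `probe_tcp("109.234.106.16", 8006)` from the master would be expected to return "refused" while the error above is showing, assuming the network path itself is fine.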

I haven't changed anything. Is there a workaround, such as deleting the node and adding it again afterwards?

This is what I got in the log of the node:

Feb 5 17:03:46 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:46 swarley corosync[1455]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 5 17:03:46 swarley corosync[1455]: [CPG ] chosen downlist: sender r(0) ip(109.234.106.16) ; members(old:1 left:0)
Feb 5 17:03:46 swarley corosync[1455]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:03:47 swarley corosync[1455]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 5 17:03:47 swarley corosync[1455]: [CMAN ] quorum regained, resuming activity
Feb 5 17:03:47 swarley corosync[1455]: [QUORUM] This node is within the primary component and will provide service.
Feb 5 17:03:47 swarley corosync[1455]: [QUORUM] Members[2]: 1 2
Feb 5 17:03:47 swarley corosync[1455]: [QUORUM] Members[2]: 1 2
Feb 5 17:03:47 swarley corosync[1455]: [CPG ] chosen downlist: sender r(0) ip(109.234.106.10) ; members(old:1 left:0)
Feb 5 17:03:47 swarley corosync[1455]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 5 17:13:26 swarley corosync[1455]: [TOTEM ] Retransmit List: 5a1 5a2 5a3 5a4 5a5 5a6
Feb 6 00:49:57 swarley corosync[1455]: [TOTEM ] Retransmit List: e5d5 e5d6 e5d7 e5d8
Feb 6 00:50:06 swarley corosync[1455]: [TOTEM ] Retransmit List: e5dd e5de e5df e5e0 e5e1 e5e2 e5e3


And this is the log of the master:

Feb 5 17:02:11 bazinga kernel: DLM (built Dec 19 2011 10:07:10) installed
Feb 5 17:02:11 bazinga corosync[1589]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Feb 5 17:02:11 bazinga corosync[1589]: [MAIN ] Corosync built-in features: nss
Feb 5 17:02:11 bazinga corosync[1589]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Feb 5 17:02:11 bazinga corosync[1589]: [MAIN ] Successfully parsed cman config
Feb 5 17:02:11 bazinga corosync[1589]: [MAIN ] Successfully configured openais services to load
Feb 5 17:02:11 bazinga corosync[1589]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Feb 5 17:02:11 bazinga corosync[1589]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Feb 5 17:02:12 bazinga corosync[1589]: [TOTEM ] The network interface [109.234.106.10] is now up.
Feb 5 17:02:12 bazinga corosync[1589]: [QUORUM] Using quorum provider quorum_cman
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Feb 5 17:02:12 bazinga corosync[1589]: [CMAN ] CMAN 1324544458 (built Dec 22 2011 10:01:01) started
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: openais cluster membership service B.01.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: openais event service B.01.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: openais message service B.03.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: openais distributed locking service B.03.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: openais timer service A.01.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync configuration service
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync profile loading service
Feb 5 17:02:12 bazinga corosync[1589]: [QUORUM] Using quorum provider quorum_cman
Feb 5 17:02:12 bazinga corosync[1589]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Feb 5 17:02:12 bazinga corosync[1589]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] New Configuration:
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] Members Left:
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] Members Joined:
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] New Configuration:
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] Members Left:
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] Members Joined:
Feb 5 17:02:12 bazinga corosync[1589]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:02:12 bazinga corosync[1589]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 5 17:02:12 bazinga corosync[1589]: [QUORUM] Members[1]: 1
Feb 5 17:02:12 bazinga corosync[1589]: [QUORUM] Members[1]: 1



Any ideas?
 
Aren't there newer logs on the master (time 17:03:XX)?


An ongoing error in the master's syslog is:

Feb 6 06:26:28 bazinga kernel: [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 128
Feb 6 06:26:28 bazinga kernel: [drm:drm_edid_block_valid] *ERROR* Raw EDID:
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
Feb 6 06:26:28 bazinga kernel: <3>ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................

(every 10 seconds)

I also found:

Feb 6 06:26:34 bazinga pvestatd[1870]: WARNING: Use of uninitialized value in addition (+) at /usr/share/perl5/PVE/OpenVZ.pm line 213.
Feb 6 06:26:34 bazinga pvestatd[1870]: WARNING: Use of uninitialized value in multiplication (*) at /usr/share/perl5/PVE/OpenVZ.pm line 215.

At the time you mentioned, the syslog shows:

Feb 5 17:03:57 bazinga kernel: radeon 0000:11:04.0: DVI-I-1: EDID block 0 invalid.
Feb 5 17:03:57 bazinga kernel: [drm:radeon_dvi_detect] *ERROR* DVI-I-1: probed a monitor but no|invalid EDID
Feb 5 17:03:57 bazinga pvedaemon[2652]: starting CT 102: UPID:bazinga:00000A5C:00002E71:4F2EA86D:vzstart:102:root@pam:
Feb 5 17:03:57 bazinga pvedaemon[1849]: <root@pam> starting task UPID:bazinga:00000A5C:00002E71:4F2EA86D:vzstart:102:root@pam:
Feb 5 17:03:57 bazinga kernel: CT: 102: started
Feb 5 17:03:59 bazinga pvedaemon[1849]: <root@pam> end task UPID:bazinga:00000A5C:00002E71:4F2EA86D:vzstart:102:root@pam: OK
Feb 5 17:04:02 bazinga pvedaemon[3308]: start VM 100: UPID:bazinga:00000CEC:0000303E:4F2EA872:qmstart:100:root@pam:
Feb 5 17:04:02 bazinga pvedaemon[1848]: <root@pam> starting task UPID:bazinga:00000CEC:0000303E:4F2EA872:qmstart:100:root@pam:
Feb 5 17:04:03 bazinga kernel: device tap100i0 entered promiscuous mode
Feb 5 17:04:03 bazinga kernel: vmbr0: port 2(tap100i0) entering forwarding state
Feb 5 17:04:03 bazinga pvedaemon[1848]: <root@pam> end task UPID:bazinga:00000CEC:0000303E:4F2EA872:qmstart:100:root@pam: OK
Feb 5 17:04:06 bazinga pvedaemon[3688]: start VM 103: UPID:bazinga:00000E68:000031B7:4F2EA876:qmstart:103:root@pam:
Feb 5 17:04:06 bazinga pvedaemon[1851]: <root@pam> starting task UPID:bazinga:00000E68:000031B7:4F2EA876:qmstart:103:root@pam:
Feb 5 17:04:06 bazinga kernel: device tap103i0 entered promiscuous mode
Feb 5 17:04:06 bazinga kernel: vmbr0: port 3(tap103i0) entering forwarding state
Feb 5 17:04:06 bazinga pvedaemon[1851]: <root@pam> end task UPID:bazinga:00000E68:000031B7:4F2EA876:qmstart:103:root@pam: OK

followed by the above-mentioned error.

Maybe a kernel problem?
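
As a side note for lining up the task entries with the corosync timestamps: the `UPID:` strings in the pvedaemon lines above appear to carry the process ID and start time in hexadecimal. Below is a rough sketch of decoding one. The field order (node, pid, pstart, start time, type, id, user) is an assumption about the UPID layout, and `decode_upid` is a hypothetical helper, not a Proxmox tool. The decoded pid does match the `pvedaemon[2652]` prefix of the same log line, which supports the assumed layout.

```python
from datetime import datetime, timezone


def decode_upid(upid: str) -> dict:
    """Split a UPID string into fields; hex fields are converted to integers.

    Assumed layout: UPID:node:pid:pstart:starttime:type:id:user:
    """
    parts = upid.split(":")
    if len(parts) < 8 or parts[0] != "UPID":
        raise ValueError(f"not a UPID: {upid!r}")
    _, node, pid, pstart, starttime, task_type, task_id, user = parts[:8]
    return {
        "node": node,
        "pid": int(pid, 16),
        "pstart": int(pstart, 16),
        # Start time is taken to be seconds since the Unix epoch.
        "started": datetime.fromtimestamp(int(starttime, 16), tz=timezone.utc),
        "type": task_type,
        "id": task_id,
        "user": user,
    }


info = decode_upid("UPID:bazinga:00000A5C:00002E71:4F2EA86D:vzstart:102:root@pam:")
print(info["pid"])      # 2652
print(info["started"])  # 2012-02-05 16:03:57+00:00
```

The decoded start time is in UTC, so it is one hour behind the 17:03:57 shown in the syslog if the host logs in CET.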
 
But isn't there a corosync message at 'Feb 5 17:03:46' (or near that time)?


Last corosync message on master was this:

Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] New Configuration:
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] Members Left:
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] Members Joined:
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] New Configuration:
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] Members Left:
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] Members Joined:
Feb 5 17:03:47 bazinga corosync[1589]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:47 bazinga corosync[1589]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 5 17:03:47 bazinga corosync[1589]: [CMAN ] quorum regained, resuming activity
Feb 5 17:03:47 bazinga corosync[1589]: [QUORUM] This node is within the primary component and will provide service.
Feb 5 17:03:47 bazinga corosync[1589]: [QUORUM] Members[2]: 1 2
Feb 5 17:03:47 bazinga corosync[1589]: [QUORUM] Members[2]: 1 2
Feb 5 17:03:47 bazinga corosync[1589]: [CPG ] chosen downlist: sender r(0) ip(109.234.106.10) ; members(old:1 left:0)
Feb 5 17:03:47 bazinga corosync[1589]: [MAIN ] Completed service synchronization, ready to provide service.


and on the node:

Feb 5 17:03:46 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:46 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:46 swarley corosync[1455]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 5 17:03:46 swarley corosync[1455]: [CPG ] chosen downlist: sender r(0) ip(109.234.106.16) ; members(old:1 left:0)
Feb 5 17:03:46 swarley corosync[1455]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] CLM CONFIGURATION CHANGE
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] New Configuration:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.16)
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Left:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] Members Joined:
Feb 5 17:03:47 swarley corosync[1455]: [CLM ] #011r(0) ip(109.234.106.10)
Feb 5 17:03:47 swarley corosync[1455]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 5 17:03:47 swarley corosync[1455]: [CMAN ] quorum regained, resuming activity
Feb 5 17:03:47 swarley corosync[1455]: [QUORUM] This node is within the primary component and will provide service.
Feb 5 17:03:47 swarley corosync[1455]: [QUORUM] Members[2]: 1 2
Feb 5 17:03:47 swarley corosync[1455]: [QUORUM] Members[2]: 1 2
Feb 5 17:03:47 swarley corosync[1455]: [CPG ] chosen downlist: sender r(0) ip(109.234.106.10) ; members(old:1 left:0)
Feb 5 17:03:47 swarley corosync[1455]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 5 17:13:26 swarley corosync[1455]: [TOTEM ] Retransmit List: 5a1 5a2 5a3 5a4 5a5 5a6
Feb 6 00:49:57 swarley corosync[1455]: [TOTEM ] Retransmit List: e5d5 e5d6 e5d7 e5d8
Feb 6 00:50:06 swarley corosync[1455]: [TOTEM ] Retransmit List: e5dd e5de e5df e5e0 e5e1 e5e2 e5e3
 
Looks like both nodes are online and working again?
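
That reading rests on the `[QUORUM] Members[2]: 1 2` lines in both logs. A minimal sketch for pulling the most recent membership out of a syslog excerpt follows; the helper name `last_quorum_members` is made up for illustration, and the regular expression is only fitted to the line format shown in this thread.

```python
import re

# Matches corosync quorum lines such as "[QUORUM] Members[2]: 1 2".
MEMBERS_RE = re.compile(r"\[QUORUM\]\s+Members\[(\d+)\]:\s*([\d ]*)")


def last_quorum_members(log_lines):
    """Return the node IDs from the last 'Members[n]' line, or None if absent."""
    members = None
    for line in log_lines:
        match = MEMBERS_RE.search(line)
        if match:
            members = [int(node_id) for node_id in match.group(2).split()]
    return members


sample = [
    "Feb 5 17:02:12 bazinga corosync[1589]: [QUORUM] Members[1]: 1",
    "Feb 5 17:03:47 bazinga corosync[1589]: [QUORUM] Members[2]: 1 2",
]
print(last_quorum_members(sample))  # [1, 2]
```

Note that this only reflects cluster membership at the corosync level. It says nothing about whether the web service on port 8006 is running on each node.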


It looks that way, but they are not. The master is working fine (except for the annoying error), but communication between master and slave through the web interface is not possible.

Any time I click on one of the node's actions, it says:

Error "Verbindungsfehler" (connection error) 500: Can't connect to 109.234.106.16:8006 (connect: Connection refused)

I'm fine with kicking the node and re-adding it. Do you have any recommendations on how to do this?
 
For some reason, HTTPS is stopped on the node the VM was moved to. Run:

/etc/init.d/apache2 restart

and all is fine.
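
If this happens repeatedly, the check and the restart could be combined. This is only a sketch under assumptions: `restart_if_down` is a hypothetical helper, the restart command is the one given above, and it would have to run as root on the affected node.

```python
import socket
import subprocess


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_if_down(host, port, restart_cmd, run=subprocess.run):
    """Run restart_cmd only when nothing accepts connections on host:port.

    Returns True if the restart command was run, False otherwise.
    """
    if port_is_open(host, port):
        return False
    # check=True raises CalledProcessError if the restart itself fails.
    run(restart_cmd, check=True)
    return True
```

On the node from this thread, the call would look like `restart_if_down("109.234.106.16", 8006, ["/etc/init.d/apache2", "restart"])`. The `run` parameter exists only so the restart can be replaced by a stand-in when trying the function out.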
 
