Hi, I had a working cluster configuration with Proxmox 2.0. I had all four nodes running, online and with rgmanager active.
After upgrading one node to 2.1, the cluster stopped working. So I upgraded all the nodes, but the best I could get was all nodes Online with rgmanager never active; the HA configuration no longer worked and the VM could not be started.
I tried to solve the problem by upgrading everything to 2.2, but now the situation is even worse: 3 nodes are online and the 4th shows as offline:
Code:
root@lama2:~# clustat
Cluster Status for SiwebCluster @ Tue Nov 20 16:02:43 2012
Member Status: Quorate
 Member Name      ID   Status
 ------ ----      ---- ------
 lama1               1 Online
 lama2               2 Online, Local
 lama9               3 Online
 lama10              4 Offline
Code:
root@lama10:~# clustat
Could not connect to CMAN: Connection refused
On that node CMAN is offline and the log files are full of messages like:
Code:
Nov 20 15:47:50 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
Nov 20 15:47:50 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
Nov 20 15:47:50 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
Nov 20 15:47:50 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
Nov 20 15:47:50 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
I tried to restart the pve* services with:
/etc/init.d/pvestatd restart
/etc/init.d/pve-cluster restart
but nothing changed apart from some more messages:
Code:
Nov 20 15:48:08 lama10 pvestatd[3271]: server closing
Nov 20 15:48:08 lama10 pvestatd[26301]: starting server
Nov 20 15:48:18 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
Nov 20 15:48:18 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
...
Nov 20 15:48:18 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
Nov 20 15:48:18 lama10 pmxcfs[2529]: [status] crit: cpg_send_message failed: 9
Nov 20 15:48:19 lama10 pmxcfs[2529]: [main] notice: teardown filesystem
Nov 20 15:48:28 lama10 pvestatd[26301]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Nov 20 15:48:28 lama10 pvestatd[26301]: WARNING: ipcc_send_rec failed: Connection refused
Nov 20 15:48:28 lama10 pvestatd[26301]: WARNING: ipcc_send_rec failed: Connection refused
Nov 20 15:48:28 lama10 pvestatd[26301]: WARNING: ipcc_send_rec failed: Connection refused
Nov 20 15:48:28 lama10 pvestatd[26301]: WARNING: ipcc_send_rec failed: Connection refused
Nov 20 15:48:28 lama10 pvestatd[26301]: WARNING: ipcc_send_rec failed: Connection refused
Nov 20 15:48:31 lama10 pmxcfs[26319]: [quorum] crit: quorum_initialize failed: 6
Nov 20 15:48:31 lama10 pmxcfs[26319]: [quorum] crit: can't initialize service
Nov 20 15:48:31 lama10 pmxcfs[26319]: [confdb] crit: confdb_initialize failed: 6
Nov 20 15:48:31 lama10 pmxcfs[26319]: [quorum] crit: can't initialize service
Nov 20 15:48:31 lama10 pmxcfs[26319]: [dcdb] crit: cpg_initialize failed: 6
Nov 20 15:48:31 lama10 pmxcfs[26319]: [quorum] crit: can't initialize service
Nov 20 15:48:31 lama10 pmxcfs[26319]: [dcdb] crit: cpg_initialize failed: 6
Nov 20 15:48:31 lama10 pmxcfs[26319]: [quorum] crit: can't initialize service
Nov 20 15:48:37 lama10 pmxcfs[26319]: [quorum] crit: quorum_initialize failed: 6
Nov 20 15:48:37 lama10 pmxcfs[26319]: [confdb] crit: confdb_initialize failed: 6
Nov 20 15:48:37 lama10 pmxcfs[26319]: [dcdb] crit: cpg_initialize failed: 6
Nov 20 15:48:37 lama10 pmxcfs[26319]: [dcdb] crit: cpg_initialize failed: 6
Nov 20 15:48:38 lama10 pmxcfs[26319]: [status] crit: cpg_send_message failed: 9
Nov 20 15:48:38 lama10 pmxcfs[26319]: [status] crit: cpg_send_message failed: 9
Nov 20 15:48:38 lama10 pmxcfs[26319]: [status] crit: cpg_send_message failed: 9
Nov 20 15:48:38 lama10 pmxcfs[26319]: [status] crit: cpg_send_message failed: 9
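Since the log is mostly the same few messages repeated, a small shell helper along these lines can collapse it into one count per distinct message. This is only a sketch: the function name `summarize_log` and the `/var/log/syslog` path are my own assumptions, not anything provided by Proxmox, and it assumes the classic syslog layout shown above.

```shell
# Collapse repeated syslog lines into "count message" pairs.
# Assumes lines look like: "Mon DD HH:MM:SS host proc[pid]: message".
# summarize_log is a hypothetical helper name, not a Proxmox tool.
summarize_log() {
    # Blank out timestamp and hostname (fields 1-4), trim the leading spaces,
    # strip the [pid] so restarts of the same daemon group together,
    # then count duplicates and list the most frequent message first.
    awk '{ $1 = $2 = $3 = $4 = ""; sub(/^ +/, ""); print }' \
        | sed 's/\[[0-9]*\]:/:/' \
        | sort | uniq -c | sort -rn
}

# Example call (the log path is an assumption and may differ on your system):
# grep -E 'pmxcfs|pvestatd' /var/log/syslog | summarize_log
```

That makes it easier to see at a glance which distinct errors (cpg_send_message, quorum_initialize, ipcc_send_rec, ...) are present and how often.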
I'm going to reboot that node to see what happens, but I'd like to know whether there are documented steps for bringing a failed node back into the cluster.
Regards
Simone