Error after deleting a node from cluster, can't start cman

slappyjam

Feb 27, 2013
Hey guys,

I am having a problem with one of the nodes in my cluster. I was trying to delete the node so I could re-add it. After I ran 'pvecm delnode' on the node I wanted to delete (virt2-atl), restarting cman caused a segfault. I want to add the node back into the cluster, but since I can't start cman, I can't add the node. I have a feeling the process is bailing out on a config file, but I'm not sure. Has anyone seen this, or can someone point me in the right direction here?

Here's the output from the node when I try to start cman (and YES I have tried rebooting):
Code:
root@virt2-atl:~# /etc/init.d/cman start
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... /usr/sbin/ccs_config_validate: line 186: 101930 Segmentation fault      (core dumped) ccs_config_dump > $tempfile


Unable to get the configuration
corosync [MAIN  ] Corosync Cluster Engine ('1.4.4'): started and ready to provide service.
corosync [MAIN  ] Corosync built-in features: nss
corosync [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
corosync died with signal: 11 Check cluster logs for details
[FAILED]
 
Can you please post the contents of /etc/cluster/cluster.conf?


My cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="29" name="KVM-ATL">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.148.228" lanplus="1" login="root" name="ipmi-virt1-atl" passwd="password"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.148.229" lanplus="1" login="ADMIN" name="ipmi-virt2-atl" passwd="password"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.148.183" lanplus="1" login="admin" name="ipmi-virt3-atl" passwd="password"/>
  </fencedevices>
  <clusternodes>


    <clusternode name="virt3-atl" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi-virt3-atl"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="virt1-atl" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi-virt1-atl"/>
        </method>
      </fence>
    </clusternode>
  <clusternode name="virt4-atl" votes="1" nodeid="4"/></clusternodes>
  <rm/>
</cluster>
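The posted file has no `clusternode` entry for virt2-atl. A quick text scan is enough to see which nodes a cluster.conf still lists. The helpers below (`list_cluster_nodes`, `node_in_conf`) are hypothetical, not Proxmox tools, and the sketch assumes every `<clusternode ...>` start tag sits on a single line, as in the file above; it is not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helpers for a quick look at cluster.conf.
# Assumption: every <clusternode ...> start tag is on one line (as in the
# file posted above). This is a plain text scan, not an XML parser.

# Print "nodeid name" for each <clusternode> element, in file order.
list_cluster_nodes() {
    conf="${1:-/etc/cluster/cluster.conf}"
    grep -o '<clusternode [^>]*>' "$conf" | while IFS= read -r tag; do
        name=$(printf '%s\n' "$tag" | sed -n 's/.* name="\([^"]*\)".*/\1/p')
        id=$(printf '%s\n' "$tag" | sed -n 's/.* nodeid="\([^"]*\)".*/\1/p')
        printf '%s %s\n' "$id" "$name"
    done
}

# Exit status 0 if node $1 is listed in config file $2, 1 otherwise.
node_in_conf() {
    list_cluster_nodes "$2" | awk -v n="$1" '$2 == n { found = 1 } END { exit !found }'
}
```

Against the file above, `list_cluster_nodes` prints `1 virt3-atl`, `3 virt1-atl` and `4 virt4-atl`, and `node_in_conf virt2-atl` fails, since virt2-atl is no longer listed.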

- - - Updated - - -

Yes, that is my thread as well; I thought I'd make a separate post. I'm okay with reinstalling as long as it doesn't mean reprovisioning. My machines are in a datacenter that's relatively far away.

Is there a text-based install I can do over IPMI? Or is there a way to do a reinstall just by removing and re-adding the packages?
 
That's correct. I did a 'pvecm delnode' on virt2. I want to add virt2-atl back to the cluster, but I can't if cman isn't starting up.
 
Here's the output from pvecm nodes/add on a machine in the cluster:

Code:
root@virt1-atl:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M   1052   2013-02-19 14:45:53  virt3-atl
   2   X   1148                        virt2-atl
   3   M   1028   2013-02-19 14:17:06  virt1-atl
   4   X      0                        virt4-atl

root@virt1-atl:~# pvecm add virt2-atl
authentication key already exists
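As I read this listing, `M` in the Sts column marks a node that is currently a member and `X` one that is not joined, so virt2-atl (and virt4-atl) are the nodes to look at. A minimal sketch for pulling those names out of `pvecm nodes`, assuming the column layout shown above (a header line, then rows with Sts as the second field and Name as the last); `non_member_nodes` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: read `pvecm nodes` output on stdin and print the
# Name of every node whose Sts column is not "M" (not a joined member).
# Assumes the layout above: one header line, Sts in field 2, Name last.
non_member_nodes() {
    awk 'NR > 1 && NF > 0 && $2 != "M" { print $NF }'
}
```

On a member node, `pvecm nodes | non_member_nodes` would print virt2-atl and virt4-atl for the listing above.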
 
If I run that command on virt2-atl, it has the same output:

Code:
root@virt2-atl:~# pvecm add virt2-atl
authentication key already exists
 

You should be more careful when running such a command. I guess you want something like:

Code:
root@virt2-atl:~# pvecm add virt1-atl

There is also a '--force' option to re-add a node.
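The failed attempt above ran `pvecm add virt2-atl` on virt2-atl itself. The command is run on the node being (re-)added, and its argument should name a node that is already in the cluster (here virt1-atl). A small guard can catch the self-reference mistake before anything runs. The wrapper below is hypothetical, not part of Proxmox VE; it only compares against the `hostname` output, so this node's IP or FQDN would slip through, and the `PVECM` override exists only so the sketch can be dry-run without a cluster.

```shell
#!/bin/sh
# Hypothetical guard around `pvecm add`: refuse to point a node at itself.
# Only the `hostname` output is checked; an IP or FQDN of this node is not.
# PVECM can be overridden (e.g. PVECM=echo) to dry-run the command.
readd_to_cluster() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: readd_to_cluster <existing-cluster-member>" >&2
        return 2
    fi
    if [ "$target" = "$(hostname)" ]; then
        echo "refusing: '$target' is this node; name an existing member" >&2
        return 1
    fi
    # --force re-adds a node that was previously part of the cluster
    ${PVECM:-pvecm} add "$target" --force
}
```

Run on virt2-atl, `readd_to_cluster virt1-atl` would execute `pvecm add virt1-atl --force`, while `readd_to_cluster virt2-atl` would stop with an error.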
 
I feel like you simply aren't paying attention to anything I'm saying here. I've already run that command, and if you took the time to scroll up you would see the output from it. I feel like I'm being patronized here. If I had a budget, I would buy support (as your tagline so kindly suggests). The problem is that cman won't start up anymore, and trying to re-add the node with pvecm simply doesn't work.

Do I have to reinstall this node to make it work?


As a side note:

I really like this Proxmox product, but the attitude that I've gotten from the "staff members" that respond to these posts is awful. If this is anything like what it's like to have paid support, then I'll find a different enterprise solution.

You guys really need to get your act together.



 
This is the community forum, a place where you can share your issues and questions with other community members, and where community members can help others.
We moderate and also help as best we can via this channel. So far no one from the community has helped you, so one of our staff members tried to help immediately.

If you want personal and direct help, the forum is the wrong place.
For more details, see http://pve.proxmox.com/wiki/Get_support

Configuring and debugging a cluster setup is a challenging task, and I understand that you would also like our support team to help you at no cost to you, and of course 24/7 and without any delay.

Proxmox VE is free software, but support is not free. You call this "a bad attitude" and awful, but it's the basis of our business model (and of a lot of other open-source projects).

You get free software, but you need to pay for support (if you are unhappy with the forum). It's up to you whether you like this or not, but this is the way to go.
 
I've read other forums where you guys go on and on about how nothing in life is free. Trust me, I get it.

All I'm trying to say is sometimes it's better to say nothing than something completely useless. It's very frustrating.

As far as my actual problem I'm going to just go reinstall the entire cluster. As soon as I have a budget, I plan on buying support. Until then, take it easy on your potential customers, bro.

Thanks for your response.



 

You simply ran the wrong command, so please re-read my post carefully.
 
Wow.. don't I feel like a complete jerk now.

I tried running:

Code:
root@virt2-atl:~# pvecm add virt1-atl
authentication key already exists

Then I added the --force option, and it worked.

Code:
root@virt2-atl:~# pvecm add virt1-atl --force
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-clustercan't create shared ssh key database '/etc/pve/priv/authorized_keys'
.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Tuning DLM kernel config... [  OK  ]
   Unfencing self... [  OK  ]
   Joining fence domain... [  OK  ]
generating node certificates
merge known_hosts file
restart services
Restarting PVE Daemon: pvedaemon.
Restarting web server: apache2 ... waiting .
successfully added node 'virt2-atl' to cluster.

Many apologies. I guess I should pay more attention before I run my mouth.

Thanks guys!
 
