Revert cluster to single node/machine

reetp · Oct 11, 2013

Having put a 2 node cluster together as an experiment I was trying to figure out if there is a 'graceful' way to revert to a single machine setup ?

I guess I could set

pvecm expected 1

And then remove the other node, but I don't think that puts me back to 'single machine' status.

Is there any guidance on this anywhere ?

I wanted to update the Cluster wiki page here http://pve.proxmox.com/wiki/Proxmox_VE_Cluster and this seems an often asked question.

B. Rgds
John

wahmed · Oct 11, 2013

If you no longer need the whole cluster, you can just delete one node at a time from the Cluster using the command below. Just run the command from the shell of the node:

#pvecm delnode <node>

This should "gracefully" revert the node to a single machine setup.

reetp · Oct 20, 2013

Not sure if it has worked or not. I did the following :

root@proxmox:/# pvecm nodes
Node  Sts  Inc  Joined               Name
   1  M     60  2013-10-13 02:08:03  proxmox
   2  M     64  2013-10-13 02:08:03  proxmox1

root@proxmox:/# pvecm delnode proxmox1

Node has disappeared from the Manager but :

root@proxmox:/# pvecm nodes
Node  Sts  Inc  Joined               Name
   1  M     60  2013-10-13 02:08:03  proxmox
   2  M     64  2013-10-13 02:08:03  proxmox1

root@proxmox:/# pvecm status
Version: 6.2.0
Config Version: 8
Cluster Name: mycluster
Cluster Id: 3380
Cluster Member: Yes
Cluster Generation: 64
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: proxmox
Node ID: 1
Multicast addresses: 239.192.13.65
Node addresses: 192.168.x.x
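A note on the numbers above: "Quorum: 2" is what a simple majority rule would give for two votes. The sketch below assumes quorum is computed as more than half of the votes in play (integer division, plus one); the function name quorum_for is made up for illustration and is not part of pvecm.

```shell
# Majority quorum under the stated assumption: floor(votes / 2) + 1.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

quorum_for 2   # two nodes with one vote each
quorum_for 1   # a single remaining node
```

Under that assumption a two-node cluster needs both votes, which is why one node on its own waits for quorum unless the expected votes are lowered.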

Any thoughts ?

B. Rgds
John

PS - Manager shows this :

reetp · Oct 23, 2013

Beginning to think that there is a bug somewhere in pmxcfs or the pvecm del command ?

Pushing a bit harder and trying things I ended up with this :

root@proxmox:~# pvecm nodes
Node  Sts  Inc  Joined               Name
   1  M     88  2013-10-23 01:59:12  proxmox
   2  X      0                       proxmox1

And a server that boots but does not start the cluster service as it is 'waiting for quorum' which will never arrive.

For those who don't yet know, I think I worked out that :

Though I deleted the node, it was NOT deleted from the config.db file (I can see the remnants in there)
At boot pmxcfs reads /var/lib/pve-cluster/config.db (and presumably other stuff)
This recreates the /etc/pve/cluster.conf file and /etc/cluster/cluster.conf
Nothing much can override it as far as I can tell.
You cannot make changes to /etc/pve/cluster.conf as it is read only, and any changes you make to /etc/cluster/cluster.conf get overwritten every boot.
There are notes in the link below about adding this to the cluster.conf files <cman two_node="1" expected_votes="1"> </cman> but it threw an error for me

So each time the server boots, it recreates the cluster conf files, waits for quorum on a non existent node, and fails to start the cluster services, or anything else.
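A quick way to see which nodes a regenerated cluster.conf still lists is to pull the clusternode names out of it. This is only a sketch: list_cluster_nodes is a made-up helper, and the sample file is a hypothetical stand-in mirroring the two-node layout in this thread. On a real node the file to inspect would be /etc/pve/cluster.conf.

```shell
# Print the name attribute of every <clusternode .../> line in a file.
list_cluster_nodes() {
    sed -n 's/.*<clusternode name="\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical sample file, not read from a live system.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<clusternodes>
<clusternode name="proxmox" votes="1" nodeid="1"/>
<clusternode name="proxmox1" votes="1" nodeid="2"/>
</clusternodes>
EOF

list_cluster_nodes "$sample"
```

If a node you believe was deleted still shows up in that list after a reboot, the stale entry is still being written back.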

There appears to be no way to remove this information from the db beyond the pvecm commands, and if for any reason this is not done just right, the database is left screwed, and so are you.

This thread helped me out of the current mess :
http://forum.proxmox.com/threads/10307-cluster-conf-wrong-prevents-cman-start-HELP!

I did :

service pve-cluster stop
pmxcfs --local

I then edited /etc/pve/cluster.conf and removed the offending node e.g.

<clusternodes>
<clusternode name="proxmox" votes="1" nodeid="1"/>
<clusternode name="proxmox1" votes="1" nodeid="2"/> // Removed this one
</clusternodes>

service pve-cluster stop

Voila, it all worked.

Please don't try this at home

I don't believe that this is a permanent fix, but it worked for me.

I still think that there is an issue with removing nodes whereby they do not get correctly deleted from the database - the forums are spattered with problems on this topic. If you get stuck, there is no way forwards, or backwards. This needs properly documenting somehow.

Quite what the answer is I do not know. I shall test some more when I get back from my trip.

B. Rgds
John

tom · Oct 23, 2013

test again. as long as you have quorum, you can manipulate all with pvecm. if you do not have quorum, gain quorum with "pvecm e 1".
there is no need to edit databases directly or manually do changes in /etc/pve/cluster.conf.
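As a rough illustration of the "do you have quorum" check, the sketch below reads status text in the format shown earlier in this thread and compares "Total votes" against "Quorum". The helper name has_quorum is hypothetical, and the field names are taken from the output posted above, so they may differ on other versions.

```shell
# Exit 0 if the status text on stdin shows total votes >= quorum, else 1.
has_quorum() {
    awk -F': *' '
        /^Total votes:/ { total = $2 + 0 }    # force numeric value
        /^Quorum:/      { quorum = $2 + 0 }   # ignores any trailing text
        END { exit !(quorum > 0 && total >= quorum) }
    '
}

# Sample values copied from the status output posted earlier.
status='Expected votes: 1
Total votes: 2
Quorum: 2'

if printf '%s\n' "$status" | has_quorum; then
    echo "quorate: pvecm can be used normally"
else
    echo "not quorate: consider 'pvecm expected 1' first"
fi
```

On a real node the input would come from "pvecm status | has_quorum" instead of the sample text.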

just to note, you can do all these tests with a test cluster installed as KVM VMs, so you do not need any additional hardware.

reetp · Oct 23, 2013

Thanks for the reply.

As above, there was no quorum by default when you boot. Yes, I could do it temporarily with pvecm e 1 but on next reboot it is lost and back to square one.

I needed a permanent solution, at least for the moment, as the machine will be unattended for the next week and I needed to know it would reboot properly and start VMs by itself if for instance we had a power failure.

I will do a test cluster when I am back and see what happens.

I still think that there is a problem in the software, or the system for removing a node is not properly documented - you only have to read the number of posts on the subject (I know you have, and I reckon I have read most of them too) to know that things aren't as they should be.

Quite simply "pvecm del 2" did not work as expected. So something isn't right !

Will come back with more when I return.

B. Rgds
John

tom · Oct 23, 2013

reetp said:
Thanks for the reply.

As above, there was no quorum by default when you boot. Yes, I could do it temporarily with pvecm e 1 but on next reboot it is lost and back to square one.

I needed a permanent solution, at least for the moment, as the machine will be unattended for the next week and I needed to know it would reboot properly and start VMs by itself if for instance we had a power failure.

I will do a test cluster when I am back and see what happens.

I still think that there is a problem in the software, or the system for removing a node is not properly documented - you only have to read the number of posts on the subject (I know you have, and I reckon I have read most of them too) to know that things aren't as they should be.

Quite simply "pvecm del 2" did not work as expected. So something isn't right !

Will come back with more when I return.

B. Rgds
John

Clustering is a complex topic, and it is all about quorum. If you have a wrong config, you have no quorum.

So you need to set the right config before you boot.

m.ardito · Oct 23, 2013

tom said:
Clustering is a complex topic, and it is all about quorum. If you have a wrong config, you have no quorum.
So you need to set the right config before you boot.

So why should a node deleted with "pvecm del <nodename>" not be restarted as it is, but first be reinstalled from scratch and then, if needed, rejoined to the same cluster?

Marco

wahmed · Oct 23, 2013

m.ardito said:
So why should a node deleted with "pvecm del <nodename>" not be restarted as it is, but first be reinstalled from scratch and then, if needed, rejoined to the same cluster?

Marco

It is not that you cannot re-add the same node to the same cluster. But a cluster node involves more things than pvecm delnode deletes.
I use the following procedure to delete a node from the cluster and re-add it later if I want to:
cp -a /etc/pve /root/pve_backup    (create backup first)
/etc/init.d/pve-cluster stop    (stop cluster service)
umount /etc/pve
/etc/init.d/cman stop
rm /etc/cluster/cluster.conf
rm -rf /var/lib/pve-cluster/*
/etc/init.d/pve-cluster start
pvecm add proxmox1    (re-add node2 on the cluster again)
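
The cleanup steps above could be wrapped in a small dry-run script so they can be reviewed before anything is touched. This is only a sketch of the unsupported procedure in this post: the run and reset_cluster_state helpers and the DRY_RUN variable are made up for illustration, and the final "pvecm add" step is left out so it can be run by hand.

```shell
#!/bin/sh
# Print each step by default; set DRY_RUN=0 explicitly to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_cluster_state() {
    run cp -a /etc/pve /root/pve_backup   # backup first
    run /etc/init.d/pve-cluster stop      # stop the cluster filesystem
    run umount /etc/pve
    run /etc/init.d/cman stop
    run rm /etc/cluster/cluster.conf
    run rm -rf /var/lib/pve-cluster/*     # removes the config database
    run /etc/init.d/pve-cluster start
}

reset_cluster_state
```

With the default DRY_RUN=1 the script only lists the seven commands, which makes it easier to compare them with the list above before committing to a destructive run.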


m.ardito · Oct 24, 2013

symmcom said:
It is not that you cannot re-add the same node to the same cluster. But a cluster node involves more things than pvecm delnode deletes.

I understand this. I just expect a FOSS project to explain, in some way, all the procedures, problems and things involved, even the "more things".

symmcom said:
I use the following procedure to delete a node from the cluster and re-add it later if I want to:

I've seen your procedure before, and while I could use that too, since it is an "unsupported" way (see this and this), I wish to know what every step does and why. The wiki also reports the procedure below, which is similar in some points but not the same thing.

- official (supported) procedure -
1) delete the node (from another cluster member)
2) Shut server down & re-install from scratch
3) rejoin your node

- your procedure -
1) backup pve-cluster related file/folders.
2) stop pve-cluster service
3) unmount a folder
4) stop cman service
5) delete relevant files/folders (configs?)
6) restart pve-cluster service
7) rejoin your node

btw:
* why back up those files, since you're not going to use them again? just for safety, to revert (and if so, how)?
* what about cman? is it started in other ways?
* will the pve-cluster service restart recreate all needed files/folder/configs?

- wiki procedure-
1) stop services pvestatd, pvedaemon, cman, pve-cluster
2) backup pve-cluster related file/folders.
3) backup /root/.ssh, note down two existing ssh symlinks
4) Shut server down & re-install from scratch. Make sure the hostname is the same as it was before you continue.
5) stop services pvestatd, pvedaemon, cman, pve-cluster
6) restore saved /root/.ssh
7) restore saved pve-cluster related file/folders.
8) restart services pve-cluster and cman
9) restore two ssh symlinks
10) restart services pvestatd, pvedaemon
11) rejoin your node

I would very much like to see a Proxmox team member jump in and comment on these, saying what works, what does not, and why. If these are not the recommended ways and are unsupported but cannot harm the cluster in any way, then maybe they are just too complicated for beginners, while from a technical point of view they can work well. I feel there is some "obscurity" about these cluster concepts and operations here, and I don't know why.

We are not talking about common linux clustering software (like cman), pvecm is a proxmox software, there's the "man" but nearly nothing else. Or, I could not find it.
Oh yes, if I was a coder, I could read & understand all cluster related code, and then figure it out, just until code changes and I have to restart...

this is why I also started this thread http://forum.proxmox.com/threads/16...quot-cluster-docs-and-operations-quot-section

but it seems that my threads get zero responses these days... :-D

Thanks for your procedure, anyway

Marco

wahmed · Oct 24, 2013

The procedure I used is definitely not the recommended way. It's a quick and dirty way to do things. That's why the wiki says "re-install from scratch". If something goes wrong while doing it the non-recommended way, Proxmox will get the blame, for obvious reasons. I am not Proxmox staff, so I freely share what I know.

In a real-case scenario, though, you won't be taking chances by reusing the same node without reinstalling; at least I won't. Reinstalling does not take a lot of time, definitely not as much as it would take to fix things if something goes wrong.

I know too little about the inner workings of Proxmox to comment on many things, but I am learning. My main focus is to have an environment running as smoothly as possible and let the Proxmox team take care of the coding.

