HELP. I deleted the cluster.conf file (trying to remove the cluster) and I think my PVE node won't reboot correctly.

cmrho

New Member
Nov 4, 2024
Hi.

Bonehead move #126 of the week.

Problem: While playing around with joining clusters (using two pve nodes on different IPs), I decided to just 'unjoin' the cluster. That wasn't working right, so I deleted the cluster.conf file in the /etc/pve/ folder. After I did this on node #2 (a test node) and rebooted it, my VMs and containers completely disappeared. I mean, even the vm and lxc .conf files are missing.

Concern: The cluster.conf file is also missing from the /etc/pve folder of my MAIN pve node. I'm freaked out about rebooting my main node for fear that the same thing will happen. There is still a 'corosync.conf' file in the /etc/pve folder.

Assets: I have a full daily backup of the /etc/pve/ folder. In reviewing yesterday's backup, there is definitely neither a 'corosync.conf' nor a 'cluster.conf' file in the /etc/pve folder.

After your proper ribbing of my boneheadedness, I would love for guidance on how to fix this issue.

Note 1:
Incidentally, I don't know if it helps, but things might not have been working correctly because, after installing these nodes, I ran a 'post-install' script that, I believe, disabled HA & clustering. In any case, I think I need the proper cluster.conf file for this thing to run right.

Note 2:

I don't know if the disappearance of the vm and lxc .conf files was just coincidental after I deleted the 'cluster.conf' file on node 2. After recreating a couple of test VMs and CTs on node 2, I deleted the 'corosync.conf' file from the /etc/pve folder and rebooted node 2. The VMs and CTs did NOT disappear.

If I can get someone to confirm that deleting 'corosync.conf' from node 1 (my main pve server) will not delete my VM and LXC .conf files, I will do that and reboot. Otherwise, I am not rebooting this system until I have no other option.
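In the meantime, here is a rough sketch of how an extra snapshot of the config folder could be taken right before touching anything, so the VM and LXC .conf files can be copied back if they vanish. The function name `snapshot_pve_configs`, the `/root` destination, and the archive naming are my own assumptions, not anything Proxmox ships. It only reads the source folder and writes one tar archive.

```shell
#!/bin/sh
# Sketch: archive a config directory and list the guest .conf files inside it.
# snapshot_pve_configs is a hypothetical helper name, not a Proxmox tool.
snapshot_pve_configs() {
    src="${1:-/etc/pve}"      # directory to archive
    dest_dir="${2:-/root}"    # where the archive is written (assumed location)
    archive="$dest_dir/pve-config-$(date +%Y%m%d-%H%M%S).tar.gz"

    # -C switches into the source dir so paths in the archive are relative.
    tar -czf "$archive" -C "$src" . || return 1

    # Print the archive path, then every guest config found in the archive.
    # grep exits non-zero if no qemu-server/*.conf or lxc/*.conf is present,
    # so a failing return status means "no guest configs were captured".
    echo "$archive"
    tar -tzf "$archive" | grep -E '(qemu-server|lxc)/[0-9]+\.conf$'
}
```

Calling `snapshot_pve_configs` with no arguments would archive /etc/pve into /root. If the list it prints does not show the expected VM and CT IDs, that is a reason to stop before rebooting.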

Thanks so much.
 
Reply to myself for documentation purposes:

After significant testing on a test pve node, I believed with some confidence that simply deleting the 'cluster.conf' and the 'corosync.conf' files would not result in any damage to the main pve node. I deleted the 'corosync.conf' file, stopped and started the cluster service ('systemctl stop pve-cluster'), confirmed that the cluster was gone (from CLI and from the GUI), then rebooted. All is good.
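
For the record, the sequence above could be sketched roughly like this. Only `systemctl stop pve-cluster` is quoted from what I actually typed. The matching `systemctl start pve-cluster` and the `pvecm status` check are my assumptions about the equivalent commands, and `remove_cluster_config` is just a made-up function name. By default it only prints the steps and changes nothing.

```shell
#!/bin/sh
# Sketch of the steps described above. Prints each step by default;
# set RUN=1 in the environment to actually execute them.
# remove_cluster_config is a hypothetical name, not a Proxmox command.
remove_cluster_config() {
    # Helper: execute the command when RUN=1, otherwise just echo it.
    run() {
        if [ "${RUN:-0}" = "1" ]; then
            "$@"
        else
            echo "would run: $*"
        fi
    }

    run rm /etc/pve/corosync.conf
    run systemctl stop pve-cluster
    run systemctl start pve-cluster   # assumed counterpart to the stop
    run pvecm status                  # assumed CLI check that the cluster is gone
}
```

Running `remove_cluster_config` as-is prints the four steps for review. This worked on my nodes after testing on a throwaway node first; I would not run it with RUN=1 on a production box without a fresh copy of /etc/pve.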