Please help, I am so close!!!

warloxian

Member
Jun 26, 2021
I have a cluster of 6 nodes, all running Ceph. When I set up my last node I accidentally gave it a static address that duplicated another node's. I went in and changed the IP of the node, fixed it in /hosts and /interfaces, and also fixed the IP address in /etc/pve/corosync.conf. When I run pvesh get cluster/config/nodes, all 6 nodes show up, including when I run it from the VGA terminal on the lost node. When I am logged into my main node, 5 nodes show up green and the one I changed the IP on has a red X. When I log into the GUI of the changed node, it is green and the other 5 nodes have red X's. I am unable to access the shell when logged into the GUI of the changed node; I get error 1006. From the changed node I can ping google.com, 8.8.8.8, and the addresses of all of the other 5 nodes, and I can ping the changed IP from all the other nodes. I am not able to edit any config files from the changed node; I get a permission denied error. When I run pvecm nodes from any of the 5 working nodes I can only see those 5 nodes, and when I run it from the changed node I can only see that node and not the other 5. This is the only place I can find where all 6 nodes don't show up.
When I look at the log files I see "cluster not quorate - extending auth key lifetime", and when I run journalctl -u corosync on the changed node I see multiple instances of "host 1,2,3,4,5 has no active link".
Yes, I am a NOOB; yes, I have tried everything I can find and have reached the end of Google; yes, I am OVER MY HEAD. Yes, I am tired of reinstalling OSes every time I screw things up. It's time for me to start learning how to work through these problems instead of just writing over them. I have made tremendous progress with Linux in general, considering 12 months ago I knew nothing. I have several stable Linux OSes running on multiple laptops and have found ways to work through my BORKED Linux systems, but the server/VM world is my new "thorn in my side, from the tree that I planted", if I may quote Metallica.
Any help would be GREATLY Appreciated
 
/etc/pve/corosync.conf --> did you increase config_version?

Also, after changing it, verify on each node that /etc/corosync/corosync.conf is the same as /etc/pve/corosync.conf.
If not, overwrite /etc/corosync/corosync.conf and restart the corosync service.
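To check that on a node, something like the following Python sketch could be used. It is only an illustration: it assumes Python 3 is available on the node and prints a unified diff of the two files named above (`config_diff` and `node_in_sync` are hypothetical helper names, not Proxmox tools). Running `diff /etc/pve/corosync.conf /etc/corosync/corosync.conf` in a shell does the same job.

```python
import difflib
from pathlib import Path

# The two files mentioned above: the cluster-wide copy kept by Proxmox
# and the local copy that corosync reads on this node.
CLUSTER_CONF = Path("/etc/pve/corosync.conf")
LOCAL_CONF = Path("/etc/corosync/corosync.conf")


def config_diff(cluster_text: str, local_text: str,
                from_name: str = str(CLUSTER_CONF),
                to_name: str = str(LOCAL_CONF)) -> list[str]:
    """Return unified-diff lines between the two configs (empty list if identical)."""
    return list(
        difflib.unified_diff(
            cluster_text.splitlines(keepends=True),
            local_text.splitlines(keepends=True),
            fromfile=from_name,
            tofile=to_name,
        )
    )


def node_in_sync(cluster_path: Path = CLUSTER_CONF, local_path: Path = LOCAL_CONF) -> bool:
    """Print any differences and return True if the local copy matches the cluster copy."""
    diff = config_diff(cluster_path.read_text(), local_path.read_text(),
                       str(cluster_path), str(local_path))
    if diff:
        print("".join(diff), end="")
    return not diff
```

Calling `node_in_sync()` on each node prints nothing and returns True when the two files match; otherwise the printed diff shows which lines are out of date.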
 
As I am a NOOB, I guess I don't fully understand the question "increase config_version?". My problem with modifying the /etc/corosync/corosync.conf file is this: when I log into the changed node from the web GUI, I am not able to access the shell at all. When I plug a monitor and keyboard into the changed node and try to change any of the files, I get "access denied", even when I run it with sudo. This seems to be the sticking point: I appear to have been locked out of the ability to change any files.
 
In the /etc/pve/corosync.conf file, you have a field like:

Code:
totem {
  cluster_name: your cluster
  config_version: 4
  ...

In this example config_version = 4, so you need to increase it (config_version + 1 = 5),
like this:

Code:
totem {
  cluster_name: your cluster
  config_version: 5
  ...

If you don't increase config_version, the new config is not applied.
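The bump itself is just "find the number, add one". As an illustration only, here is a small Python sketch that does that on the text of the file; `bump_config_version` is a made-up helper name, and the sketch assumes a single `config_version:` line (in the totem section), as in the example above. It only transforms text; saving the result back into /etc/pve is left to you.

```python
import re

# Matches a line such as "  config_version: 4" and captures the number.
_VERSION_RE = re.compile(r"^([ \t]*config_version:[ \t]*)(\d+)[ \t]*$", re.MULTILINE)


def bump_config_version(conf_text: str) -> str:
    """Return conf_text with the first config_version value increased by one."""
    match = _VERSION_RE.search(conf_text)
    if match is None:
        raise ValueError("no config_version line found")
    new_version = int(match.group(2)) + 1
    # Replace only the digits, keeping the original indentation and spacing.
    return conf_text[: match.start(2)] + str(new_version) + conf_text[match.end(2):]
```

Applied to the first snippet, it turns `config_version: 4` into `config_version: 5` and leaves every other line unchanged.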

When that's done, /etc/pve/corosync.conf will overwrite the local /etc/corosync/corosync.conf on each node.
Since it seems you had duplicate IPs or a bad config, it's possible that this didn't happen correctly, so you need to verify manually that /etc/corosync/corosync.conf was updated.
If not, manually copy /etc/pve/corosync.conf to /etc/corosync/corosync.conf on each node and restart the corosync service with "systemctl restart corosync".
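Since the whole problem started with a duplicated IP, it may be worth checking the nodelist for repeated addresses before copying the file around. A rough Python sketch, assuming the usual Proxmox layout where each `node { ... }` entry has a `ring0_addr:` line (and `ring1_addr:` and so on for extra links); `duplicate_addresses` is a hypothetical helper, not part of Proxmox.

```python
import re
from collections import Counter

# Matches lines such as "    ring0_addr: 192.168.1.16" and captures the address.
_ADDR_RE = re.compile(r"^[ \t]*ring\d+_addr:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def duplicate_addresses(conf_text: str) -> dict[str, int]:
    """Return {address: count} for every ringN_addr value that appears more than once."""
    counts = Counter(_ADDR_RE.findall(conf_text))
    return {addr: n for addr, n in counts.items() if n > 1}
```

An empty dict means no address is used twice; anything else points at the node entries that still need fixing.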

Then, when all that is done, verify the corosync cluster status with "pvecm status".
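If you want to script that check, the sketch below runs `pvecm status` and looks for the `Quorate:` line. It assumes the output contains a line like `Quorate:          Yes` in its "Quorum information" block (as far as I know that is what Proxmox VE prints, but check your own output); `pvecm_status` and `is_quorate` are hypothetical helper names.

```python
import re
import subprocess

# Assumed output format: a "Quorate:" line followed by Yes or No.
_QUORATE_RE = re.compile(r"^[ \t]*Quorate:[ \t]*(\S+)", re.MULTILINE)


def pvecm_status() -> str:
    """Run `pvecm status` on a Proxmox VE node and return its standard output."""
    # check=False: a missing or unexpected output simply makes is_quorate() return False.
    result = subprocess.run(["pvecm", "status"], capture_output=True, text=True, check=False)
    return result.stdout


def is_quorate(status_text: str) -> bool:
    """Return True only if the status text reports 'Quorate: Yes'."""
    match = _QUORATE_RE.search(status_text)
    return match is not None and match.group(1).lower() == "yes"
```

Running `print(is_quorate(pvecm_status()))` on each node should print True on all six once the cluster is healthy again.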