[SOLVED] Stubborn node won't go away

ozhound

Jan 23, 2023
Hi all,

I've read several posts here about deleting an orphaned node from a cluster, but the stubborn little bugger just won't go away, even though, based on what I've read, it shouldn't exist anymore.
[Screenshot: 2023-01-23_11-51-27.jpg]
Here's what I've done so far:
1. Ran the `pvecm delnode home` command; the result is:
Code:
Node/IP: home is not a known host of the cluster.
2. I've confirmed that the `home` directory has been removed from /etc/pve/nodes.
3. I've confirmed that only the correct nodes are listed in the corosync.conf file.
4. The output of `pvecm nodes` is:
Code:
Membership information
----------------------
    Nodeid      Votes Name
         1          1 ozhound (local)

This wouldn't be a big deal, except that even with all of the above, after a restart the system won't start any containers or VMs until I run `pvecm expected 1`. Restarts are rare, but it's still an issue.

I'm stumped. Any advice?
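A node that won't start guests until `pvecm expected 1` is run is typically one that has fewer votes than corosync expects, so it is not quorate. A minimal sketch for checking that, assuming `pvecm status` prints votequorum lines such as `Expected votes:   2` and `Total votes:      1` (the helper name `check_votes` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `pvecm status` output on stdin and compares the
# "Total votes" line with the "Expected votes" line (line format assumed).
# Exit status: 0 = enough votes, 1 = fewer votes than expected, 2 = parse error.
check_votes() {
  local status expected total
  status=$(cat)
  # Take the number after the colon; "+ 0" forces a plain integer
  expected=$(awk -F': *' '/^Expected votes:/ { print $2 + 0 }' <<<"$status")
  total=$(awk -F': *' '/^Total votes:/ { print $2 + 0 }' <<<"$status")
  if [ -z "$expected" ] || [ -z "$total" ]; then
    echo "could not find vote counts in input" >&2
    return 2
  fi
  if [ "$total" -lt "$expected" ]; then
    echo "NOT OK: total votes ($total) < expected votes ($expected)"
    return 1
  fi
  echo "OK: total votes ($total) >= expected votes ($expected)"
}
```

Usage would be `pvecm status | check_votes`; a "NOT OK" result after a reboot would match the symptom described in this post.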
 
Hello,

Can you please provide us with the following:

Bash:
cat /etc/pve/corosync.conf
cat /etc/pve/.members
ls -la /etc/pve/nodes/

I would also check the syslog for any hints.
 
Hi Moayad

Thank you for your response.

As requested:

corosync.conf
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ozhound
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.205
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: OzCluster
  config_version: 1
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

.members

Code:
{
"nodename": "ozhound",
"version": 3,
"cluster": { "name": "Ozhound", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
  "home": { "id": 2, "online": 0},
  "ozhound": { "id": 1, "online": 1, "ip": "192.168.1.205"}
  }
}

Nodes

Code:
total 0
drwxr-xr-x 2 root www-data 0 Oct 23 16:39 .
drwxr-xr-x 2 root www-data 0 Jan  1  1970 ..
drwxr-xr-x 2 root www-data 0 Oct 23 16:39 ozhound
 
Thank you for the outputs!


From the `/etc/pve/.members` output, the `home` node is still present.

Can you please also provide the output of the `cat /etc/hosts` command?

Did you see anything interesting in the syslog/journalctl?
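The comparison made in this reply (a node still listed in `.members` but absent from `corosync.conf`) can be scripted. A rough sketch, assuming the one-entry-per-line `.members` layout shown in the output above; it uses `sed` rather than a JSON parser, so it is not robust to other formatting, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting nodes that .members still lists but
# corosync.conf no longer does. Line-based parsing of the layouts shown in
# this thread, not a general JSON or corosync parser.

# Node names from a .members file: lines like   "home": { "id": 2, ...
members_nodes() {
  sed -n 's/^[[:space:]]*"\([^"]*\)":[[:space:]]*{[[:space:]]*"id".*/\1/p' "$1"
}

# Node names from a corosync.conf nodelist: lines like   name: ozhound
corosync_nodes() {
  sed -n 's/^[[:space:]]*name:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}

# Print names present in .members ($1) but missing from corosync.conf ($2)
stale_members() {
  comm -23 <(members_nodes "$1" | sort -u) <(corosync_nodes "$2" | sort -u)
}
```

With the two files posted above, `stale_members /etc/pve/.members /etc/pve/corosync.conf` should print `home`.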
 
Hi,

Here is my hosts file; I've put * in place of the domain:

Code:
127.0.0.1 localhost.localdomain localhost
192.168.1.205 *.*.* ozhound

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

There's nothing I can see in the syslog, but I'm not really sure what I'm looking at.
 
Hello,

Thank you for the syslog!

From the syslog provided, I can see the following:

Code:
Jan 22 11:18:51 ozhound pveproxy[1165]: '/etc/pve/nodes/home/pve-ssl.pem' does not exist!#012
Jan 22 11:18:51 ozhound pveproxy[1166]: '/etc/pve/nodes/home/pve-ssl.pem' does not exist!#012
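These `pveproxy` lines show a service still looking for the removed node's certificate under `/etc/pve/nodes/home/`. A minimal sketch for pulling such references out of the journal; `node_refs` is a hypothetical helper that filters whatever log text is piped into it:

```shell
#!/usr/bin/env bash
# Hypothetical filter: print log lines that mention a node's directory under
# /etc/pve/nodes/. Reads log text on stdin; $1 is the node name.
# Exit status follows grep: 0 = matches found, 1 = none.
node_refs() {
  local node="$1"
  grep -F -- "/etc/pve/nodes/$node/"
}
```

For example, `journalctl -b | node_refs home` searches the current boot's journal.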


Can you try restarting the following services on your node?

Bash:
systemctl restart pve-cluster.service
systemctl restart corosync.service
systemctl restart pveproxy.service
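For repeated use, the three restarts can be wrapped with a status check afterwards. A sketch only: `restart_pve_services` and the `DRY_RUN` switch are hypothetical; `systemctl is-active` simply reports each unit's state after its restart:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: restart the three services from this post in order,
# then report whether each one is active. With DRY_RUN=1 it only prints the
# commands it would run.
restart_pve_services() {
  local svc
  for svc in pve-cluster.service corosync.service pveproxy.service; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "systemctl restart $svc"
    else
      systemctl restart "$svc" || { echo "failed to restart $svc" >&2; return 1; }
      echo "$svc: $(systemctl is-active "$svc")"
    fi
  done
}
```

`DRY_RUN=1 restart_pve_services` previews the sequence; running it without `DRY_RUN` performs the restarts.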


Please also make sure that the `home` node is offline.
 
Hi Moayad


Regarding making sure the `home` node is offline: no issues there, `home` doesn't exist at all.

I've actually rebooted the hardware a few times, and I've also restarted the services as you suggested, but nothing has changed and I'm not sure what to do next.

Is there a way to edit that .members file and remove the `home` entry? I tried as root but got "permission denied".

I believe that file is what makes the GUI still display the node.
 
Hi,

No, you can't edit the .members file.

I would check whether any file related to the `home` node still exists in /etc/pve by running `find /etc/pve/`. I would also check whether `/etc/corosync/corosync.conf` is the same as `/etc/pve/corosync.conf` (compare the output of `cat` on both).
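Both checks suggested here can be run in one go. A sketch, assuming the default paths named in this reply; `check_leftovers` is a hypothetical name, the directory arguments exist only so it can be pointed at copies, and matching on file names will not catch references inside file contents:

```shell
#!/usr/bin/env bash
# Hypothetical check combining the two suggestions above.
# $1 = node name, $2 = cluster fs dir (default /etc/pve),
# $3 = local corosync dir (default /etc/corosync).
# Returns 1 if the two corosync.conf copies differ (or one is missing).
check_leftovers() {
  local node="$1" pve_dir="${2:-/etc/pve}" corosync_dir="${3:-/etc/corosync}"
  echo "== paths under $pve_dir whose name contains '$node':"
  find "$pve_dir" -mindepth 1 -name "*${node}*"
  echo "== comparing $pve_dir/corosync.conf with $corosync_dir/corosync.conf:"
  if diff -u "$pve_dir/corosync.conf" "$corosync_dir/corosync.conf"; then
    echo "the two copies are identical"
  else
    echo "the two copies DIFFER"
    return 1
  fi
}
```

In this thread, `check_leftovers home` should have shown the extra `home` entry in `/etc/corosync/corosync.conf` as part of the diff.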
 
`/etc/corosync/corosync.conf` still listed the `home` node; deleting that entry fixed the issue.

Thanks for your help!
 
For anyone finding this thread (and for my future self), here are the steps for removing a node from a two-node cluster:

1. Shut down the node you want to remove.
2. Make sure you are logged in to the node you are keeping, not the one being removed.
3. Open a shell.
4. Execute:
Code:
pvecm expected 1
and then
Code:
pvecm delnode <node-name>
5. Run these commands:
Code:
systemctl restart pve-cluster.service
and
Code:
systemctl restart corosync.service
6. Refresh the browser; the deleted node should no longer appear in the GUI.
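The steps above can be collected into a script. This is a sketch, not an official procedure: `remove_node`, `run` and `DRY_RUN` are hypothetical; it must be run as root on the node you are keeping, after the other node is shut down; and it adds a final check of `/etc/corosync/corosync.conf`, since a stale entry there is what kept `home` around in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the steps above. Run it as root on the node you
# are KEEPING, after the node being removed has been shut down.
# DRY_RUN=1 only prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remove_node() {
  local node="${1:-}"
  local conf=/etc/corosync/corosync.conf
  if [ -z "$node" ]; then
    echo "usage: remove_node <node-name>" >&2
    return 64
  fi
  # Step 2 safeguard: refuse to delete the node we are running on
  if [ "$node" = "$(uname -n)" ]; then
    echo "refusing to remove the local node ($node)" >&2
    return 1
  fi
  run pvecm expected 1 || return 1
  # delnode may report an error when the node is already gone from the
  # cluster config (as in this thread), so carry on to the checks below
  run pvecm delnode "$node" || echo "note: pvecm delnode reported an error; continuing" >&2
  run systemctl restart pve-cluster.service || return 1
  run systemctl restart corosync.service || return 1
  # Lesson from this thread: the local corosync copy may still list the node
  if [ "${DRY_RUN:-0}" != 1 ] && [ -f "$conf" ] &&
     grep -Eq "^[[:space:]]*name:[[:space:]]*${node}[[:space:]]*$" "$conf"; then
    echo "warning: $node is still listed in $conf" >&2
    return 2
  fi
}
```

Running `DRY_RUN=1 remove_node home` first prints the four commands so they can be reviewed before `remove_node home` is run for real.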
 