[SOLVED] Stubborn node won't go away

ozhound

New Member
Jan 23, 2023
Hi all,

I've read several posts here on deleting an orphaned node from a cluster, but the stubborn little bugger just won't go away, even though, based on what I'm reading, it shouldn't exist.
[Attached image: 2023-01-23_11-51-27.jpg]
Steps I've taken so far:
1. Ran the pvecm delnode home command; the result is:
Code:
Node/IP: home is not a known host of the cluster.
2. I've confirmed that the home directory has been removed from /etc/pve/nodes.
3. I've confirmed that only the correct nodes are listed in the corosync.conf file.
4. The output of pvecm nodes is:
Code:
Membership information
----------------------
    Nodeid      Votes Name
         1          1 ozhound (local)

This wouldn't be a big deal, but even with all of this being the case, after a restart the system will not start any containers or VMs until I run pvecm expected 1. Restarts are rare, but it is still an issue.

I'm stumped. Any advice?
 
Hello,

Can you please provide us with the following:

Bash:
cat /etc/pve/corosync.conf
cat /etc/pve/.members
ls -la /etc/pve/nodes/

I would also check the Syslog if it gives you any hint.
 
Hi Moayad,

Thank you for your response.

As requested:

corosync.conf
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ozhound
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.205
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: OzCluster
  config_version: 1
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

.members

Code:
{
"nodename": "ozhound",
"version": 3,
"cluster": { "name": "Ozhound", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
  "home": { "id": 2, "online": 0},
  "ozhound": { "id": 1, "online": 1, "ip": "192.168.1.205"}
  }
}

Nodes

Code:
total 0
drwxr-xr-x 2 root www-data 0 Oct 23 16:39 .
drwxr-xr-x 2 root www-data 0 Jan  1  1970 ..
drwxr-xr-x 2 root www-data 0 Oct 23 16:39 ozhound
 
Thank you for the outputs!


From the `/etc/pve/.members` output, the `home` node is still present.

Could you also provide the output of the `cat /etc/hosts` command?

Did you see anything interesting in the Syslog/journalctl?
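A quick way to spot stale entries like this one is to filter the file for nodes marked offline. The following is only a minimal sketch: the function name `offline_nodes` is made up for illustration, and it assumes the file keeps one node per line in the layout shown in the output above, so it is a text filter rather than a real JSON parser.

```shell
# Print the names of nodes that the members file reports as offline.
# Assumption: one node entry per line, formatted like the output above.
# The path argument is optional and defaults to /etc/pve/.members.
offline_nodes() {
    members="${1:-/etc/pve/.members}"
    # Keep only lines with "online": 0, then extract the leading quoted name.
    grep -E '"online": *0' "$members" \
        | sed -E 's/^[[:space:]]*"([^"]+)".*/\1/'
}
```

Running `offline_nodes` with no argument reads the live file; passing a path lets you try it on a saved copy first.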
 
Hi,

The hosts file is as follows; I've put * in place of the domain.

Code:
127.0.0.1 localhost.localdomain localhost
192.168.1.205 *.*.* ozhound

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

There is nothing that I can see in the syslog, but I'm not really sure what I'm looking at.
 
Hello,

Thank you for the syslog!

From the syslog provided, I can see the following:

Code:
Jan 22 11:18:51 ozhound pveproxy[1165]: '/etc/pve/nodes/home/pve-ssl.pem' does not exist!#012
Jan 22 11:18:51 ozhound pveproxy[1166]: '/etc/pve/nodes/home/pve-ssl.pem' does not exist!#012


Can you try restarting the following services on your node?

Bash:
systemctl restart pve-cluster.service
systemctl restart corosync.service
systemctl restart pveproxy.service


Please also make sure that the `home` node is offline.
 
Hi Moayad


Re "Please also make sure that `home` node is offline": no issues here, home doesn't exist at all.

I've actually restarted at the hardware level a few times, and I've now also restarted the services as you suggested, but nothing has changed, and I'm not sure what I'm supposed to do next.

Is there a way to edit that .members file and remove the home entry? I tried as root but got permission denied.

I believe that is linked to the GUI displaying this node.
 
Hi,

No, you can't edit the .members file.

I would check whether any file related to the `home` node still exists in /etc/pve by running the find /etc/pve/ command. I would also check whether the output of cat /etc/corosync/corosync.conf is the same as that of cat /etc/pve/corosync.conf.
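The two checks can be combined into one small helper. This is a hedged sketch, not an official tool: `check_stale_node` is a hypothetical name, and it assumes node entries use the `name: <node>` form seen in the corosync.conf posted earlier in the thread.

```shell
# Report whether the local and cluster-wide corosync configs differ,
# and whether a given node name is still listed in either of them.
# Arguments: node name, then optional paths to the two config files.
check_stale_node() {
    node="$1"
    local_conf="${2:-/etc/corosync/corosync.conf}"
    cluster_conf="${3:-/etc/pve/corosync.conf}"

    # diff -q only reports whether the files differ, not how.
    if ! diff -q "$local_conf" "$cluster_conf" >/dev/null; then
        echo "configs differ"
    fi

    # Match "name: <node>" lines only, so cluster_name is not picked up.
    for f in "$local_conf" "$cluster_conf"; do
        if grep -Eq "^[[:space:]]*name:[[:space:]]*$node[[:space:]]*\$" "$f"; then
            echo "$node found in $f"
        fi
    done
}
```

For example, `check_stale_node home` prints nothing when both files match and neither lists `home`. Leftover files can be searched for separately with `find /etc/pve/ -name '*home*'`.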
 
/etc/corosync/corosync.conf still had the home node listed; deleting this fixed the issue.

Thanks for your help.
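For later readers, a likely explanation of why the guests would not start: corosync's votequorum needs a strict majority of the expected votes. The sketch below only does that arithmetic. It assumes one vote per node, as in the corosync.conf posted above, and `quorum_needed` is a made-up helper name.

```shell
# Votes required for quorum: more than half of the expected votes.
quorum_needed() {
    expected="$1"
    echo $(( expected / 2 + 1 ))
}

quorum_needed 2   # prints 2: a lone node in a two-node list is not quorate
quorum_needed 1   # prints 1: a single-node list is quorate on its own
```

With the stale file still listing `home`, the node would have expected two votes and held only one, which is consistent with having to run pvecm expected 1 after every restart.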
 
For anyone finding this thread (and for my future self), here are the steps for removing a node from a 2-node cluster:

1. Shut down the node you want to remove.
2. Make sure you are logged in to the primary node, the one you want to keep.
3. Open a shell.
4. Execute
Code:
pvecm expected 1
and then
Code:
pvecm delnode <node-name>
5. run these commands
Code:
systemctl restart pve-cluster.service
and
Code:
systemctl restart corosync.service
6. Refresh the browser; the deleted node should no longer appear in the GUI.
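The command steps above can be collected into a dry-run helper that only prints what would be run, so the list can be reviewed before anything touches the cluster. This is a sketch under assumptions: `removal_commands` is a hypothetical name, and the argument is the node's name, as in the pvecm delnode home attempt at the top of the thread.

```shell
# Print, without executing, the commands from the steps above
# for the node name given as the first argument.
removal_commands() {
    node="$1"
    printf '%s\n' \
        "pvecm expected 1" \
        "pvecm delnode $node" \
        "systemctl restart pve-cluster.service" \
        "systemctl restart corosync.service"
}
```

Calling `removal_commands home` lists the four commands in order. If the node still shows up afterwards, as it did in this thread, compare /etc/corosync/corosync.conf with /etc/pve/corosync.conf.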
 