Hello,
I am struggling to properly remove a single Proxmox node from a cluster. I am following the guide in the docs, https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node, and it looks like the node is only partially removed.
Basically, I did something like this:
Code:
# ensured no VMs are on node
systemctl stop pve-ha-lrm pve-ha-crm corosync pve-cluster pvedaemon pveproxy
dd if=/dev/urandom of=/dev/sda
shutdown now
Then I tried to remove the node from the cluster and got an error:
Code:
P virt1[root](15:01:30)-(~)
-> pvecm delnode virt98
Killing node 98
Could not kill node (error = CS_ERR_NOT_EXIST)
error during cfs-locked 'file-corosync_conf' operation: command 'corosync-cfgtool -k 98' failed: exit code 1
In many places I cannot find any trace of it:
Code:
P virt1[root](15:01:38)-(~)
-> grep virt98 /etc/pve/.members
P virt1[root](15:01:43)-(~)
-> grep 98 /etc/corosync/corosync.conf
P virt1[root](15:02:15)-(~)
-> pvecm delnode virt98
error during cfs-locked 'file-corosync_conf' operation: Node/IP: virt98 is not a known host of the cluster.
But I can still see it in the GUI and in several other places:
Code:
P virt1[root](15:44:28)-(~)
-> jq .node_status.virt98 /etc/pve/ha/manager_status
"gone"
P virt1[root](15:44:30)-(~)
-> ls -l /etc/pve/nodes/virt98
total 2
-rw-r----- 1 root www-data 84 Feb 19 14:58 lrm_status
drwxr-xr-x 2 root www-data 0 Feb 1 16:18 lxc
drwxr-xr-x 2 root www-data 0 Feb 1 16:18 openvz
drwx------ 2 root www-data 0 Feb 1 16:18 priv
-rw-r----- 1 root www-data 1675 Feb 1 16:18 pve-ssl.key
-rw-r----- 1 root www-data 1712 Feb 1 16:18 pve-ssl.pem
drwxr-xr-x 2 root www-data 0 Feb 1 16:18 qemu-server
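To make the checks above repeatable, they could be bundled into one small helper. This is only a rough sketch: the function name find_node_traces and its optional arguments are made up for illustration, the paths are just the ones I checked above, and it uses plain grep instead of jq, so it assumes the node name appears as a quoted JSON key or value in .members and manager_status.

```shell
#!/bin/sh
# Sketch: report where a removed node's name still shows up.
# Hypothetical helper, not part of Proxmox; paths mirror the manual checks.
find_node_traces() {
    node="$1"
    root="${2:-/etc/pve}"                         # pmxcfs mount point
    conf="${3:-/etc/corosync/corosync.conf}"      # local corosync config

    # Membership list maintained by pmxcfs
    if grep -q "\"$node\"" "$root/.members" 2>/dev/null; then
        echo "members: $root/.members"
    fi

    # Corosync configuration (whole-word match on the node name)
    if grep -qw "$node" "$conf" 2>/dev/null; then
        echo "corosync: $conf"
    fi

    # Leftover per-node directory (certificates, guest configs, lrm_status)
    if [ -d "$root/nodes/$node" ]; then
        echo "nodedir: $root/nodes/$node"
    fi

    # HA manager state that still mentions the node
    if grep -q "\"$node\"" "$root/ha/manager_status" 2>/dev/null; then
        echo "ha: $root/ha/manager_status"
    fi
}
```

Calling `find_node_traces virt98` on a remaining cluster member prints one line per place where the name is still found and prints nothing when no trace is left.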
Also, I was unable to find any reason why removing the node from corosync could have failed. The syslog looks as expected to me:
Code:
Feb 19 15:01:39 virt1 pvecm[30430]: <root@pam> deleting node virt98 from cluster
Feb 19 15:01:39 virt1 pmxcfs[34727]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 21)
Feb 19 15:01:39 virt1 corosync[6398]: [CFG ] Config reload requested by node 1
Feb 19 15:01:39 virt1 corosync[6398]: [TOTEM ] Configuring link 0
Feb 19 15:01:39 virt1 corosync[6398]: [TOTEM ] Configured link number 0: local addr: 192.168.248.76, port=5405
Feb 19 15:01:39 virt1 corosync[6398]: [TOTEM ] Configuring link 1
Feb 19 15:01:39 virt1 corosync[6398]: [TOTEM ] Configured link number 1: local addr: 192.168.232.60, port=5406
Feb 19 15:01:39 virt1 corosync[6398]: [KNET ] host: host: 98 (passive) best link: 0 (pri: 0)
Feb 19 15:01:39 virt1 corosync[6398]: [KNET ] host: host: 98 has no active links
Feb 19 15:01:39 virt1 corosync[6398]: [KNET ] host: host: 98 (passive) best link: 0 (pri: 0)
Feb 19 15:01:39 virt1 corosync[6398]: [KNET ] host: host: 98 has no active links
Feb 19 15:01:39 virt1 pmxcfs[34727]: [status] notice: update cluster info (cluster name virt, version = 21)
Do you have any idea how to delete it properly and (most importantly) what I did wrong?
Thanks for your time.