[SOLVED] How to totally destroy a Cluster, then re-create it

Razva

Renowned Member
Dec 3, 2013
Hello,

I would like to totally destroy a Cluster, then re-create it without reinstalling each node.

All nodes have running VMs.

Can you please let me know how I can achieve this?

Thank you,
Razvan
 
that's not easily possible - so naturally the first question is why do you want to do that?
 
why do you want to do that
I need to do that because one of the nodes got "randomly disconnected" and I can't manage to connect it back to the Cluster. It's a small, 3-node cluster, so re-creating the whole thing seemed more reasonable than debugging this.

In the current scenario two nodes are still connected, but the third is out. Can you please let me know which logs I should check to see why the node is refusing to connect? There are no IP changes and no network changes. The root passwords were changed, but I guess the Cluster is not based on passwords but on SSH keys?

Thank you!
 
post the following:

pveversion -v
pvecm status

from each node, and check the contents of /etc/corosync.conf (should be identical on each node) and the logs of corosync and pmxcfs (journalctl -b -u corosync -u pve-cluster)
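For reference, that can be collected with roughly the following on each node (paths and units as mentioned above):

Code:
# run on each of the three nodes and post the output
pveversion -v
pvecm status
cat /etc/corosync.conf
journalctl -b -u corosync -u pve-cluster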
 
pveversion -v
Here are the results:

Code:
root@px1-ms:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

Code:
root@px2-ms:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

Code:
root@px3-ms:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-helper: 6.4-8
pve-kernel-5.4: 6.4-7
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve1~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1

pvecm status
Here are the results:

Code:
root@px1-ms:~# pvecm status
Cluster information
-------------------
Name:             px-ms
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov  4 13:42:17 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.7c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.101 (local)
0x00000002          1 192.168.1.102

Code:
root@px2-ms:~# pvecm status
Cluster information
-------------------
Name:             px-ms
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov  4 13:42:19 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.7c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.101
0x00000002          1 192.168.1.102 (local)

Code:
root@px3-ms:~# pvecm status
Cluster information
-------------------
Name:             px-ms
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov  4 13:42:14 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000003
Ring ID:          3.41
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.1.103 (local)
check the contents of /etc/corosync.conf
Currently here's the result:

Code:
cat: /etc/corosync.conf: No such file or directory

the logs of corosync and pmxcfs
Here are the logs:
- px1: http://sprunge.us/Yk1kFK
- px2: http://sprunge.us/gW5G3F
- px3: http://sprunge.us/0xtHQO

Basically PX3 refuses to connect to PX1+PX2.
 
sorry, the path is /etc/corosync/corosync.conf
 
the path is /etc/corosync/corosync.conf
Thank you, here's the result:

Code:
root@px1-ms:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: px1-ms
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.101
  }
  node {
    name: px2-ms
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.102
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: px-ms
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Code:
root@px2-ms:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: px1-ms
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.101
  }
  node {
    name: px2-ms
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.102
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: px-ms
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Code:
root@px3-ms:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: px1-ms
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.101
  }
  node {
    name: px2-ms
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.102
  }
  node {
    name: px3-ms
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.103
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: px-ms
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Is there anything I'm missing or doing wrong?
 
yeah, the third node's changes haven't been applied on the first two nodes. assuming you don't have any guests located on the third node (since it's newly joined), the following should work (rough command sketch below):

- scp corosync.conf from third node to first or second node
- cp it into /etc/pve/corosync.conf (different path! this is the one used by pmxcfs)
- check /etc/corosync/corosync.conf on first and second node, it should be identical to the third node now
- check pvecm status on all three nodes
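Something along these lines (a rough sketch only; the IP of the first node and the temporary file name are assumptions based on the outputs above):

Code:
# on px3-ms: copy its corosync.conf over to the first node
scp /etc/corosync/corosync.conf root@192.168.1.101:/root/corosync.conf.px3

# on px1-ms: install it via pmxcfs so it propagates to the whole cluster
cp /root/corosync.conf.px3 /etc/pve/corosync.conf

# afterwards, verify on all three nodes
cat /etc/corosync/corosync.conf
pvecm status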
 
Hi,

just a note, in case it matters: px1-ms and px2-ms are on version 7.x, but px3-ms is on version 6.x.

Greetz
 
it's not ideal and you should upgrade the third node to 7.x as well (e.g., migration from new to old is not guaranteed to work), but it's unrelated to the issue at hand.
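If you go the in-place route, the usual 6-to-7 path looks roughly like this (a sketch only, assuming the no-subscription repository; the exact repository file names and steps should be taken from the official PVE 6-to-7 upgrade guide):

Code:
# on px3-ms: check for known upgrade blockers first
pve6to7 --full

# switch Debian and PVE repositories from buster to bullseye
sed -i 's/buster\/updates/bullseye-security/g; s/buster/bullseye/g' /etc/apt/sources.list
sed -i 's/buster/bullseye/g' /etc/apt/sources.list.d/pve-no-subscription.list

apt update && apt dist-upgrade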
 
just a note, in case it matters: px1-ms and px2-ms are on version 7.x, but px3-ms is on version 6.x.
Yes, you are totally right. I completely forgot about this, and I guess that was the reason.

it's not ideal and you should upgrade the third node to 7.x as well (e.g., migration from new to old is not guaranteed to work), but it's unrelated to the issue at hand.

I would like to completely reinstall PX3. I know there were some issues with nodes that rejoin with a previously used hostname.

Is there any way to see whether PX3 was ever actually joined to the PVE 7 cluster in the first place?

Because it might be "human error" on my side: maybe I didn't even add PX3 to the cluster after reinstalling PVE 7. At this point, unfortunately, I cannot recall this information.
 
did you previously have a pve 6 cluster, and did you already attempt to destroy and reinstall it? (that might have been useful information to include in your question ;))

node 3 has a corosync config with all three nodes
nodes 1 & 2 have a corosync config with just those two nodes

so either the join failed, or something else is missing from this picture..
 
did you previously have a pve 6 cluster and already attempted to destroy and reinstall the cluster

I worked on this cluster about 5 months ago and have completely forgotten what I did and didn't do.

I didn't do a PVE 6-to-7 upgrade, so I guess I did a fresh PVE 7 reinstall on two nodes and skipped it on the 3rd one. But just to be sure, is there any way to confirm this?
 
you can check the task logs for cluster create/join operations, and the apt history and shell history might give some clues regarding a timeline as well.
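For example, something like this (a rough sketch; the task-log grep pattern is an assumption, and the apt and shell history locations are the Debian defaults):

Code:
# task log index: look for cluster create/join tasks
grep -i cluster /var/log/pve/tasks/index*

# apt history: when were proxmox-ve packages installed or upgraded?
zgrep -h proxmox-ve /var/log/apt/history.log*

# root's shell history: any pvecm create/add calls?
grep -E 'pvecm (create|add)' /root/.bash_history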
 
