[SOLVED] Properly flush cluster settings to recreate a new one

MartinL

New Member
Nov 21, 2014
Hi,

I'm setting up a new Proxmox VE cluster, but I made some mistakes during cluster creation (the cluster was created on the wrong interface, the node hostnames were not resolved, ...). Then, after some manual modifications, nothing works any more.
Is there a way to flush all cluster settings and return to a fresh-install state, or do I need to completely reinstall both nodes?
 
Hi,

no, there is no cluster reset function.
Can you post your corosync.conf?
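For example, whichever copy is still readable on your node (standard PVE paths):
Code:
cat /etc/corosync/corosync.conf
# or, if the cluster filesystem is still mounted:
cat /etc/pve/corosync.conf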
 
Hi,

Here is my corosync.conf:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pm2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pm2
  }

  node {
    name: pm1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pm1
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: PMO
  config_version: 2
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.88.103.114
    ringnumber: 0
  }

}

I manually modified "bindnetaddr" because it was initially set to the WAN interface.
When I executed "pvecm add 10.88.103.114" on the second node, the process got stuck at "waiting for quorum...".
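(Side note: while it hangs there, the quorum state can be checked on each node with something like the commands below. If the two nodes cannot reach each other on the ring address, the vote count stays below the expected value.)
Code:
pvecm status
corosync-quorumtool -s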

After several attempts at modifying corosync.conf, adding the nodes' hostnames to the local DNS, rebooting, restarting services, etc., Corosync no longer starts:
Code:
May 19 14:46:20 pm1 corosync[1425]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
May 19 14:46:20 pm1 corosync[1425]: [QB    ] server name: cpg
May 19 14:46:20 pm1 corosync[1425]: [SERV  ] Service engine loaded: corosync profile loading service [4]
May 19 14:46:20 pm1 corosync[1425]: [QUORUM] Using quorum provider corosync_votequorum
May 19 14:46:20 pm1 corosync[1425]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
May 19 14:46:20 pm1 corosync[1425]: [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
May 19 14:47:21 pm1 corosync[1416]: Starting Corosync Cluster Engine (corosync): [FAILED]

And the pve-cluster output looks like this:
Code:
May 22 09:47:26 pm1 pmxcfs[1321]: [dcdb] crit: cpg_initialize failed: 2
May 22 09:47:26 pm1 pmxcfs[1321]: [status] crit: cpg_initialize failed: 2
May 22 09:47:32 pm1 pmxcfs[1321]: [quorum] crit: quorum_initialize failed: 2
May 22 09:47:32 pm1 pmxcfs[1321]: [confdb] crit: cmap_initialize failed: 2
May 22 09:47:32 pm1 pmxcfs[1321]: [dcdb] crit: cpg_initialize failed: 2
May 22 09:47:32 pm1 pmxcfs[1321]: [status] crit: cpg_initialize failed: 2
May 22 09:47:38 pm1 pmxcfs[1321]: [quorum] crit: quorum_initialize failed: 2
May 22 09:47:38 pm1 pmxcfs[1321]: [confdb] crit: cmap_initialize failed: 2
May 22 09:47:38 pm1 pmxcfs[1321]: [dcdb] crit: cpg_initialize failed: 2
May 22 09:47:38 pm1 pmxcfs[1321]: [status] crit: cpg_initialize failed: 2
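(The cpg/quorum initialize errors usually just mean that pmxcfs cannot reach corosync. A quick way to confirm that, and to spot a config that diverged after manual edits, is something like the following; the diff only works while /etc/pve is still mounted.)
Code:
systemctl status corosync pve-cluster
journalctl -u corosync -b --no-pager | tail -n 50
# corosync reads /etc/corosync/corosync.conf; pmxcfs normally syncs it from
# /etc/pve/corosync.conf, so after manual edits the two copies can differ
diff /etc/corosync/corosync.conf /etc/pve/corosync.conf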
 
Hi, you can use this (I used it last week on PVE 5.0 beta1):

### Removing cluster configuration:

(Perform the following steps on each node)

## stop the services
Code:
systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service
systemctl stop corosync.service
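Optionally, a quick check that everything is really down before continuing:
Code:
# every unit should report "inactive" (or "failed") at this point
systemctl is-active pvestatd pvedaemon pve-cluster corosync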

## edit through sqlite, check, delete, verify
Code:
$ sqlite3 /var/lib/pve-cluster/config.db
sqlite> select * from tree where name = 'corosync.conf';
254327|0|254329|0|1480944811|8|corosync.conf|totem {
version: 2
[...]
sqlite> delete from tree where name = 'corosync.conf';
sqlite> select * from tree where name = 'corosync.conf';
sqlite> .quit
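Before the delete above it is probably wise to keep a copy of the database, e.g.:
Code:
cp /var/lib/pve-cluster/config.db /root/config.db.bak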

## Remove directories
Code:
pmxcfs -l
rm /etc/pve/corosync.conf
rm /etc/corosync/*
rm /var/lib/corosync/*
rm -rf /etc/pve/nodes/*

Don't forget to remove the nodes that you don't want from /etc/pve/priv/authorized_keys.
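The entries are usually tagged with root@<nodename> at the end of each line, so something like this shows which key belongs to which node:
Code:
grep -n 'root@' /etc/pve/priv/authorized_keys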

Reboot the servers and start again.
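If a reboot is not convenient, something along these lines should bring the stack back as well (the reboot stays the cleaner option):
Code:
# the locally started pmxcfs (pmxcfs -l) has to go away first
killall pmxcfs
systemctl start pve-cluster
systemctl start pvedaemon pvestatd pveproxy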
 
*This worked.*

But I did not mess around with editing the sqlite file.

I just removed the whole sqlite DB.
Code:
rm -rf /var/lib/pve-cluster/*
It gets recreated after reboot.
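A slightly safer variant of the same idea is to move the database away instead of deleting it, since it contains everything that used to live under /etc/pve, including the guest configs:
Code:
mkdir -p /root/pve-cluster-backup
mv /var/lib/pve-cluster/* /root/pve-cluster-backup/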

Proxmox v6.0-9
 
I did this and now pveproxy fails to start.

Code:
May 18 14:18:49 pve pveproxy[1912]: starting 1 worker(s)
May 18 14:18:49 pve pveproxy[1912]: worker 3393 started
May 18 14:18:49 pve pveproxy[3393]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1699.
May 18 14:18:49 pve pveproxy[3391]: worker exit
May 18 14:18:49 pve pveproxy[3392]: worker exit
May 18 14:18:49 pve pveproxy[1912]: worker 3391 finished
May 18 14:18:49 pve pveproxy[1912]: starting 1 worker(s)
May 18 14:18:49 pve pveproxy[1912]: worker 3392 finished
May 18 14:18:49 pve pveproxy[1912]: worker 3394 started
May 18 14:18:49 pve pveproxy[3394]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1699.
May 18 14:18:51 pve pvestatd[1877]: authkey rotation error: error with cfs lock 'authkey': pve cluster filesystem not online.

How can one think it is OK to design something with only half the management controls? No cluster delete button, no disk delete button.
 
Don't be so aggressive. Did you also follow user micro's advice? My post was only complementary to his solution. And there is storage removal in the Proxmox UI.
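Regarding the pve-ssl.key error above: once pve-cluster is running again and /etc/pve is mounted, regenerating the node certificates should let pveproxy start, roughly like this (check the pvecm man page for the exact option):
Code:
pvecm updatecerts --force
systemctl restart pveproxy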
 
