[SOLVED] Cannot setup qdevice after node reinstall

May 23, 2012
Hi forum.
I'm having trouble getting qdevice to work after reinstalling 2 out of 4 cluster nodes.
I forgot to delete the qdevice before reinstalling.
Anyway, I guess there must be some way to do this, for cases where a node is damaged and the qdevice cannot be removed before the reinstall.

The command pvecm qdevice remove yields:

error during cfs-locked 'file-corosync_conf' operation: No QDevice configured!

but pvecm status still shows a Qdevice:

Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate Qdevice

Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 NA,NV,NMW xxxxxx
0x00000002 1 NA,NV,NMW xxxxxx
0x00000003 1 NA,NV,NMW xxxxxx (local)
0x00000004 1 NA,NV,NMW xxxxxx
0x00000000 0 Qdevice (votes 0)


Setting the qdevice up again with pvecm qdevice setup xxxxxx fails too:

/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
Certificate database already exists. Delete it to continue
Certificate database already exists. Delete it to continue
Host key verification failed.
Certificate database already exists. Delete it to continue

INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n iccbroadcast' failed: exit code 1


Now two nodes can run corosync-qdevice, while the reinstalled ones can't bring it up.
How can I remove the whole qdevice setup from the cluster and set it up from scratch once all nodes are reinstalled?

Thank you very much in advance.
Best regards.
 
The main problem was, surprisingly, the classic missing SSH host key in the known_hosts file (see the "Host key verification failed." line above).
Running ssh -o 'HostKeyAlias=X.X.X.X' root@X.X.X.X (I'm using IPs, not hostnames) on every node solved the SSH part.
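
With more than a couple of nodes, the SSH step above gets repetitive. Here is a minimal sketch that only prints the command for each node, so every line can be reviewed and run by hand (the host key prompt needs an interactive answer anyway). The node IPs in NODES are placeholders and hostkey_cmd is a helper name made up for this example. The trailing "true" is my addition so the session closes right after login.

```shell
#!/bin/sh
# Placeholder node IPs (assumption): replace with the real cluster addresses.
NODES="192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14"

# Hypothetical helper: print the ssh command from this post for one node.
# "true" is appended so the remote session exits immediately after login.
hostkey_cmd() {
    printf "ssh -o 'HostKeyAlias=%s' root@%s true\n" "$1" "$1"
}

# Print one command per node; run the printed lines manually on every node.
for node in $NODES; do
    hostkey_cmd "$node"
done
```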

On the other hand, the qdevice setup can be reset (so that it can be set up again), apparently safely, by removing /etc/corosync/qdevice/net/nssdb on every node: rm -rf /etc/corosync/qdevice/net/nssdb

Then:

pvecm qdevice setup A.B.C.D

And finally:

pvecm qdevice remove
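
The whole sequence above can be written down as a plan. This is only a rough sketch: it prints the commands instead of running them, because rm -rf on the wrong path is not something to automate blindly. NODES and QDEVICE_HOST are placeholder addresses, print_reset_plan is a made-up helper name, and wrapping the rm in ssh is my variation (I ran it locally on each node).

```shell
#!/bin/sh
# Placeholders (assumptions): node IPs and the qdevice host address.
NODES="192.0.2.11 192.0.2.12"
QDEVICE_HOST="192.0.2.50"
# Path taken from this post.
NSSDB="/etc/corosync/qdevice/net/nssdb"

# Hypothetical helper: print the reset steps in the order described above.
print_reset_plan() {
    # 1. Remove the stale qdevice certificate database on every node.
    for node in $NODES; do
        printf 'ssh root@%s rm -rf %s\n' "$node" "$NSSDB"
    done
    # 2. Set the qdevice up again, then 3. remove it cleanly.
    printf 'pvecm qdevice setup %s\n' "$QDEVICE_HOST"
    printf 'pvecm qdevice remove\n'
}

print_reset_plan
```

Review the printed lines first, then run them one at a time from a node that is still a healthy cluster member.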


Now I can keep reinstalling the remaining cluster nodes.
As a suggestion, since I thought I would never see those classic early-Proxmox SSH key troubles again (Proxmox has matured a lot :-): the pvecm node deletion procedure could check whether a qdevice is present in the cluster. At the very least it could warn the administrator and offer the option to cancel and remove the qdevice setup first, or try to handle the qdevice situation itself. I also think the documentation should clearly WARN that the qdevice must be removed before node operations, to prevent known troubles, rather than just mention it in passing.
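
Until something like that exists, a check along these lines could be run by hand before reinstalling or deleting a node. It is a sketch only, assuming the Flags line looks like the pvecm status output quoted earlier in this thread; has_qdevice_flag is a hypothetical helper name, not part of pvecm.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `pvecm status` output read from stdin
# lists "Qdevice" on its Flags line (line format taken from this thread).
has_qdevice_flag() {
    grep -Eq '^[[:space:]]*Flags:.*Qdevice'
}

# Demo with the Flags line quoted in this thread; on a real node use:
#   pvecm status | has_qdevice_flag
sample='Flags: Quorate Qdevice'
if printf '%s\n' "$sample" | has_qdevice_flag; then
    echo "QDevice still configured: remove it before touching the nodes"
fi
```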

Regards.
 
