[SOLVED] Issue with qdevice installation after adding nodes to the cluster with existing qdevice

MH_MUC

May 24, 2019
Hi,
I am running a two-node cluster with a qdevice.
I want to replace the existing servers with two new ones, so I added the two new machines to the cluster in order to migrate the CTs/VMs and then remove the old nodes.

I noticed that the qdevice wasn't shown in the pvecm status output on the new servers, so I removed the qdevice and tried to add it again. Adding it fails every time.
Here is what I did / know so far:
I updated the certificates on all cluster nodes with pvecm updatecerts.

pvecm status shows (on all nodes, where .160 and .25 are the two new ones):
Code:
Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1  NA,NV,NMW 91.XXX.XXX.7
0x00000002          1  NA,NV,NMW 91.XXX.XXX.28
0x00000003          1         NR 91.XXX.XXX.160 (local)
0x00000004          1         NR 91.XXX.XXX.25
0x00000000          0            Qdevice (votes 0)

When trying to remove the qdevice I get:
Code:
pvecm qdevice remove
No QDevice configured!

I purged the qnetd package on the machine that provides the external vote (apt-get purge corosync-qnetd) and reinstalled it.
When I try to add the qdevice again, it fails with:
Code:
pvecm qdevice setup 91.XXX.XXX.18
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)

QDevice certificate store already initialised, set force to delete!

So I tried it with the -f option:
Code:
root@server20:/home/max# pvecm qdevice setup 91.XXX.XXX.18 -f
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes

node 'server20': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'server20': Creating new key and cert db
node 'server20': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'server20': Importing CACertificate database already exists. Delete it to continue
Certificate database already exists. Delete it to continue
Certificate database already exists. Delete it to continue

INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server

INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-cluster01.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'server20': Importing cluster certificate and key
node 'server20': pk12util: PKCS12 IMPORT SUCCESSFUL
pk12util: PKCS12 decode import bags failed: SEC_ERROR_REUSED_ISSUER_AND_SERIAL: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
command 'ssh -o 'BatchMode=yes' -lroot 91.XXX.XXX.28 corosync-qdevice-net-certutil -m -c /etc/pve/qdevice-net-node.p12' failed: exit code 19
node 'server25': Importing cluster certificate and keyroot

How can I solve this issue?
As soon as this cluster is back down to two servers, I need the qdevice to be working.
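
For anyone wondering why the qdevice matters once the cluster is back to two nodes, here is a minimal sketch of the majority rule behind the "Quorum" line in the status output above. The function name `quorum` is made up for illustration, and the sketch ignores special corosync votequorum options that can change the threshold.

```shell
#!/bin/sh
# quorum VOTES: print the number of votes needed for a majority,
# i.e. floor(VOTES / 2) + 1 (illustrative helper, not a pvecm command).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

quorum 4   # four node votes, qdevice contributing 0 (as in the output above)
quorum 2   # two nodes, no qdevice vote
quorum 3   # two nodes plus one qdevice vote
```

With four expected votes this prints 3, which matches the "Quorum: 3" line. With only two votes the threshold is 2, so losing either node loses quorum. With a third vote from the qdevice the threshold stays at 2, so one node can fail.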
 
This:

Code:
0x00000000          0            Qdevice (votes 0)

means that the qdevice has been removed, so it no longer has any votes. Once a qdevice has been added to a cluster, it always shows up in pvecm status; only the number of votes it contributes to quorum changes.

Did you apt install corosync-qdevice on the two new servers? You will need it to re-enable the qdevice.
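
As a reading aid for the Qdevice column in the status output from the first post, the following sketch expands the flag abbreviations. The meanings are my understanding of what corosync-quorumtool prints, so treat them as an assumption and check the man page. The function name `describe_flags` is invented for this example.

```shell
#!/bin/sh
# describe_flags "NA,NV,NMW": expand the comma-separated Qdevice flags
# into words (illustrative helper, not part of pvecm or corosync).
describe_flags() {
    old_ifs=$IFS
    IFS=,
    out=""
    for flag in $1; do
        case $flag in
            A)   text="alive" ;;
            NA)  text="not alive" ;;
            V)   text="voting" ;;
            NV)  text="not voting" ;;
            MW)  text="master wins" ;;
            NMW) text="not master wins" ;;
            NR)  text="not registered" ;;
            *)   text="unknown flag $flag" ;;
        esac
        # Join the descriptions with ", " (no separator before the first one).
        out="${out:+$out, }$text"
    done
    IFS=$old_ifs
    echo "$out"
}

describe_flags "NA,NV,NMW"   # the two old nodes
describe_flags "NR"          # the two new nodes
```

Read this way, the old nodes still have a qdevice registered that is neither alive nor voting, while the new nodes have none registered at all, which would fit corosync-qdevice not yet being set up there.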
 
Thank you for the reply.

I installed the packages on the new nodes. However, adding the qdevice again still fails as shown in the last code block of my first post. Any idea?

I think I have to manually remove the remnants of the qdevice, because I failed to remove it before adding the new nodes.

Thank you!
 
Actually I solved it myself.
It is hard to find out which server is causing the issue. I had this error message:
Code:
node 'server20': Importing cluster certificate and key
node 'server20': pk12util: PKCS12 IMPORT SUCCESSFUL
pk12util: PKCS12 decode import bags failed: SEC_ERROR_REUSED_ISSUER_AND_SERIAL: You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert.
command 'ssh -o 'BatchMode=yes' -lroot 91.XXX.XXX.28 corosync-qdevice-net-certutil -m -c /etc/pve/qdevice-net-node.p12' failed: exit code 19

So I had to find the server with the .28 IP. It was server25 in my case.
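
Matching the IP from the failed ssh command to a node name can be scripted. This is a rough sketch with two hypothetical helper functions. It assumes the usual corosync.conf layout with `name:` and `ring0_addr:` entries inside each `node { }` block, and that `ring0_addr` holds the IP address rather than a hostname.

```shell
#!/bin/sh
# failed_host LINE: print the host that follows "-lroot" in the
# "command 'ssh ...' failed" line (illustrative helper).
failed_host() {
    printf '%s\n' "$1" | sed -n 's/.*-lroot \([^ ]*\) .*/\1/p'
}

# node_for_addr ADDR [CONF]: print the name of the node whose ring0_addr
# equals ADDR. CONF defaults to /etc/pve/corosync.conf (assumed location).
node_for_addr() {
    awk -v addr="$1" '
        $1 == "node" && $2 == "{" { name = ""; ring = "" }
        $1 == "name:"             { name = $2 }
        $1 == "ring0_addr:"       { ring = $2 }
        $1 == "}" {
            # End of a block: report the node if its address matched.
            if (ring == addr && name != "") print name
            name = ""; ring = ""
        }
    ' "${2:-/etc/pve/corosync.conf}"
}

# Example use on a cluster node:
#   node_for_addr "$(failed_host "$error_line")"
```

Running `pvecm status` on each node and looking for the "(local)" marker gives the same answer by hand.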
The apt purge didn't remove the config files in /etc/corosync/qdevice/net/nssdb/.
I deleted the contents of that folder. After that, adding the qdevice worked just fine. I don't know what the point of the --force option is if the existing configs aren't overwritten, but I am glad it is working again.
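
For reference, the cleanup step described above could look roughly like this. It is only a sketch: `clear_dir_contents` is a made-up helper name, and copying the directory somewhere safe first is a precaution of this example rather than something done in this thread.

```shell
#!/bin/sh
# clear_dir_contents DIR: delete everything inside DIR but keep DIR itself
# (illustrative helper; destructive, so double-check the path).
clear_dir_contents() {
    dir=${1:?usage: clear_dir_contents DIR}
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -mindepth 1 -delete
}

# On the node that rejected the certificate import, as root:
#   cp -a /etc/corosync/qdevice/net/nssdb /root/nssdb.bak   # optional backup
#   clear_dir_contents /etc/corosync/qdevice/net/nssdb
# Then run the pvecm qdevice setup command from the first post again.
```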

Cheers!