[SOLVED] 'pvecm qdevice setup' fails

joachim

Hi,

Trying to set up QDevice for a 2-node PVE cluster.

I've installed corosync-qnetd on a Raspberry Pi, and corosync-qdevice on both PVE nodes.

When trying to run the configuration from one of the PVE nodes, it fails;

Code:
root@gridlock:~# pvecm qdevice setup 2001:123:123:123::123
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
        (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
Certificate database already exists. Delete it to continue
Host key verification failed.

INFO: generating cert request
Certificate database doesn't exists. Use /usr/sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n pve-cluster1' failed: exit code 1


The install log from QDevice on the Raspberry Pi;

Code:
Preparing to unpack .../corosync-qnetd_3.0.0-4+deb10u1_armhf.deb ...
Unpacking corosync-qnetd (3.0.0-4+deb10u1) ...
Setting up corosync-qnetd (3.0.0-4+deb10u1) ...
Creating /etc/corosync/qnetd/nssdb
Creating new key and cert db
password file contains no data
Creating new noise file /etc/corosync/qnetd/nssdb/noise.txt
Creating new CA


Generating key.  This may take a few moments...

Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > Is this a critical extension [y/N]?


Generating key.  This may take a few moments...

Notice: Trust flag u is set automatically if the private key is present.
QNetd CA certificate is exported as /etc/corosync/qnetd/nssdb/qnetd-cacert.crt
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qnetd.service → /lib/systemd/system/corosync-qnetd.service.
Processing triggers for man-db (2.8.5-2) ...
Processing triggers for systemd (241-7~deb10u5+rpi1) ...

qnetd service running on the Raspberry Pi:

Code:
root@gumpii:~# systemctl status corosync-qnetd
● corosync-qnetd.service - Corosync Qdevice Network daemon
   Loaded: loaded (/lib/systemd/system/corosync-qnetd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-05-05 02:13:35 CEST; 32min ago
     Docs: man:corosync-qnetd
 Main PID: 22326 (corosync-qnetd)
    Tasks: 1 (limit: 2063)
   CGroup: /system.slice/corosync-qnetd.service
           └─22326 /usr/bin/corosync-qnetd -f

May 05 02:13:35 gumpii systemd[1]: Starting Corosync Qdevice Network daemon...
May 05 02:13:35 gumpii systemd[1]: Started Corosync Qdevice Network daemon.
 
hi,

what do you see from pvecm status?

if you see the qdevice but it doesn't have a vote, please try the following:
Code:
pvecm qdevice remove
pvecm qdevice setup <IP> --force

and then check cluster status again with pvecm status
 

Hi,

No QDevice is configured;

Code:
root@gridlock:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster1
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed May  5 13:14:43 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 2000:123:123:123::10%32635 (local)
0x00000002          1 2000:123:123:123::11%32635
root@gridlock:~# pvecm qdevice remove
error during cfs-locked 'file-corosync_conf' operation: No QDevice configured!
 
have you enabled PermitRootLogin yes in /etc/ssh/sshd_config on the Pi and restarted sshd?

you should be able to ssh from your node to there with a password, which you type when you're setting up the qdevice via pvecm qdevice setup <IP>
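
One way to confirm the effective setting on the Pi, rather than reading the file by eye, is to ask sshd for its parsed configuration. A minimal sketch, assuming OpenSSH's `sshd -T` (extended test mode) is run as root on the Pi; the helper name `root_login_allowed` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads `sshd -T` output on stdin and succeeds only
# if the effective PermitRootLogin value is "yes".
root_login_allowed() {
    # sshd -T prints one "keyword value" pair per line, keywords in lower case.
    awk 'tolower($1) == "permitrootlogin" { v = tolower($2) }
         END { exit !(v == "yes") }'
}

# Usage on the Pi (as root):
#   sshd -T | root_login_allowed && echo "PermitRootLogin is yes"
```

A value such as prohibit-password would still allow key logins but not the password prompt that the initial key copy relies on, so the helper treats only "yes" as a pass.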
 

Yes, that works just fine. If you look at the logs in my first post, you can see that ssh-copy-id "complains" that the SSH key already exists on the target system (implying that a) it can log in, and b) the SSH key is already present). Logging in manually with a password also still works.
 
and is there a key there already from before maybe?
 

Yes, there are multiple keys there (for other logins). Not sure how that is relevant? The SSH-key is present, and SSH-key based login from the two PVE nodes works just fine...
 
please remove all the keys from there and try to run the setup again. you can add your keys afterwards
 
Tried that, but still the same;


Code:
root@gridlock:~# pvecm qdevice setup 2000:123:123:123::210
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@2001:67c:197c:110::210's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@2000:123:123:123::210'"
and check to make sure that only the key(s) you wanted were added.


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
Certificate database already exists. Delete it to continue
Host key verification failed.

INFO: generating cert request
Certificate database doesn't exists. Use /usr/sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n pve-cluster1' failed: exit code 1
 
No other ideas?

What logic lies behind pvecm qdevice setup? Are there manual steps that can be done to set it up, rather than relying on that command?
 
yes, you can also set it up manually, but the recommended way is to use pvecm. the qdevice code in pvecm automates the process; done by hand, it relies on corosync-qdevice-net-certutil and corosync-qnetd-certutil

in your log i see the following:
Code:
INFO: copying CA cert and initializing on all nodes
Certificate database already exists. Delete it to continue
Host key verification failed.

but when i run the commands following the wiki [0] in my test setup, it works:

Code:
INFO: copying CA cert and initializing on all nodes

node 'pve-1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve-1': Creating new key and cert db
node 'pve-1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve-1': Importing CA
node 'pve-2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve-2': Creating new key and cert db
node 'pve-2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve-2': Importing CA
INFO: generating cert request
Creating new certificate request

please make sure you follow the wiki steps.
i can even remove it and add it back again without any issues here.

what i did:
1. set up 2 PVE nodes with latest versions installed
2. cluster them
3. set up clean debian machine running buster (is your raspi also running buster?)
4. debian: PermitRootLogin yes in /etc/ssh/sshd_config, systemctl restart sshd
5. debian: apt install corosync-qnetd
6. both PVE nodes: apt install corosync-qdevice
7. make sure i can SSH from both nodes to the debian machine as root with password
8. on one of the nodes: pvecm qdevice setup <debian_ip>
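
Steps 5, 6 and 8 above can be sketched as a small script. This is only an illustration: the function names and the `run` helper are made up, 192.0.2.10 is a placeholder address, and by default the script only prints the commands (set DRY_RUN=0 to actually execute them).

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 5: on the external Debian machine.
setup_qnetd_host() {
    run apt install corosync-qnetd
}

# Step 6: on each PVE node.
setup_pve_node() {
    run apt install corosync-qdevice
}

# Step 8: on ONE PVE node only; $1 is the external machine's IP.
setup_qdevice() {
    run pvecm qdevice setup "$1"
}

# Example (prints the command instead of running it):
#   setup_qdevice 192.0.2.10
```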

and it works. here's the full log from my side:

Code:
root@pve-2:~# pvecm qdevice setup 192.168.22.159
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes

node 'pve-1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve-1': Creating new key and cert db
node 'pve-1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve-1': Importing CA
node 'pve-2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve-2': Creating new key and cert db
node 'pve-2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve-2': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server

INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-mycluster.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'pve-1': Importing cluster certificate and key
node 'pve-1': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'pve-2': Importing cluster certificate and key
node 'pve-2': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'pve-1'...
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.

INFO: start and enable corosync qdevice daemon on node 'pve-2'...
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done


Code:
root@pve-2:~# pvecm status
Cluster information
-------------------
Name:             mycluster
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu May 20 16:25:03 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.19
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate Qdevice 

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.22.155
0x00000002          1    A,V,NMW 192.168.22.156 (local)
0x00000000          1            Qdevice


[0]: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_corosync_external_vote_support
 
can you SSH between your nodes?

for example from node1 to node2, you should be able to just run ssh node2_ip and it should login without any interaction from your side.
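
A non-interactive check along those lines could look like the sketch below. `check_peer` is a made-up helper name. BatchMode=yes makes ssh fail instead of prompting, so any password or host key question shows up as an error. The HostKeyAlias option is included on the assumption that the cluster tooling looks up host keys by node name rather than by IP; if that assumption does not hold for your version, drop it.

```shell
#!/bin/sh
# Hypothetical helper. $1 = peer node name, $2 = peer node IP.
# Succeeds only if root can log in without any interaction.
check_peer() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 -o HostKeyAlias="$1" "root@$2" true
}

# Usage on node1:
#   check_peer node2 2001::11 && echo "node2 OK" || echo "node2 FAILED"
```

If this fails with Host key verification failed while a plain ssh to the IP works, the host key stored under the node name is probably missing or stale.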
 
  • I use two Proxmox-nodes (node1 + node2)
  • I use a Raspberry Pi as the third device (ext1)
  • All three nodes run Buster
  • I can SSH without issues from node1 to node2 using IP, FQDN and just hostname. I get root shell and no SSH client warnings.
  • I can SSH without issues from node2 to node1 using IP, FQDN and just hostname. I get root shell and no SSH client warnings.
  • I can SSH without issues from node1 and node2 to ext1 using IP and FQDN. I get root shell and no SSH client warnings.
  • I can SSH from ext1 to node1 and node2 without SSH client warnings (but it prompts for password, as the SSH-keys are not copied that way).
  • I've tried to uninstall + purge corosync-qnetd and corosync-qdevice from all three nodes, and then reinstall. Same issue.
It complains that the command corosync-qdevice-net-certutil -r -n pve-cluster1 fails with exit code 1 when running pvecm qdevice setup on node1. I assume corosync-qdevice-net-certutil is run on the PVE node where I executed pvecm qdevice setup (the binary is only included in the corosync-qdevice package, which is only installed on the PVE nodes).

Trying to execute this command manually on node1 (where I ran the pvecm qdevice setup command), I get the following error;
Code:
root@node1:~# corosync-qdevice-net-certutil -r -n pve-cluster1
Certificate database doesn't exists. Use /usr/sbin/corosync-qdevice-net-certutil -i to create it

Trying to create the database, I get the following error;
Code:
root@node1:~# corosync-qdevice-net-certutil -i
Can't open certificate file

Tried to run the pvecm qdevice setup on node2 as well;
Code:
root@node2:~# pvecm qdevice setup 2001::210
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
        (if you think this is a mistake, you may want to use -f option)

QDevice certificate store already initialised, set force to delete!

Re-running it with the force flag gets further, but alas, it fails with the same issue as on node1;
Code:
root@node2:~# pvecm qdevice setup 2001::210 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
        (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
Host key verification failed.

node 'node1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'node1': Creating new key and cert db
node 'node1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'node1': Importing CA
INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n pve-cluster1' failed: exit code 1

edit1:

Looking into this a bit further, the manual for corosync-qdevice-net-certutil states the following for the -i parameter;
Initialize the QDevice Net NSS certificate database. The default directory for the database is /etc/corosync/qdevice/net/. This directory has to be writable by the current user. It needs the QNetd CA certificate passed as the -c parameter. This certificate can be found on the server running QNetd in the file /etc/corosync/qnetd/nssdb/qnetd-cacert.crt.

Note the requirement for the -c parameter. It also says that the certificate is located on the server running QNetd (ext1 in my case), so I'm not sure how all of this is supposed to be glued together.
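
Going by that man page text alone, a manual initialisation on one PVE node might look like the sketch below. This is an assumption-laden illustration, not what pvecm does internally: `init_qdevice_nssdb` and the `CERTUTIL` variable are made-up names, and it presumes root SSH from the node to the QNetd server works.

```shell
#!/bin/sh
# CA certificate location on the QNetd server, as quoted from the man page.
QNETD_CA=/etc/corosync/qnetd/nssdb/qnetd-cacert.crt
# Overridable only so the sketch can be tried without the real tool.
CERTUTIL=${CERTUTIL:-corosync-qdevice-net-certutil}

# $1 = address of the QNetd server (ext1). Run as root on a PVE node.
init_qdevice_nssdb() {
    tmp=$(mktemp) || return 1
    # Fetch the CA certificate; brackets keep IPv6 addresses unambiguous for scp.
    scp "root@[$1]:$QNETD_CA" "$tmp" || { rm -f "$tmp"; return 1; }
    # Initialise the local NSS database with that CA certificate.
    "$CERTUTIL" -i -c "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage on node1:
#   init_qdevice_nssdb 2001::210
```

This only covers the local node; the other cluster node would need the same CA certificate imported as well.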

Looking at some other sources, namely this and this, there seems to be some confusion about which packages are to be installed on which nodes. The documentation and wiki state that corosync-qnetd is ONLY to be installed on the external server (ext1 in my case), and corosync-qdevice should ONLY be installed on the PVE nodes (node1 and node2 in my case). However, these other sources claim that both packages are to be installed on all nodes (node1, node2, ext1), or that both packages are to be installed on all PVE nodes (node1 and node2), while the external server should only get corosync-qnetd.

What is actually correct here?


edit2:

It seems the steps may be as follows;
  1. Copy SSH-keys
  2. Initialize qnetd database on external node (ext1)
  3. Copy the qnetd CA cert from external node (ext1) to all cluster nodes (node1 and node2)
  4. Do some stuff with corosync-qdevice-net-certutil
  5. Do some more stuff
Steps 1 and 2 seem to go OK for me. Step 3 seems to be failing silently (given the Host key verification failed. error). Step 4 then fails because the CA cert that was supposed to be copied in step 3 is missing.
 
And I found the error, as part of the discussion here. I had to run pvecm updatecerts on all of the PVE nodes, and then everything worked flawlessly. Maybe that should be added to the documentation/wiki for people getting the Host key verification failed. error (I don't seem to be the only one (= ).

Code:
root@node1:~# pvecm qdevice setup 2001::210
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
        (if you think this is a mistake, you may want to use -f option)

QDevice certificate store already initialised, set force to delete!

root@node1:~# pvecm qdevice setup 2001::210 --force
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
        (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes

node 'node2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'node2': Creating new key and cert db
node 'node2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'node2': Importing CA
node 'node1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'node1': Creating new key and cert db
node 'node1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'node1': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server

INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-pve-cluster1.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'node2': Importing cluster certificate and key
node 'node2': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'node1': Importing cluster certificate and key
node 'node1': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'node2'...
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.

INFO: start and enable corosync qdevice daemon on node 'node1'...
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done

Code:
root@node1:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster1
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jun 18 16:53:13 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.3d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 2001::10%32703 (local)
0x00000002          1    A,V,NMW 2001::11%32703
0x00000000          1            Qdevice

Code:
root@node2:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster1
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jun 18 16:53:16 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.3d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 2001::10%32657
0x00000002          1    A,V,NMW 2001::11%32657 (local)
0x00000000          1            Qdevice
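
For reference, the fix could be scripted roughly like this. A sketch only, assuming root SSH from the machine running it to every PVE node already works; `update_all_certs` is a made-up helper name, and running pvecm updatecerts by hand on each node does the same thing:

```shell
#!/bin/sh
# Hypothetical helper. Arguments: addresses of all PVE nodes in the cluster.
# Stops at the first node where the command fails.
update_all_certs() {
    for node in "$@"; do
        echo "== $node"
        ssh "root@$node" pvecm updatecerts || return 1
    done
}

# Usage:
#   update_all_certs 2001::10 2001::11
# Afterwards, re-run on one node:
#   pvecm qdevice setup 2001::210 --force
```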
 
thanks for sharing the solution! i'll make sure to add that into the documentation :)

you can mark your thread [SOLVED] by editing the thread prefix (click "Edit thread" on top right)
 
