[SOLVED] Trying to add a QDevice

puntoboy

New Member
Apr 11, 2024
I've been trying for a few days now to add a QDevice to my PVE setup. I had a single node and wanted to add a second with ZFS replication. Because of that, I wanted to add a QDevice as a third vote.

I created the PVE cluster, and that was all fine, but I'm stuck at adding the QDevice. I was getting "Host key verification failed.", but after reading a few posts I manually copied the file from the first node to the second node. That cleared the error, but I still don't think the QDevice is actually doing anything.

The output of pvecm status on node1 is below:

Code:
root@pve:~# pvecm status
Cluster information
-------------------
Name:             pvc
Config Version:   8
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun May  5 17:19:06 2024
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.311
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.108 (local)
0x00000002          1 192.168.1.109

So I only see 2 expected votes, and I'm expecting to see 3.
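As a rough worked example of the arithmetic behind those numbers: votequorum needs more than half of the expected votes, which can be sketched as floor(expected / 2) + 1. The function name `quorum_for` is made up for illustration; the inputs are the expected-vote counts for two nodes without and with a QDevice.

```shell
# Votes needed for quorum: more than half of the expected votes.
# quorum_for is a hypothetical helper, not part of pvecm or corosync.
quorum_for() {
    expected_votes=$1
    echo $(( $expected_votes / 2 + 1 ))
}

quorum_for 2   # two nodes, no QDevice: prints 2
quorum_for 3   # two nodes plus one QDevice vote: prints 2
```

With 2 expected votes, both nodes must be up to stay quorate. With 3, one node plus the QDevice vote is enough, which is why the third vote matters in a two-node cluster.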

This is what I ran for the QDevice, and its output now.

Code:
root@pve:~# pvecm qdevice setup 192.168.1.150 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found

node 'pve': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve': Creating new key and cert db
node 'pve': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server

INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-pvc.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found
command 'ssh -o 'BatchMode=yes' -lroot 192.168.1.109 corosync-qdevice-net-certutil -m -c /etc/pve/qdevice-net-node.p12' failed: exit code 127
 
I've gone through this a gazillion times today. It keeps failing for me also.

I keep getting "Host key verification failed." although I've regenerated the certs and I can SSH between the nodes and into 192.168.0.88 using just ssh and the IP.

Code:
root@proxR86S:~# pvecm qdevice setup 192.168.0.88 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed


/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)




INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db


INFO: copying CA cert and initializing on all nodes
Host key verification failed.
Certificate database already exists. Delete it to continue


INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n ProxCluster1' failed: exit code 1

Have corosync-qdevice installed on all nodes.
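The "command not found" lines and exit code 127 in the first post are consistent with the tool being absent on the node where the remote command ran. A minimal sketch of a check to run on each cluster node follows; `have_tool` is a hypothetical helper name, and the tool name is copied from the error output.

```shell
# Report whether a command is available on this machine.
# have_tool is a hypothetical helper, not a Proxmox command.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
        return 1
    fi
}

# Run on every cluster node; the package name matches the advice above.
have_tool corosync-qdevice-net-certutil || echo "try: apt install corosync-qdevice"
```

The external QDevice host needs the separate corosync-qnetd package instead, which provides the corosync-qnetd service seen later in this thread.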
 
Simply SSH between all nodes and the QDevice, and accept the host key once each time with "yes".
It's stupid, but it fails because of the first-time SSH dialog, where it asks whether you are sure you want to connect (the yes/no prompt to accept the key).
So you have to connect once via SSH between all nodes and then retry the setup.
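The setup log in the first post shows pvecm calling ssh with BatchMode=yes, which makes ssh fail instead of prompting. A hedged sketch for testing the same way: the function below only prints the checks so they can be reviewed first. `print_batch_checks` is a made-up name and the 192.0.2.x addresses are placeholders.

```shell
# Print (not run) a non-interactive SSH check for each target host.
# BatchMode=yes makes ssh fail instead of asking the yes/no question.
print_batch_checks() {
    for host in "$@"; do
        echo "ssh -o BatchMode=yes root@$host true"
    done
}

# Placeholder addresses; substitute your own nodes and QDevice host.
print_batch_checks 192.0.2.11 192.0.2.12 192.0.2.50
```

Running the printed commands from every machine should produce no prompt and no error. Any target that fails still needs its host key accepted once by hand.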
 
Issue resolved, see above. :) thanks
 
Yeah, no, I thought of that and had already done it, if you check my post again :)


But what I did now is run ssh-keygen -R on all nodes and the QDevice, for all of them. Basically, I went onto each machine and ran the command for the IP of every node and the QDevice.

I then SSH-ed from each device to each other device.
I then deleted all the nssdb folders on the nodes and the QDevice.
I then reran the command without --force.
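The cleanup part of those steps can be sketched as a dry run for one machine. `print_cleanup` is a hypothetical helper, the addresses are placeholders, and the nssdb paths are the ones that appear in the setup logs in this thread. The commands are destructive, so the function only prints them for review.

```shell
# Print (not run) the cleanup commands for one machine.
# role is "node" or "qnetd"; the remaining arguments are the peers' addresses.
print_cleanup() {
    role=$1
    shift
    for peer in "$@"; do
        echo "ssh-keygen -R $peer"
    done
    case $role in
        node)  echo "rm -r /etc/corosync/qdevice/net/nssdb" ;;
        qnetd) echo "rm -r /etc/corosync/qnetd/nssdb" ;;
        *)     echo "unknown role: $role" >&2; return 1 ;;
    esac
}

# Placeholder peers for a cluster node.
print_cleanup node 192.0.2.12 192.0.2.50
```

After the cleanup, each machine still has to SSH to every other one once to accept the fresh host keys before the setup is rerun.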



Code:
root@proxR86S:~# pvecm qdevice setup 192.168.0.88
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed


/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)




INFO: initializing qnetd server
Creating /etc/corosync/qnetd/nssdb
Creating new key and cert db
password file contains no data
Creating new noise file /etc/corosync/qnetd/nssdb/noise.txt
Creating new CA




Generating key.  This may take a few moments...


Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > Is this a critical extension [y/N]?




Generating key.  This may take a few moments...


Notice: Trust flag u is set automatically if the private key is present.
QNetd CA certificate is exported as /etc/corosync/qnetd/nssdb/qnetd-cacert.crt


INFO: copying CA cert and initializing on all nodes
Host key verification failed.


node 'proxmox2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'proxmox2': Creating new key and cert db
node 'proxmox2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'proxmox2': Importing CA
INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n ProxCluster1' failed: exit code 1



I also got this error when I reran everything with --force after deleting all the files and trying some other things. Basically I'm just throwing stuff at the keyboard at this point:



Code:
root@proxR86S:~# pvecm qdevice setup 192.168.0.88 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
Host key verification failed.

node 'proxmox2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'proxmox2': Creating new key and cert db
node 'proxmox2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'proxmox2': Importing CA
INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n ProxCluster1' failed: exit code 1


On the other node it looked different. I ran it right after the one above. As you can see, this one has the error "Host key verification failed." fused onto the "Importing CA" line:

Code:
root@proxmox2:~# pvecm qdevice setup 192.168.0.88 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed


/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)




INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db


INFO: copying CA cert and initializing on all nodes


node 'proxR86S': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'proxR86S': Creating new key and cert db
node 'proxR86S': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'proxR86S': Importing CAHost key verification failed.


INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n ProxCluster1' failed: exit code 1



EDIT: FINALLY!!

So what I had to do was also add keys for all hostnames and alternate ways of connecting. I couldn't figure out which way it was trying to connect, so I just SSH-ed between every machine using every name and address.

I also changed the hosts file to use simple .local names for everything.
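To enumerate "all the ways" a machine can be addressed, one option is to read them out of the hosts file. This is a sketch only: `names_for` is a hypothetical helper, it handles whole-line comments but not trailing ones, and it knows nothing about DNS or mDNS names.

```shell
# List every address and name on /etc/hosts-style lines that mention a host.
# names_for is a hypothetical helper; args: hosts file, then a name or address.
names_for() {
    hosts_file=$1
    wanted=$2
    awk -v w="$wanted" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        {
            hit = 0
            for (i = 1; i <= NF; i++) if ($i == w) hit = 1
            if (hit) for (i = 1; i <= NF; i++) print $i
        }
    ' "$hosts_file"
}

# Example: names_for /etc/hosts proxmox2
```

Each printed entry is a candidate to SSH to once from every other machine, so that its host key is recorded under that name as well.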
 
My setup also seems to be failing at the `scp` phase. My personal theory was that this is due to not having an /etc/pve folder on the machine I intend to run the QDevice on. Also, how do I verify that daemon?


Executing /etc/init.d/corosync-qnetd start ..

Starting corosync-qnetd (via systemctl): corosync-qnetd.serviceJob for corosync-qnetd.service failed because the control process exited with error code.
See "systemctl status corosync-qnetd.service" and "journalctl -xeu corosync-qnetd.service" for details.
failed!

Lately I've been chown'ing /etc as my user account (on my Kubuntu workstation).
If I knew where the NSS DB directory lives I could check it. Re:

`Jul 18 05:06:23 putin corosync-qnetd[117430]: Can't open NSS DB directory (13): Permission denied`

Code:
ls -la /etc/corosync/qnetd/nssdb 
total 100
drwxrwx--- 2 root root 133 Jul 17 02:49 .
drwxr-xr-x 3 root root 19 Jul 17 02:49 ..
-rw-rw---- 1 root root 28672 Jul 17 02:49 cert9.db
-rw-rw---- 1 root root 53248 Jul 17 02:49 key4.db
-rw-rw---- 1 root root    41 Jul 17 02:49 noise.txt
-rw-rw---- 1 root root   432 Jul 17 02:49 pkcs11.txt
-rw-rw---- 1 root root     0 Jul 17 02:49 pwdfile.txt
-rw-r--r-- 1 root root  4272 Jul 17 02:49 qnetd-cacert.crt
-rw-rw---- 1 root root     4 Jul 17 02:49 serial.txt


× corosync-qnetd.service - Corosync Qdevice Network daemon
Loaded: loaded (/lib/systemd/system/corosync-qnetd.service; enabled; preset: enabled)
Active: failed (Result: exit-code) since Thu 2024-07-18 05:06:23 NZST; 1min 5s ago
Docs: man:corosync-qnetd
Process: 117430 ExecStart=/usr/bin/corosync-qnetd -f $COROSYNC_QNETD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 117430 (code=exited, status=1/FAILURE)
CPU: 9ms

Jul 18 05:06:23 putin systemd[1]: Starting corosync-qnetd.service - Corosync Qdevice Network daemon...
Jul 18 05:06:23 putin corosync-qnetd[117430]: Can't open NSS DB directory (13): Permission denied
Jul 18 05:06:23 putin systemd[1]: corosync-qnetd.service: Main process exited, code=exited, status=1/FAILURE
Jul 18 05:06:23 putin systemd[1]: corosync-qnetd.service: Failed with result 'exit-code'.
Jul 18 05:06:23 putin systemd[1]: Failed to start corosync-qnetd.service - Corosync Qdevice Network daemon.

Ah, the corosync-qnetd service wants to run with:
`User=coroqnetd`
So I checked and this user has home dir of /etc/corosync/qnetd and a shell of /usr/sbin/nologin
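The listing above shows the nssdb directory as `drwxrwx---` owned by root:root, while the service runs as coroqnetd. A small sketch of the classic owner/group/other rule shows which permission triplet that user gets. `applicable_perms` is a made-up helper, it ignores ACLs and root's override, and the assumption that coroqnetd belongs only to its own group is mine.

```shell
# Pick the rwx triplet of an `ls -l` mode string that applies to a user.
# Args: mode string, owner, group, user, then the user's group names.
applicable_perms() {
    mode=$1 owner=$2 group=$3 user=$4
    shift 4
    if [ "$user" = "$owner" ]; then
        echo "$mode" | cut -c2-4      # owner bits
        return
    fi
    for g in "$@"; do
        if [ "$g" = "$group" ]; then
            echo "$mode" | cut -c5-7  # group bits
            return
        fi
    done
    echo "$mode" | cut -c8-10         # other bits
}

# Values from the listing above; group membership of coroqnetd is assumed.
applicable_perms 'drwxrwx---' root root coroqnetd coroqnetd   # prints ---
```

An empty triplet for coroqnetd would match the "Can't open NSS DB directory (13): Permission denied" line in the journal.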

I will try giving this robot a shell...
 
So I ran `su coroqnetd` and then `cd ~/nssdb`, and was denied.

Then I dabbled a little...
Code:
chgrp -Rv coroqnetd /etc/corosync

I felt certain that would work, but no luck. Then I...
Code:
mkdir /etc/pve
root@putin:/# chown coroqnetd /etc/pve
root@putin:/# su coroqnetd
[coroqnetd@putin]$ cd /etc/pve


command 'scp -o 'BatchMode=yes' 'root@[10.0.0.11]:/etc/corosync/qnetd/nssdb/qnetd-cacert.crt' /etc/pve/qnetd-cacert.crt' failed: exit code 255

Running that command myself from the Proxmox shell yields:

/etc/pve/qnetd-cacert.crt: No such file or directory
I just noticed my /etc/pve is totally empty on this workstation. Beats me why proxmox doesn't re-gen my keys when it logs in as root. In my auth.log:

2024-07-18T05:26:56.047019+12:00 putin sshd[131866]: subsystem request for sftp by user root failed, subsystem not found

Hmmmm, that's weird: SFTP is not up because it was disabled. So I uncommented the subsystem line and reloaded ssh:

Subsystem sftp /usr/lib/openssh/sftp-server
systemctl reload ssh
/sbin/corosync-qdevice-net-certutil -i

No such command? In the end, the correct command for me, with my cluster name, was:
corosync-qnetd-certutil -i -n fucker

Oh boy I will have to try again tomorrow. Nearly there.
rm -v /etc/corosync/qnetd/nssdb/*
rm -v /etc/pve/*

I think I am confused as to which machine the software refers to in this error message during that ssh session where it moves the keys about.
 
Simply ssh from between all nodes and qdevice and accept everytime once with "yes"

This is good advice. Also, verify your SFTP subsystem is running OK by transferring a file via ssh/scp. Verify permissions for the user coroqnetd.

I was finally able to get it going after refreshing my keys a few times and rebooting my workstation. Pretty sure it's working but no idea how to verify.

So, to test, I am about to reboot one of the cluster members, after first stopping the cluster with:
systemctl stop pve-cluster
After the reboot I will use
systemctl start pve-cluster
If that goes well I will repeat the process on the other node.

Is there a better way to verify the quorum, do you think?

Code:
pvecm status
Cluster information
-------------------
Name:             fucker
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jul 19 16:20:55 2024
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.86
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate Qdevice 

Membership information
----------------------
Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 10.0.0.5
0x00000002          1    A,V,NMW 10.0.0.43 (local)
0x00000000          1            Qdevice
root@hulk:~#

Seems odd the Qdevice is not showing an IP.
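One way to check this without rebooting anything is to look for the Qdevice flag in the status output, as in the listing above. A minimal sketch, assuming the "Flags:" line keeps that format; `qdevice_active` is a hypothetical helper name.

```shell
# Read `pvecm status` output on stdin; succeed only if the Flags line lists Qdevice.
# qdevice_active is a hypothetical helper, not part of pvecm.
qdevice_active() {
    awk '/^Flags:/ { for (i = 2; i <= NF; i++) if ($i == "Qdevice") found = 1 }
         END { exit (found ? 0 : 1) }'
}

# On a cluster node:
#   pvecm status | qdevice_active && echo "QDevice vote is counted"
```

For more detail, `corosync-qdevice-tool -s` on a node and `corosync-qnetd-tool -l` on the QDevice host report the connection state from each side. As far as I know, the Qdevice row in the membership table is normally shown with node ID 0 and no address, so the missing IP is not a sign of a problem by itself.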
 