[SOLVED] Error when adding QDevice to existing cluster

framf

New Member
May 8, 2024
Hi everyone,

I'm currently facing a problem when adding a QDevice to a 2-node cluster.

Current cluster: 2x Proxmox VE 8.2.2
QDevice: 1x Proxmox Backup Server (PBS) 3.2-2

- I can log in via SSH from both nodes to the PBS as root
- corosync-qdevice is installed on both nodes
- corosync-qnetd and corosync-qdevice are installed on the PBS
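
For reference, the points above can be double-checked roughly like this. It is only a sketch: the helper name `missing_packages` is made up for illustration, and it assumes a Debian-based system where `dpkg-query` is available.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): reads "status package"
# lines on stdin and prints every package named as an argument that is
# NOT in state "ii" (fully installed).
missing_packages() {
    # Collect the names of all packages whose status abbreviation is "ii".
    installed=$(awk '$1 == "ii" { print $2 }')
    for pkg in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: quiet
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}

# Usage (run manually, not executed here):
#   on each PVE node:
#     dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | missing_packages corosync-qdevice
#   on the PBS:
#     dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | missing_packages corosync-qnetd corosync-qdevice
#   non-interactive SSH check from a node to the PBS (fails instead of prompting):
#     ssh -o BatchMode=yes root@10.0.0.3 true && echo "ssh ok"
```

An empty result from `missing_packages` means every listed package is installed.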

When I try to add the QDevice to my cluster, I get the following error:

Code:
root@proxmox-n1:~# pvecm qdevice setup 10.0.0.3
user config - ignore invalid acl token 'user1@pve!migrate'
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed


/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)




INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db


INFO: copying CA cert and initializing on all nodes
Host key verification failed.
Host key verification failed.


INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n proxmox1' failed: exit code 1
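
To make the failing steps easier to spot, the output above can be reduced to its error lines. A small sketch: the function name `qdevice_setup_errors` and the list of patterns are assumptions based only on the messages shown in this post, and `setup.log` is a hypothetical file holding the saved output.

```shell
#!/bin/sh
# Hypothetical helper: reads saved `pvecm qdevice setup` output on stdin
# and prints only the lines that look like errors, prefixed with their
# line number in the input.
qdevice_setup_errors() {
    grep -n -E 'Host key verification failed|Certificate database .*(already exists|doesn.t exists)|failed: exit code'
}

# Usage (run manually, not executed here):
#   qdevice_setup_errors < setup.log
```

In the output above, the first error after the key copy is the pair of "Host key verification failed." lines, and the missing certificate database on the nodes is reported right after that step.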

I discovered that corosync-qdevice.service isn't running properly on either node.

Code:
root@proxmox-n1:~# systemctl status corosync-qdevice.service
× corosync-qdevice.service - Corosync Qdevice daemon
     Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; disabled; preset: enabled)
     Active: failed (Result: exit-code) since Mon 2024-05-27 11:47:02 CEST; 1h 10min ago
       Docs: man:corosync-qdevice
   Main PID: 74240 (code=exited, status=1/FAILURE)
        CPU: 8ms

May 27 11:47:02 proxmox-n1 systemd[1]: corosync-qdevice.service: Scheduled restart job, restart counter is at 5.
May 27 11:47:02 proxmox-n1 systemd[1]: Stopped corosync-qdevice.service - Corosync Qdevice daemon.
May 27 11:47:02 proxmox-n1 systemd[1]: corosync-qdevice.service: Start request repeated too quickly.
May 27 11:47:02 proxmox-n1 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
May 27 11:47:02 proxmox-n1 systemd[1]: Failed to start corosync-qdevice.service - Corosync Qdevice daemon.

When I try to run corosync-qdevice manually, I get this error:

Code:
root@proxmox-n1:~# /usr/sbin/corosync-qdevice -f -d
May 27 12:57:19 debug   Initializing votequorum
May 27 12:57:19 debug   Initializing local socket
May 27 12:57:19 debug   Registering qdevice models
May 27 12:57:19 debug   Configuring qdevice
May 27 12:57:19 error   Can't read quorum.device.model cmap key.
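
The `quorum.device.model` key is normally populated from a `device` block inside the `quorum` section of corosync.conf, so this error presumably just means that no QDevice is configured yet because the setup above aborted. A sketch for checking that: the function name `qdevice_model` is hypothetical and the parsing is deliberately naive (it assumes one `key: value` pair or one brace per line).

```shell
#!/bin/sh
# Hypothetical helper: reads a corosync.conf on stdin and prints the value
# of "model" from quorum { device { ... } }. Prints nothing if there is
# no such block.
qdevice_model() {
    awk '
        # Entering the top-level quorum section.
        /^[[:space:]]*quorum[[:space:]]*[{]/ { in_quorum = 1; depth = 1; next }
        # Entering the device block inside quorum.
        in_quorum && /^[[:space:]]*device[[:space:]]*[{]/ { in_device = 1; depth++; next }
        # Any other nested block (for example "net {").
        in_quorum && /[{]/ { depth++; next }
        # Closing brace: track when we leave device and quorum.
        in_quorum && /[}]/ {
            depth--
            if (depth == 1) in_device = 0
            if (depth == 0) in_quorum = 0
            next
        }
        # The model key directly inside the device block.
        in_device && depth == 2 && /^[[:space:]]*model[[:space:]]*:/ {
            sub(/^[^:]*:[[:space:]]*/, "")
            print
            exit
        }
    '
}

# Usage (run manually, not executed here):
#   qdevice_model < /etc/pve/corosync.conf
#   corosync-cmapctl -g quorum.device.model   # query the runtime key directly
```

If the function prints nothing, there is no QDevice entry in the config, which would match the failed service start.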

I already tried purging and reinstalling the packages on both nodes and on the PBS. I also rebooted every machine just to make sure.
On the PBS, corosync-qnetd is running and corosync-qdevice.service is inactive.

Maybe some of you have already encountered this problem and can help me fix this issue.

Greetings!