[SOLVED] QDevice unable to start, no corosync config generated -- Port 5403/tcp required, not open

Twitchytoes

New Member
I currently have a setup with two PVE nodes plus a Lenovo thin client running CentOS 9 (it hosts an NFS share for HA VMs/CTs and acts as the QDevice). Setting up the QDevice seemed to go well, but the output of
Code:
pvecm status
shows no vote from it. When I checked whether the corosync service is running on the QDevice, it is not, and the journalctl -xeu corosync output shows there is no corosync config. I have tried copying the config from the nodes over to it, which then produces an error about a missing authkey. I'm unsure of where the config generation is failing.

Output of pvecm status
Code:
Cluster information
-------------------
Name:             TwitchyCluster
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Dec 20 07:44:52 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.325
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1   A,NV,NMW 192.168.50.6
0x00000002          1   A,NV,NMW 192.168.50.11 (local)
0x00000000          0            Qdevice (votes 1)

Output of journalctl -xeu corosync.service
Code:
Dec 20 07:45:16 Lenovo systemd[1]: Starting Corosync Cluster Engine...
░░ Subject: A start job for unit corosync.service has begun execution
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit corosync.service has begun execution.
░░
░░ The job identifier is 9511.
Dec 20 07:45:16 Lenovo corosync[7278]: Can't read file /etc/corosync/corosync.conf: No such file or directory
Dec 20 07:45:16 Lenovo systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ An ExecStart= process belonging to unit corosync.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 8.
Dec 20 07:45:16 Lenovo systemd[1]: corosync.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit corosync.service has entered the 'failed' state with result 'exit-code'.
Dec 20 07:45:16 Lenovo systemd[1]: Failed to start Corosync Cluster Engine.
░░ Subject: A start job for unit corosync.service has failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit corosync.service has finished with a failure.
░░
░░ The job identifier is 9511 and the job result is failed.

Output from one node for systemctl status corosync-qdevice.service
Code:
● corosync-qdevice.service - Corosync Qdevice daemon
     Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; enabled; preset: enabled)
     Active: active (running) since Wed 2023-12-20 07:51:15 CST; 11min ago
       Docs: man:corosync-qdevice
   Main PID: 20174 (corosync-qdevic)
      Tasks: 2 (limit: 76712)
     Memory: 1.4M
        CPU: 81ms
     CGroup: /system.slice/corosync-qdevice.service
             ├─20174 /usr/sbin/corosync-qdevice -f
             └─20175 /usr/sbin/corosync-qdevice -f

Dec 20 08:01:48 twitchycube corosync-qdevice[20174]: Connect timeout
Dec 20 08:01:48 twitchycube corosync-qdevice[20174]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 08:01:56 twitchycube corosync-qdevice[20174]: Connect timeout
Dec 20 08:01:56 twitchycube corosync-qdevice[20174]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 08:02:04 twitchycube corosync-qdevice[20174]: Connect timeout
Dec 20 08:02:04 twitchycube corosync-qdevice[20174]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 08:02:12 twitchycube corosync-qdevice[20174]: Connect timeout
Dec 20 08:02:12 twitchycube corosync-qdevice[20174]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 08:02:20 twitchycube corosync-qdevice[20174]: Connect timeout
Dec 20 08:02:20 twitchycube corosync-qdevice[20174]: Can't connect to qnetd host. (-5986): Network address not available (in use?)

Please let me know if any other info is required and I'll add as soon as possible.
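(For anyone landing here later: a quick first check on the QDevice side is whether qnetd is running and listening at all. A minimal sketch, assuming the corosync-qnetd package is installed there and provides corosync-qnetd-tool:)
Code:
# On the QDevice: is qnetd up and listening on its port?
systemctl status corosync-qnetd
ss -tlnp | grep 5403
# List clusters currently connected to qnetd
corosync-qnetd-tool -l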
 
I currently have a setup with two PVE nodes plus a Lenovo thin client running CentOS 9 (it hosts an NFS share for HA VMs/CTs and acts as the QDevice). Setting up the QDevice seemed to go well

How did you set up the QDevice? Is there any output you can find from that process (scrolling up in the terminal window)?
 
How did you set up the QDevice? Is there any output you can find from that process (scrolling up in the terminal window)?
Sure thing, I've still got the terminals open. Output of pvecm qdevice setup 192.168.50.14 --force is below.

Code:
root@pve:/home# pvecm qdevice setup 192.168.50.14 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '192.168.50.14 (192.168.50.14)' can't be established.
ED25519 key fingerprint is SHA256:jlvTdtNDV6utlNFe10crLbl21nW0StNmBhUgZBkF+xg.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes 
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Authorized uses only. All activity may be monitored and reported.
Creating /etc/corosync/qnetd/nssdb
Creating new key and cert db
password file contains no data
Creating new noise file /etc/corosync/qnetd/nssdb/noise.txt
Creating new CA


Generating key.  This may take a few moments...

Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > Is this a critical extension [y/N]?


Generating key.  This may take a few moments...

Notice: Trust flag u is set automatically if the private key is present.
QNetd CA certificate is exported as /etc/corosync/qnetd/nssdb/qnetd-cacert.crt

INFO: copying CA cert and initializing on all nodes
Authorized uses only. All activity may be monitored and reported.

node 'pve': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve': Creating new key and cert db
node 'pve': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve': Importing CA
node 'twitchycube': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'twitchycube': Creating new key and cert db
node 'twitchycube': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'twitchycube': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server
Authorized uses only. All activity may be monitored and reported.

INFO: sign and export cluster cert
Authorized uses only. All activity may be monitored and reported.
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-TwitchyCluster.crt

INFO: copy exported CRT
Authorized uses only. All activity may be monitored and reported.

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'pve': Importing cluster certificate and key
node 'pve': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'twitchycube': Importing cluster certificate and key
node 'twitchycube': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'pve'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.

INFO: start and enable corosync qdevice daemon on node 'twitchycube'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done
 
Just to be sure, can you redo it?

Code:
pvecm qdevice remove
pvecm qdevice setup 192.168.50.14 --force

Also, on both nodes, can you ensure corosync-qdevice is installed (apt install corosync-qdevice)?

You can ssh as root (from the nodes) into the QDevice machine, correct?
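A sketch of what that check could look like on each node (nothing here beyond stock apt/dpkg):
Code:
# On each PVE node
apt install corosync-qdevice
dpkg -l | grep corosync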
 
Hello, did you carefully follow [1]?

You need to install `corosync-qnetd` on the qdevice and `corosync-qdevice` on *all* the nodes of the cluster.

`A,NV,NMW` in your output suggests that the QDevice is not casting votes; you can see more info on these flags in [1] too.

[1] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_qdevice_net_setup
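In other words, the two packages go on different machines. Roughly (the highavailability repo name matches what shows up later in this thread for CentOS 9; enabling it may require dnf-plugins-core):
Code:
# On both PVE nodes (Debian-based)
apt install corosync-qdevice
# On the QDevice (CentOS 9) -- the HA repo may need enabling first
dnf config-manager --set-enabled highavailability
dnf install corosync-qnetd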
 
You need to install `corosync-qnetd` on the qdevice and `corosync-qdevice` on *all* the nodes of the cluster.
Looks like I missed corosync-qnetd on the nodes... I've installed that and re-run qdevice remove and setup. I'm able to ssh into the QDevice as root from any device I've tried. The nodes are sshing in without a password, so the keys appear to be working as well.


Code:
root@pve:~# pvecm qdevice remove
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable corosync-qdevice
Removed "/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service".
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable corosync-qdevice
Removed "/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service".
Reloading corosync.conf...
Done

Removed Qdevice.
root@pve:~# pvecm qdevice setup 192.168.50.14 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Authorized uses only. All activity may be monitored and reported.
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
Authorized uses only. All activity may be monitored and reported.

node 'pve': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve': Creating new key and cert db
node 'pve': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve': Importing CA
node 'twitchycube': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'twitchycube': Creating new key and cert db
node 'twitchycube': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'twitchycube': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server
Authorized uses only. All activity may be monitored and reported.

INFO: sign and export cluster cert
Authorized uses only. All activity may be monitored and reported.
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-TwitchyCluster.crt

INFO: copy exported CRT
Authorized uses only. All activity may be monitored and reported.

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'pve': Importing cluster certificate and key
node 'pve': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'twitchycube': Importing cluster certificate and key
node 'twitchycube': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'pve'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.

INFO: start and enable corosync qdevice daemon on node 'twitchycube'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done
root@pve:~# pvecm status
Cluster information
-------------------
Name:             TwitchyCluster
Config Version:   13
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Dec 20 10:03:16 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.332
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1   A,NV,NMW 192.168.50.6 (local)
0x00000002          1  NA,NV,NMW 192.168.50.11
0x00000000          0            Qdevice (votes 1)

Still getting no vote. The QDevice is still unable to start the corosync service due to the missing config; I'm assuming this is why it's failing to vote. Should the command pvecm qdevice setup IP --force be creating a config on the QDevice itself?
 
Looks like I missed corosync-qnetd on the nodes... I've installed that and re-run qdevice remove and setup.

I think it's getting mixed up here: corosync-qnetd should be on your QD machine, and corosync-qdevice should be on BOTH of the nodes.

Should the command pvecm qdevice setup IP --force be creating a config on the QDevice itself?

No, this is to be run on any node; you're doing it correctly.
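For reference, what the setup command does write is a quorum device section in the cluster-wide config on the nodes (/etc/pve/corosync.conf). Per the PVE docs it looks roughly like this, with the host value being this thread's QDevice IP:
Code:
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      algorithm: ffsplit
      host: 192.168.50.14
      tls: on
    }
  }
}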
 
Do you also have corosync-qdevice installed on Node 2 (.50.11)?
Node 2 (50.11) has the following corosync packages:
Code:
corosync-qdevice/stable,now 3.0.3-1 amd64 [installed]
corosync-qnetd/stable,now 3.0.3-1 amd64 [installed]
corosync/stable,now 3.1.7-pve3 amd64 [installed]
libcorosync-common4/stable,now 3.1.7-pve3 amd64 [installed]

Node 1 (50.6) has the following corosync packages:
Code:
corosync-qdevice/stable,now 3.0.3-1 amd64 [installed]
corosync-qnetd/stable,now 3.0.3-1 amd64 [installed]
corosync/stable,now 3.1.7-pve3 amd64 [installed]
libcorosync-common4/stable,now 3.1.7-pve3 amd64 [installed]

The qdevice (50.14) has the following corosync packages:
Code:
corosync.x86_64                                  3.1.8-1.el9                      @highavailability
corosync-qdevice.x86_64                          3.0.2-2.el9                      @highavailability
corosync-qnetd.x86_64                            3.0.2-2.el9                      @highavailability
corosynclib.x86_64                               3.1.8-1.el9                      @appstream

No, this is to be run on any node; you're doing it correctly.
To further clarify, I'm running the command pvecm qdevice setup 192.168.50.14 --force on the node, but I am not getting any corosync config generated on the qdevice@192.168.50.14. My question was whether that command should be generating a config there. My apologies if I'm not being very clear.
 
I'm running the command pvecm qdevice setup 192.168.50.14 --force on the node, but I am not getting any corosync config generated on the qdevice@192.168.50.14.

Sorry, I misread your question. No, this does not create corosync.conf on the QD; it populates /etc/corosync/qnetd/...

But I think I know what's up with the CentOS box. Considering what you've shown you installed, the only package actually necessary there is:
corosync-qnetd.x86_64 3.0.2-2.el9 @highavailability

Then the only thing you are missing would be literally:
firewall-cmd --zone=public --add-port=5403/tcp

If that was it, please confirm, and I would suggest @Maximiliano add this port number to the docs; it's not the usual 5404/5/6...
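Worth noting: without --permanent a firewalld rule only lives until the next reload or reboot, which can make it look like a rule "didn't take". A sketch of the persistent variant:
Code:
firewall-cmd --permanent --zone=public --add-port=5403/tcp
firewall-cmd --reload
# Verify it stuck
firewall-cmd --zone=public --list-ports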

Also, if you want to check if the service is up or what might have gone wrong, the one to look at is:
journalctl -xeu corosync-qnetd

Sorry I missed it in your original (really comprehensive) post.

EDIT: All the remarks above concern your QDevice (it's obviously the CentOS box in this thread, but someone might find this later and not start reading from the beginning).

EDIT2: I would remove all the superfluous packages on the QDevice; similarly, you only need the following on the nodes:
Code:
corosync-qdevice/stable,now 3.0.3-1 amd64 [installed]
corosync/now 3.1.7-pve3 amd64 [installed,local]
libcorosync-common4/now 3.1.7-pve3 amd64 [installed,local]
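That cleanup would look roughly like this, with package names taken from the lists posted above:
Code:
# On each PVE node: qnetd does not belong here
apt remove corosync-qnetd
# On the QDevice: only corosync-qnetd should remain
dnf remove corosync corosync-qdevice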
 
EDIT2: I would remove all the superfluous packages on the QDevice...
Gotcha, I removed everything but corosync-qnetd. I still have the corosync service showing on the QDevice, though. I've attached the status output for both.
Code:
Dec 20 07:45:16 Lenovo systemd[1]: Starting Corosync Cluster Engine...
Dec 20 07:45:16 Lenovo corosync[7278]: Can't read file /etc/corosync/corosync.conf: No such file or directory
Dec 20 07:45:16 Lenovo systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Dec 20 07:45:16 Lenovo systemd[1]: corosync.service: Failed with result 'exit-code'.
Dec 20 07:45:16 Lenovo systemd[1]: Failed to start Corosync Cluster Engine.
[root@Lenovo ~]# systemctl status corosync-qnetd.service
● corosync-qnetd.service - Corosync Qdevice Network daemon
     Loaded: loaded (/usr/lib/systemd/system/corosync-qnetd.service; enabled; preset: disabled)
     Active: active (running) since Wed 2023-12-20 13:21:07 CST; 2min 17s ago
       Docs: man:corosync-qnetd
   Main PID: 22308 (corosync-qnetd)
      Tasks: 1 (limit: 44997)
     Memory: 6.4M
        CPU: 124ms
     CGroup: /system.slice/corosync-qnetd.service
             └─22308 /usr/bin/corosync-qnetd -f

Dec 20 13:21:07 Lenovo systemd[1]: Starting Corosync Qdevice Network daemon...
Dec 20 13:21:07 Lenovo systemd[1]: Started Corosync Qdevice Network daemon.

Code:
[root@Lenovo ~]# journalctl -xeu corosync-qnetd
░░ Support: https://access.redhat.com/support
░░
░░ The unit corosync-qnetd.service has successfully entered the 'dead' state.
Dec 20 07:48:17 Lenovo systemd[1]: Stopped Corosync Qdevice Network daemon.
░░ Subject: A stop job for unit corosync-qnetd.service has finished
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A stop job for unit corosync-qnetd.service has finished.
░░
░░ The job identifier is 9895 and the job result is done.
Dec 20 13:21:07 Lenovo systemd[1]: Starting Corosync Qdevice Network daemon...
░░ Subject: A start job for unit corosync-qnetd.service has begun execution
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit corosync-qnetd.service has begun execution.
░░
░░ The job identifier is 24382.
Dec 20 13:21:07 Lenovo systemd[1]: Started Corosync Qdevice Network daemon.
░░ Subject: A start job for unit corosync-qnetd.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit corosync-qnetd.service has finished successfully.
░░
░░ The job identifier is 24382.

Re-added the device to the cluster to verify; it's still not adding correctly, still no vote.
Code:
Cluster information
-------------------
Name:             TwitchyCluster
Config Version:   15
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Dec 20 13:26:17 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.332
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1   A,NV,NMW 192.168.50.6 (local)
0x00000002          1   A,NV,NMW 192.168.50.11
0x00000000          0            Qdevice (votes 1)

I had already added port 5403 to the firewall, but not for zone=public in particular; I did add that before the prior steps. This is super strange...
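One way to check whether a rule landed in the zone that actually covers the interface (standard firewalld queries, nothing assumed beyond firewalld itself):
Code:
firewall-cmd --get-default-zone
# Which zones are bound to which interfaces
firewall-cmd --get-active-zones
# Is 5403/tcp listed for the zone in question?
firewall-cmd --zone=public --list-ports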
 
I had already added port 5403 to the firewall, but not for zone=public in particular; I did add that before the prior steps. This is super strange...

Can you run the remove/setup cycle once again from the node? I really want to be sure it's going through 5403 before trying to dig into anything else. Is the traffic hitting the QD?
 
Can you run the remove/setup cycle once again from the node? I really want to be sure it's going through 5403 before trying to dig into anything else. Is the traffic hitting the QD?
Code:
root@pve:/# pvecm qdevice remove
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable corosync-qdevice
Removed "/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service".
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable corosync-qdevice
Removed "/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service".
Reloading corosync.conf...
Done

Removed Qdevice.
root@pve:/# pvecm qdevice setup 192.168.50.14 --force
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Authorized uses only. All activity may be monitored and reported.
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
Authorized uses only. All activity may be monitored and reported.

node 'pve': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve': Creating new key and cert db
node 'pve': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve': Importing CA
node 'twitchycube': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'twitchycube': Creating new key and cert db
node 'twitchycube': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'twitchycube': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server
Authorized uses only. All activity may be monitored and reported.

INFO: sign and export cluster cert
Authorized uses only. All activity may be monitored and reported.
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-TwitchyCluster.crt

INFO: copy exported CRT
Authorized uses only. All activity may be monitored and reported.

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'pve': Importing cluster certificate and key
node 'pve': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'twitchycube': Importing cluster certificate and key
node 'twitchycube': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'pve'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.

INFO: start and enable corosync qdevice daemon on node 'twitchycube'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done

Traffic appears to be hitting it. Is there any way I can confirm?
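One way to confirm would be to watch for the nodes' connection attempts on the QDevice itself, assuming tcpdump is available there:
Code:
# On the QDevice: show inbound TCP traffic on the qnetd port
tcpdump -ni any 'tcp port 5403'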
 
Gotcha, I removed everything but corosync-qnetd. I still have the corosync service showing on the QDevice, though.

Can you ditch that too? It's not meant to be there, and you had been copying config files around; just to be sure there's no stray traffic going out on 5405 from the QD ...
 
Output from one node for systemctl status corosync-qdevice.service
Code:
Dec 20 08:01:48 twitchycube corosync-qdevice[20174]: Can't connect to qnetd host. (-5986): Network address not available (in use?)

You should not be seeing this on the NODES. What does it say now that you have the ports open?
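A quick reachability test from a node, assuming netcat is installed, would be something like:
Code:
# From a PVE node: does a TCP connection to the qnetd port succeed?
nc -zv 192.168.50.14 5403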
 
Can you ditch that too? It's not meant to be there, and you had been copying config files around; just to be sure there's no stray traffic going out on 5405 from the QD ...
The corosync package is removed; that service is lingering for some reason. It shows:
Code:
[root@Lenovo ~]# systemctl status --failed
× corosync.service
     Loaded: not-found (Reason: Unit corosync.service not found.)
     Active: failed (Result: exit-code) since Wed 2023-12-20 07:45:16 CST; 5h 58min ago
   Main PID: 7278 (code=exited, status=8)
        CPU: 11ms

Dec 20 07:45:16 Lenovo systemd[1]: Starting Corosync Cluster Engine...
Dec 20 07:45:16 Lenovo corosync[7278]: Can't read file /etc/corosync/corosync.conf: No such file or directory
Dec 20 07:45:16 Lenovo systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Dec 20 07:45:16 Lenovo systemd[1]: corosync.service: Failed with result 'exit-code'.
Dec 20 07:45:16 Lenovo systemd[1]: Failed to start Corosync Cluster Engine.
Code:
[root@Lenovo ~]# dnf list --installed | grep corosync
corosync-qnetd.x86_64
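The lingering failed unit is just stale systemd state left over from before the package was removed; something like this should clear it:
Code:
systemctl reset-failed corosync.service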

You should not be seeing this on the NODES. What does it say now that you have the ports open?
Code:
root@pve:/# systemctl status corosync-qdevice.service
● corosync-qdevice.service - Corosync Qdevice daemon
     Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; enabled; preset: enabled)
     Active: active (running) since Wed 2023-12-20 13:32:44 CST; 9min ago
       Docs: man:corosync-qdevice
   Main PID: 89738 (corosync-qdevic)
      Tasks: 2 (limit: 115837)
     Memory: 1.7M
        CPU: 175ms
     CGroup: /system.slice/corosync-qdevice.service
             ├─89738 /usr/sbin/corosync-qdevice -f
             └─89740 /usr/sbin/corosync-qdevice -f

Dec 20 13:42:05 pve corosync-qdevice[89738]: Connect timeout
Dec 20 13:42:05 pve corosync-qdevice[89738]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 13:42:13 pve corosync-qdevice[89738]: Connect timeout
Dec 20 13:42:13 pve corosync-qdevice[89738]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 13:42:21 pve corosync-qdevice[89738]: Connect timeout
Dec 20 13:42:21 pve corosync-qdevice[89738]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 13:42:29 pve corosync-qdevice[89738]: Connect timeout
Dec 20 13:42:29 pve corosync-qdevice[89738]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
Dec 20 13:42:37 pve corosync-qdevice[89738]: Connect timeout
Dec 20 13:42:37 pve corosync-qdevice[89738]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
root@pve:/# systemctl restart corosync-qdevice.service
root@pve:/# systemctl status corosync-qdevice.service
● corosync-qdevice.service - Corosync Qdevice daemon
     Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; enabled; preset: enabled)
     Active: active (running) since Wed 2023-12-20 13:42:46 CST; 771ms ago
       Docs: man:corosync-qdevice
   Main PID: 93623 (corosync-qdevic)
      Tasks: 2 (limit: 115837)
     Memory: 1.4M
        CPU: 40ms
     CGroup: /system.slice/corosync-qdevice.service
             ├─93623 /usr/sbin/corosync-qdevice -f
             └─93626 /usr/sbin/corosync-qdevice -f

Dec 20 13:42:46 pve systemd[1]: Starting corosync-qdevice.service - Corosync Qdevice daemon...
Dec 20 13:42:46 pve systemd[1]: Started corosync-qdevice.service - Corosync Qdevice daemon.
Dec 20 13:42:46 pve corosync-qdevice[93623]: Can't connect to qnetd host. (-5986): Network address not available (in use?)

Still receiving that output.

Code:
[root@Lenovo ~]# firewall-cmd --list-ports
5404/udp 5405/udp

Code:
root@pve:/# nmap 192.168.50.14 -p 5403
Starting Nmap 7.93 ( https://nmap.org ) at 2023-12-20 13:48 CST
Nmap scan report for Lenovo.TwitchyDispatcher (192.168.50.14)
Host is up (0.00082s latency).

PORT     STATE    SERVICE
5403/tcp filtered hpoms-ci-lstn
MAC Address: 00:23:24:EC:26:00 (G-pro Computer)

Nmap done: 1 IP address (1 host up) scanned in 0.23 seconds
root@pve:/# nmap 192.168.50.14 -p 5404
Starting Nmap 7.93 ( https://nmap.org ) at 2023-12-20 13:48 CST
Nmap scan report for Lenovo.TwitchyDispatcher (192.168.50.14)
Host is up (0.00099s latency).

PORT     STATE    SERVICE
5404/tcp filtered hpoms-dps-lstn
MAC Address: 00:23:24:EC:26:00 (G-pro Computer)

Nmap done: 1 IP address (1 host up) scanned in 0.29 seconds
 
@tempacc346235
I have exactly the same problem, but I am trying to get the QDevice to run on a Raspberry Pi with Bookworm.
:D Excellent ... I just want to make sure with everyone that:

On the nodes, you have only: corosync, libcorosync-common4, corosync-qdevice
On the QDevice, you have only: corosync-qnetd

Then what does the corosync-qdevice service log say on a node, and what does corosync-qnetd say on the device?
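Concretely, that would be something like:
Code:
# On a node
journalctl -eu corosync-qdevice
# On the QDevice
journalctl -eu corosync-qnetd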
 
The corosync package is removed; that service is lingering for some reason.
Code:
[root@Lenovo ~]# dnf list --installed | grep corosync
corosync-qnetd.x86_64
This is good, it's gone.

Code:
Dec 20 13:42:46 pve corosync-qdevice[93623]: Can't connect to qnetd host. (-5986): Network address not available (in use?)

Still receiving that output.

I don't know your timezone, but the question is whether you are getting these entries constantly; once it reaches the qnetd host, it will stop complaining.

Code:
[root@Lenovo ~]# firewall-cmd --list-ports
5404/udp 5405/udp

Now this should have 5403/tcp there.

Code:
root@pve:/# nmap 192.168.50.14 -p 5403
PORT     STATE    SERVICE
5403/tcp filtered hpoms-ci-lstn

Nmap also tells you it's filtered (behind a firewall); it's just smart enough to know there's a machine there, but the port is not open.
 
Now this should have 5403/tcp there.
5403 is open now; I had allowed it through earlier, but I guess it didn't take. It's showing open now.
Code:
root@pve:/# pvecm status
Cluster information
-------------------
Name:             TwitchyCluster
Config Version:   17
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Dec 20 13:57:39 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.332
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.168.50.6 (local)
0x00000002          1    A,V,NMW 192.168.50.11
0x00000000          1            Qdevice
We have votes!
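For comparison with the earlier outputs, the Qdevice column flags decode as follows (as documented for corosync-quorumtool):
Code:
# A / NA   - QDevice connection Alive / Not Alive
# V / NV   - QDevice casts a Vote for this node / No Vote
# MW / NMW - Master Wins / No Master Wins
# So A,NV,NMW earlier meant "connected but not voting"; A,V,NMW means it now votes.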
 