Qdevice is not voting

lukash

New Member
Feb 21, 2021
Hello guys,
I have a problem with my Qdevice. If I type "pvecm status", my first node gives the following result:
Code:
root@pve1:~# pvecm status
Cluster information
-------------------
Name:             server
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Feb 27 00:16:05 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.42
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 10.10.10.1 (local)
0x00000002          1  NA,NV,NMW 10.10.10.2
0x00000000          1            Qdevice
root@pve1:~#

That looks normal, but the second node returns the following:

Code:
root@pve2:~# pvecm status
Cluster information
-------------------
Name:             server
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Feb 27 00:16:13 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.42
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 10.10.10.1
0x00000002          1  NA,NV,NMW 10.10.10.2 (local)
0x00000000          0            Qdevice (votes 1)
root@pve2:~#

The Qdevice is not casting its vote for this node.

The node and the Pi are on the same network:
Code:
root@pve2:~# ping 10.10.10.99
PING 10.10.10.99 (10.10.10.99) 56(84) bytes of data.
64 bytes from 10.10.10.99: icmp_seq=1 ttl=64 time=0.883 ms

Code:
root@raspberrypi:~# ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=64 time=0.487 ms
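
Ping only proves basic IP reachability, though. As a quick sketch, this also checks that the qnetd TCP port is reachable from pve2 (5403 is the corosync-qnetd default; bash's /dev/tcp is used so no extra tools are needed):

Code:
# 5403 is the default corosync-qnetd port; adjust if qnetd was configured differently
timeout 3 bash -c 'exec 3<>/dev/tcp/10.10.10.99/5403' && echo "qnetd port reachable"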

What I have found is that corosync-qdevice.service on node 2 runs into an error.

Code:
root@pve2:~# systemctl start corosync-qdevice.service
Job for corosync-qdevice.service failed because the control process exited with error code.
See "systemctl status corosync-qdevice.service" and "journalctl -xe" for details.

Code:
root@pve2:~# systemctl status corosync-qdevice.service
● corosync-qdevice.service - Corosync Qdevice daemon
   Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2021-02-27 00:26:14 CET; 1min 25s ago
     Docs: man:corosync-qdevice
  Process: 676324 ExecStart=/usr/sbin/corosync-qdevice -f $COROSYNC_QDEVICE_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 676324 (code=exited, status=1/FAILURE)

Feb 27 00:26:14 pve2 systemd[1]: Starting Corosync Qdevice daemon...
Feb 27 00:26:14 pve2 corosync-qdevice[676324]: Can't init nss (-8174): security library: bad database.
Feb 27 00:26:14 pve2 systemd[1]: corosync-qdevice.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 00:26:14 pve2 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
Feb 27 00:26:14 pve2 systemd[1]: Failed to start Corosync Qdevice daemon.


Code:
-- A start job for unit corosync-qdevice.service has begun execution.
--
-- The job identifier is 288886.
Feb 27 00:28:42 pve2 corosync-qdevice[677667]: Can't init nss (-8174): security library: bad database.
Feb 27 00:28:42 pve2 systemd[1]: corosync-qdevice.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit corosync-qdevice.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Feb 27 00:28:42 pve2 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit corosync-qdevice.service has entered the 'failed' state with result 'exit-code'.
Feb 27 00:28:42 pve2 systemd[1]: Failed to start Corosync Qdevice daemon.
-- Subject: A start job for unit corosync-qdevice.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit corosync-qdevice.service has finished with a failure.
--
-- The job identifier is 288886 and the job result is failed.
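
If I read the error correctly, "Can't init nss (-8174): security library: bad database" points at the NSS certificate database that corosync-qdevice uses for TLS towards the qnetd host. A sketch of how to compare that database between the two nodes, assuming the default location /etc/corosync/qdevice/net/nssdb used by the "net" model (certutil comes with the libnss3-tools package):

Code:
# Check that the Qdevice NSS database exists and list the certificates in it
# (path assumed from the corosync-qdevice "net" model defaults)
ls -l /etc/corosync/qdevice/net/nssdb
certutil -L -d /etc/corosync/qdevice/net/nssdb
# Run the same on pve1 and compare; a missing directory or an empty listing
# on pve2 would explain the "bad database" error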


Is there a way to fix this problem?

I have looked around for many hours, but I haven't found a solution.

Can you help me?
Thanks
 
Hi,

please also post the output of pveversion -v and your Corosync config (cat /etc/pve/corosync.conf).
 
pve1 is the node showing 3 total votes
pve2 is the node showing only 2 total votes (the problem)

Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
root@pve1:~#


Code:
root@pve2:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
root@pve2:~#

Code:
root@pve1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 10.10.10.99
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: server
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

root@pve1:~#

Code:
root@pve2:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 10.10.10.99
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: server
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

root@pve2:~#
 
It seems that the solution was to copy the /etc/corosync folder from the working PVE node to the non-working one.
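
Roughly what that looks like, as a sketch (assuming /etc/corosync, including the Qdevice NSS database under qdevice/net/nssdb, is intact on the working node and that rsync is available; scp -r works as well):

Code:
# On pve2: keep a backup of the broken state first
cp -a /etc/corosync /etc/corosync.bak
# Pull the directory, including the Qdevice certificates, from the working node
rsync -a root@10.10.10.1:/etc/corosync/ /etc/corosync/
# Restart so the copied config and certificates are picked up
systemctl restart corosync corosync-qdevice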
 
For me, it turned out that the problem node wasn't able to SSH to itself without a password.
Once that was fixed, I could add the Qdevice again and everything worked as expected.
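
As a sketch, the check and the re-setup could look like this (addresses taken from this thread; pvecm qdevice setup needs corosync-qnetd running on the Pi, and the --force flag is only needed if leftovers from an earlier setup attempt are in the way):

Code:
# On the problem node: this must log in without asking for a password
ssh root@10.10.10.2 true
# If it prompts, refresh the cluster SSH/SSL files and test again
pvecm updatecerts
# Then remove and re-add the Qdevice from any cluster node
pvecm qdevice remove
pvecm qdevice setup 10.10.10.99 --force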