Qdevice is not voting

lukash

New Member
Feb 21, 2021
Hello guys,
I have a problem with my Qdevice. If I type "pvecm status", my first node gives the following result:
Code:
root@pve1:~# pvecm status
Cluster information
-------------------
Name:             server
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Feb 27 00:16:05 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.42
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 10.10.10.1 (local)
0x00000002          1  NA,NV,NMW 10.10.10.2
0x00000000          1            Qdevice
root@pve1:~#

It looks normal. But the second node returns the following:

Code:
root@pve2:~# pvecm status
Cluster information
-------------------
Name:             server
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Feb 27 00:16:13 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.42
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2 
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 10.10.10.1
0x00000002          1  NA,NV,NMW 10.10.10.2 (local)
0x00000000          0            Qdevice (votes 1)
root@pve2:~#

The Qdevice is not voting on this node.

The node and the Pi are on the same network.
Code:
root@pve2:~# ping 10.10.10.99
PING 10.10.10.99 (10.10.10.99) 56(84) bytes of data.
64 bytes from 10.10.10.99: icmp_seq=1 ttl=64 time=0.883 ms

Code:
root@raspberrypi:~# ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=64 time=0.487 ms
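
Ping only shows basic reachability; an additional check would be whether the qnetd TCP port is open, assuming the default corosync-qnetd port 5403 and that netcat is installed:
Code:
# from the node: probe the qnetd port on the Pi
nc -zv 10.10.10.99 5403
# on the Pi: confirm corosync-qnetd is listening
ss -tlnp | grep 5403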

What I have found is that corosync-qdevice.service on node 2 runs into an error.

Code:
root@pve2:~# systemctl start corosync-qdevice.service
Job for corosync-qdevice.service failed because the control process exited with error code.
See "systemctl status corosync-qdevice.service" and "journalctl -xe" for details.

Code:
root@pve2:~# systemctl status corosync-qdevice.service
● corosync-qdevice.service - Corosync Qdevice daemon
   Loaded: loaded (/lib/systemd/system/corosync-qdevice.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2021-02-27 00:26:14 CET; 1min 25s ago
     Docs: man:corosync-qdevice
  Process: 676324 ExecStart=/usr/sbin/corosync-qdevice -f $COROSYNC_QDEVICE_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 676324 (code=exited, status=1/FAILURE)

Feb 27 00:26:14 pve2 systemd[1]: Starting Corosync Qdevice daemon...
Feb 27 00:26:14 pve2 corosync-qdevice[676324]: Can't init nss (-8174): security library: bad database.
Feb 27 00:26:14 pve2 systemd[1]: corosync-qdevice.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 00:26:14 pve2 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
Feb 27 00:26:14 pve2 systemd[1]: Failed to start Corosync Qdevice daemon.


Code:
-- A start job for unit corosync-qdevice.service has begun execution.
--
-- The job identifier is 288886.
Feb 27 00:28:42 pve2 corosync-qdevice[677667]: Can't init nss (-8174): security library: bad database.
Feb 27 00:28:42 pve2 systemd[1]: corosync-qdevice.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit corosync-qdevice.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Feb 27 00:28:42 pve2 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit corosync-qdevice.service has entered the 'failed' state with result 'exit-code'.
Feb 27 00:28:42 pve2 systemd[1]: Failed to start Corosync Qdevice daemon.
-- Subject: A start job for unit corosync-qdevice.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit corosync-qdevice.service has finished with a failure.
--
-- The job identifier is 288886 and the job result is failed.

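From searching around, the "bad database" NSS error seems to point at the qdevice certificate store; a way to inspect it could look like this (the path /etc/corosync/qdevice/net/nssdb is the corosync-qdevice default and certutil comes from the libnss3-tools package, so treat both as assumptions):
Code:
# on the failing node: does the NSS database directory exist and contain files?
ls -l /etc/corosync/qdevice/net/nssdb
# list the certificates stored in it (requires libnss3-tools)
certutil -L -d /etc/corosync/qdevice/net/nssdb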

Is there a way to fix this problem?

I have looked around for many hours, but I didn't find a solution.

Can you help me?
Thanks
 
Hi,

please post the output of pveversion -v and your Corosync config (cat /etc/pve/corosync.conf) as well.
 
pve1 is the node with 3 votes
pve2 is the node with only 2 votes (the problem)

Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
root@pve1:~#


Code:
root@pve2:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
root@pve2:~#

Code:
root@pve1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 10.10.10.99
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: server
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

root@pve1:~#

Code:
root@pve2:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 10.10.10.99
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: server
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

root@pve2:~#
 
It seems that the solution was to copy the folder /etc/corosync from the working PVE node to the broken one.
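
Roughly, that copy could look like this, assuming pve1 is the healthy node and 10.10.10.2 is the broken one (restarting corosync-qdevice afterwards is my assumption):
Code:
# run on the healthy node: copy the corosync config and certificates over
scp -r /etc/corosync/* root@10.10.10.2:/etc/corosync/
# then restart the qdevice daemon on the broken node
ssh root@10.10.10.2 systemctl restart corosync-qdevice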
 
For me it turned out that the problem node wasn't able to SSH to itself without a password.
Once that was fixed, I could add the qdevice again and everything worked as expected.
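
A rough sketch of that sequence, assuming 10.10.10.2 is the problem node and 10.10.10.99 is the qdevice host as in this thread (the exact commands are my assumption, not necessarily what was run here):
Code:
# run on the problem node: should complete without asking for a password
ssh root@10.10.10.2 true
# then remove and re-add the qdevice from a cluster node
pvecm qdevice remove
pvecm qdevice setup 10.10.10.99 --force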
 
