[Solved] QDevice - service not starting on node

le_top

I am trying to add a QDevice to a two-node cluster.

I fixed some issues and I am almost there, but the last step fails:

Code:
node 'p3': Importing cluster certificate and key
node 'p3': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'p5': Importing cluster certificate and key
node 'p5': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'p3'...
Job for corosync-qdevice.service failed because the control process exited with error code.
See "systemctl status corosync-qdevice.service" and "journalctl -xe" for details.
command 'ssh -o 'BatchMode=yes' -lroot 10.0.0.3 systemctl start corosync-qdevice' failed: exit code 1

In the journal, these lines caught my attention:
Code:
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] got nodeinfo message from cluster node 4
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] nodeinfo message[0]: votes: 1, expected: 0 flags: 0

A more complete (but somewhat filtered) transcript:
Code:
Mar  2 00:31:12 p3 pmxcfs[3044]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 65)
Mar  2 00:31:12 p3 corosync[3396]: notice  [CFG   ] Config reload requested by node 1
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] Configuration reloaded. Dumping actual totem config.
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] Token Timeout (1000 ms) retransmit timeout (238 ms)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] token hold (180 ms) retransmits before loss (4 retrans)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] join (50 ms) send_join (0 ms) consensus (1200 ms) merge (200 ms)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Mar  2 00:31:12 p3 corosync[3396]:  [CFG   ] Config reload requested by node 1
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1301
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] missed count const (5 messages)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] RRP token expired timeout (238 ms)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] RRP token problem counter (2000 ms)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] RRP threshold (10 problem count)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] RRP multicast threshold (100 problem count)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] RRP automatic recovery check timeout (1000 ms)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] RRP mode set to none.
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] heartbeat_failures_allowed (0)
Mar  2 00:31:12 p3 corosync[3396]: debug   [TOTEM ] max_network_delay (50 ms)
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] Reading configuration (runtime: 1)
Mar  2 00:31:12 p3 corosync[3396]: crit    [VOTEQ ] configuration error: quorum.device.votes is too high or expected_votes is too low
Mar  2 00:31:12 p3 corosync[3396]: crit    [VOTEQ ] disabling quorum device operations
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] ev_tracking=0, ev_tracking_barrier = 0: expected_votes = 2
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: Yes First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] Configuration reloaded. Dumping actual totem config.
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] Token Timeout (1000 ms) retransmit timeout (238 ms)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] token hold (180 ms) retransmits before loss (4 retrans)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] join (50 ms) send_join (0 ms) consensus (1200 ms) merge (200 ms)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1301
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] missed count const (5 messages)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] RRP token expired timeout (238 ms)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] RRP token problem counter (2000 ms)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] RRP threshold (10 problem count)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] RRP multicast threshold (100 problem count)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] RRP automatic recovery check timeout (1000 ms)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] RRP mode set to none.
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] heartbeat_failures_allowed (0)
Mar  2 00:31:12 p3 corosync[3396]:  [TOTEM ] max_network_delay (50 ms)
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] Reading configuration (runtime: 1)
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] configuration error: quorum.device.votes is too high or expected_votes is too low
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] disabling quorum device operations
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] ev_tracking=0, ev_tracking_barrier = 0: expected_votes = 2
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: Yes First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] got nodeinfo message from cluster node 4
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] nodeinfo message[4]: votes: 1, expected: 2 flags: 5
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: Yes First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] got nodeinfo message from cluster node 4
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] nodeinfo message[0]: votes: 1, expected: 0 flags: 0
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] got nodeinfo message from cluster node 1
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 2 flags: 5
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: Yes First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] total_votes=3, expected_votes=2
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] Sending expected votes callback
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] node 1 state=1, votes=1, expected=3
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] node 4 state=1, votes=1, expected=2
Mar  2 00:31:12 p3 corosync[3396]: notice  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] got nodeinfo message from cluster node 1
Mar  2 00:31:12 p3 corosync[3396]: debug   [VOTEQ ] nodeinfo message[0]: votes: 1, expected: 0 flags: 0
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] got nodeinfo message from cluster node 4
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] nodeinfo message[4]: votes: 1, expected: 2 flags: 5
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: Yes First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] got nodeinfo message from cluster node 4
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] nodeinfo message[0]: votes: 1, expected: 0 flags: 0
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] got nodeinfo message from cluster node 1
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 2 flags: 5
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: Yes First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] total_votes=3, expected_votes=2
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] Sending expected votes callback
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] node 1 state=1, votes=1, expected=3
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] node 4 state=1, votes=1, expected=2
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] Waiting for all cluster members. Current votes: 2 expected_votes: 3
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] got nodeinfo message from cluster node 1
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] nodeinfo message[0]: votes: 1, expected: 0 flags: 0
Mar  2 00:31:12 p3 corosync-qdevice[32770]: Initializing votequorum
Mar  2 00:31:12 p3 corosync-qdevice[32770]: Initializing votequorum
Mar  2 00:31:12 p3 corosync[3396]: info    [VOTEQ ] Registration of quorum device is disabled by incorrect corosync.conf. See logs for more information
Mar  2 00:31:12 p3 corosync-qdevice[32770]: Can't register votequorum device. Error CS_ERR_ACCESS
Mar  2 00:31:12 p3 systemd[1]: corosync-qdevice.service: Main process exited, code=exited, status=1/FAILURE
Mar  2 00:31:12 p3 corosync[3396]: debug   [CMAP  ] exit_fn for conn=0x555c3215c840
Mar  2 00:31:12 p3 systemd[1]: corosync-qdevice.service: Unit entered failed state.
Mar  2 00:31:12 p3 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
Mar  2 00:31:12 p3 corosync[3396]:  [VOTEQ ] Registration of quorum device is disabled by incorrect corosync.conf. See logs for more information
Mar  2 00:31:12 p3 corosync-qdevice[32770]: Can't register votequorum device. Error CS_ERR_ACCESS
Mar  2 00:31:12 p3 corosync[3396]:  [CMAP  ] exit_fn for conn=0x555c3215c840

corosync.conf before adding the QDevice (after pvecm qdevice remove):
Code:
logging {
  debug: on
  to_syslog: yes
}

nodelist {
  node {
    name: p3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.3
  }
  node {
    name: p5
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.0.0.5
  }
}

quorum {
  expected_votes: 2
  provider: corosync_votequorum
}

totem {
  cluster_name: ourcluster
  config_version: 64
  interface {
    bindnetaddr: 10.0.0.3
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}



I fixed it:
After adding the qdevice, I had this in the configuration:
Code:
quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: <HIDDEN_IP>
      tls: on
    }
    votes: 1
  }
  expected_votes: 2
  provider: corosync_votequorum
}
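The mismatch in this block can be checked mechanically. The sketch below simply adds up every `quorum_votes:` value from the nodelist and the `votes:` value of the quorum device, then compares the total with `expected_votes`. The function names are made up for illustration, and the rule "expected_votes should cover all votes" is an assumption based on what fixed this particular setup, not a statement of how corosync validates the file internally.

```shell
# Hypothetical helper: sum all node votes ("quorum_votes:") and the
# quorum device votes ("votes:") found in a corosync.conf-style file.
required_expected_votes() {
    awk '$1 == "quorum_votes:" || $1 == "votes:" { sum += $2 }
         END { print sum + 0 }' "$1"
}

# Hypothetical helper: print the expected_votes value set in the file.
configured_expected_votes() {
    awk '$1 == "expected_votes:" { print $2 }' "$1"
}

# Returns non-zero when expected_votes is lower than the summed votes.
check_expected_votes() {
    need=$(required_expected_votes "$1")
    have=$(configured_expected_votes "$1")
    if [ "$have" -lt "$need" ]; then
        echo "expected_votes is $have, but the configured votes add up to $need"
        return 1
    fi
    echo "expected_votes ($have) covers all $need configured votes"
}
```

With the nodelist shown earlier (two nodes with one vote each) and the device block above (one vote), `required_expected_votes` gives 3 while `configured_expected_votes` gives 2, so `check_expected_votes` reports the problem.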

I changed expected_votes to 3 (one vote for each of the two nodes plus one for the QDevice) and ran:

> systemctl start corosync-qdevice

After that, the service started.