Corosync service fails to start on all nodes and I don't know what I did wrong (kind of solved)

jlficken

ETA: I'm VERY new to Proxmox and just working with it at home.

Any help would be appreciated.

All nodes can ping each other. Here is my /etc/corosync/corosync.conf:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: FSPVE1
    nodeid: 1
    quorum_votes: 2
    ring0_addr: 192.168.1.52
  }
  node {
    name: FSPVE2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.53
  }
  node {
    name: FSPVE3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.54
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: lms
      host: 192.168.1.55
      tls: on
    }
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: FSProxmox
  config_version: 6
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Here is the corosync service status on FSPVE1:

Code:
root@FSPVE1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-09-28 13:30:33 CDT; 19min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 4196 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=20)
   Main PID: 4196 (code=exited, status=20)
        CPU: 107ms

Sep 28 13:30:33 FSPVE1 corosync[4196]:   [WD    ] resource memory_used missing a recovery key.
Sep 28 13:30:33 FSPVE1 corosync[4196]:   [WD    ] no resources configured.
Sep 28 13:30:33 FSPVE1 corosync[4196]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Sep 28 13:30:33 FSPVE1 corosync[4196]:   [QUORUM] Using quorum provider corosync_votequorum
Sep 28 13:30:33 FSPVE1 corosync[4196]:   [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Sep 28 13:30:33 FSPVE1 corosync[4196]:   [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: quorum.device.votes must be specified when not all nodes votes 1'
Sep 28 13:30:33 FSPVE1 corosync[4196]:   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
Sep 28 13:30:33 FSPVE1 systemd[1]: corosync.service: Main process exited, code=exited, status=20/n/a
Sep 28 13:30:33 FSPVE1 systemd[1]: corosync.service: Failed with result 'exit-code'.
Sep 28 13:30:33 FSPVE1 systemd[1]: Failed to start Corosync Cluster Engine.
 
I removed this section from /etc/corosync/corosync.conf on each node:
Code:
device {
  model: net
  net {
    algorithm: lms
    host: 192.168.1.55
    tls: on
  }
}

Then I restarted the corosync service using "systemctl restart corosync.service" and they all came back up.
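
Roughly the steps on each node, in case it helps anyone (the status checks afterwards are just what I used to confirm quorum was back):

Code:
# restart corosync on this node
systemctl restart corosync.service
systemctl status corosync.service --no-pager

# cluster membership and quorum as Proxmox sees it
pvecm status

# lower-level quorum view straight from corosync
corosync-quorumtool -s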

I guess I'll just leave 2 nodes running at all times even though I really didn't want to.

Unless, that is, I can somehow keep quorum with only 1 of the 3 nodes running?
 
I believe that if I change the device section and add a "votes: 1" attribute, I should be able to fix this error, correct?
"quorum.device.votes must be specified when not all nodes votes 1"

ETA: I think I got it figured out. It should be:
Code:
quorum {
  device {
    model: net
    net {
      algorithm: lms
      host: 192.168.1.55
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}
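
If I'm reading the docs right, after editing the file I also need to bump config_version in the totem section, put the same file on every node, and then restart corosync plus the qdevice daemon, something like:

Code:
# after copying the edited corosync.conf (with config_version bumped) to every node:
systemctl restart corosync.service
systemctl restart corosync-qdevice.service   # assuming the corosync-qdevice daemon is set up on each node

# the Qdevice should then show up in the votequorum info
pvecm status

I believe the usual route on Proxmox is "pvecm qdevice setup <qnetd-host>", which also takes care of the TLS certificates needed for tls: on, but here I edited the file by hand.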
 
While the service works, the QDevice is kind of odd as it only shows up sometimes.

I think I'll just give up and give the main node 3 votes in the corosync.conf file so that I can shut the other 2 nodes off.
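
If I go the 3-votes route, I think the nodelist change would look something like this (only FSPVE1's quorum_votes changes, and config_version in the totem section would need bumping too; not tested yet):

Code:
nodelist {
  node {
    name: FSPVE1
    nodeid: 1
    quorum_votes: 3
    ring0_addr: 192.168.1.52
  }
  node {
    name: FSPVE2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.53
  }
  node {
    name: FSPVE3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.54
  }
}

That makes 5 votes total, so quorum is 3 and FSPVE1 on its own stays quorate.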
 
A QDevice only makes sense as a tie-breaker in clusters with an even node count; since you have 3 nodes it doesn't improve the situation in any way. If you have one node that's running 24/7 and two nodes that are only up sometimes, but you want to manage them as a cluster, giving the main node 3 votes is an option. You just have to be aware that the main node being down means no quorum for the others, so no modifications are possible while the main node is not up.
 
Thanks Fabian!

When you say "no modifications", what exactly does that mean? Would the VMs still run on the other nodes even without quorum, or would nothing run?
 
Guests which are already running should continue to run, but no power management (start/stop/...), config changes, migration, backup, ...
 