" Cluster not ready - no quorum? (500)" Erā de VM ga kidō dekinai 51 / 5,000 VM fails to start with error "cluster not ready - no quorum? (500)"

kanhina1217

New Member
Jul 14, 2024
2
0
1
Hi

I have 4 nodes cluster.

Suddenly since yesterday I can't start the VM.

I tried various solutions but they all failed.

Here are some debugs.

pve1 /etc/pve/corosync.conf
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.100.11
  }
  node {
    name: pve2
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.100.12
  }
  node {
    name: pve3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.100.13
  }
  node {
    name: pve4
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.100.14
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

results of # systemctl status pve-cluster
Code:
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-07-14 23:11:11 JST; 51min ago
    Process: 3951 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 3953 (pmxcfs)
      Tasks: 7 (limit: 18952)
     Memory: 35.4M
        CPU: 1.754s
     CGroup: /system.slice/pve-cluster.service
             └─3953 /usr/bin/pmxcfs

Jul 15 00:02:28 pve1 pmxcfs[3953]: [dcdb] crit: cpg_initialize failed: 2
Jul 15 00:02:28 pve1 pmxcfs[3953]: [status] crit: cpg_initialize failed: 2
Jul 15 00:02:34 pve1 pmxcfs[3953]: [quorum] crit: quorum_initialize failed: 2
Jul 15 00:02:34 pve1 pmxcfs[3953]: [confdb] crit: cmap_initialize failed: 2
Jul 15 00:02:34 pve1 pmxcfs[3953]: [dcdb] crit: cpg_initialize failed: 2
Jul 15 00:02:34 pve1 pmxcfs[3953]: [status] crit: cpg_initialize failed: 2
Jul 15 00:02:40 pve1 pmxcfs[3953]: [quorum] crit: quorum_initialize failed: 2
Jul 15 00:02:40 pve1 pmxcfs[3953]: [confdb] crit: cmap_initialize failed: 2
Jul 15 00:02:40 pve1 pmxcfs[3953]: [dcdb] crit: cpg_initialize failed: 2
Jul 15 00:02:40 pve1 pmxcfs[3953]: [status] crit: cpg_initialize failed: 2


results of # systemctl status corosync
Code:
× corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Sun 2024-07-14 23:11:11 JST; 53min ago
   Duration: 3.477s
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 3959 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=8)
   Main PID: 3959 (code=exited, status=8)
        CPU: 11ms

Jul 14 23:11:11 pve1 systemd[1]: Starting corosync.service - Corosync Cluster Engine...
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Corosync Cluster Engine  starting up
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vq>
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Could not open /etc/corosync/authkey: No such file or directory
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1417.
Jul 14 23:11:11 pve1 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Jul 14 23:11:11 pve1 systemd[1]: corosync.service: Failed with result 'exit-code'.
Jul 14 23:11:11 pve1 systemd[1]: Failed to start corosync.service - Corosync Cluster Engine.

results of # pvecm nodes
Code:
Cannot initialize CMAP service

results of # journalctl -xeu corosync.service
Code:
░░ A start job for unit corosync.service has begun execution.
░░
░░ The job identifier is 1032.
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Corosync Cluster Engine  starting up
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vq>
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Could not open /etc/corosync/authkey: No such file or directory
Jul 14 23:11:11 pve1 corosync[3959]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1417.
Jul 14 23:11:11 pve1 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit corosync.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 8.
Jul 14 23:11:11 pve1 systemd[1]: corosync.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit corosync.service has entered the 'failed' state with result 'exit-code'.
Jul 14 23:11:11 pve1 systemd[1]: Failed to start corosync.service - Corosync Cluster Engine.
░░ Subject: A start job for unit corosync.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit corosync.service has finished with a failure.
░░
░░ The job identifier is 1032 and the job result is failed.

pve1 WebUI
1720969709186.png

pve2, pve3 WebUI
1720969774560.png

pve4 WebUI
1720969806676.png


I don't want to rebuild it if possible.
How can this be done?
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!