[quorum] crit: quorum_initialize failed: 2

xokia

Am I getting this error because I don't have a watchdog set up?

Code:
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [quorum] crit: quorum_initialize failed: 2
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [quorum] crit: can't initialize service
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [confdb] crit: cmap_initialize failed: 2
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [confdb] crit: can't initialize service
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [dcdb] crit: cpg_initialize failed: 2
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [dcdb] crit: can't initialize service
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [status] crit: cpg_initialize failed: 2
Aug 24 17:33:29 HOME-SERVER pmxcfs[1285]: [status] crit: can't initialize service

Code:
root@HOME-SERVER:~# journalctl -u corosync.service
Aug 11 15:09:56 HOME-SERVER systemd[1]: Starting Corosync Cluster Engine...
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [TOTEM ] Initializing transport (Kronosnet).
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [TOTEM ] totemknet initialized
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [KNET  ] pmtud: MTU manually set to: 0
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QB    ] server name: cmap
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QB    ] server name: cfg
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QB    ] server name: cpg
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [WD    ] Watchdog not enabled by configuration
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [WD    ] resource load_15min missing a recovery key.
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [WD    ] resource memory_used missing a recovery key.
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [WD    ] no resources configured.
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QUORUM] Using quorum provider corosync_votequorum
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QUORUM] This node is within the primary component and will provide service.
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QUORUM] Members[0]:
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QB    ] server name: votequorum
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Aug 11 15:09:56 HOME-SERVER corosync[1210]:   [QB    ] server name: quorum

Code:
root@HOME-SERVER:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: HOME-SERVER
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.3.12
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: server
  config_version: 1
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Code:
root@HOME-SERVER:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-10-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-5
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2: 6.2.16-10
pve-kernel-6.2.9-1-pve: 6.2.9-1
pve-kernel-5.15.111-1-pve: 5.15.111-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
 
Can you attach the resulting syslog.txt and post the output of the second command as well? Please always use CODE tags when posting command output; otherwise it is hard to parse.

Code:
journalctl --since '2023-08-10' > syslog.txt
pvecm status
 
The syslog is too large to upload here even if I zip it: 10 MB unzipped, 1 MB zipped.

It fits if I compress it with 7-Zip, so you might need to rename the attachment from .zip to .7z and use 7-Zip to extract it.


Code:
root@HOME-SERVER:~# pvecm status
Cluster information
-------------------
Name:             server
Config Version:   1
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Aug 25 08:16:04 2023
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.eb
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.3.12 (local)
 

Hi,
do you have any actual issues? If not, the error message is just that the cluster filesystem pmxcfs needs Corosync to start up first to be operational.
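
One way to check that explanation against a saved journal is to pull out only the pmxcfs `crit:` lines and the Corosync start-up lines, then compare their timestamps. A minimal sketch, assuming the journal was exported to a text file such as the syslog.txt requested earlier in this thread; the function name `cluster_startup_lines` is made up for illustration:

```shell
# Hypothetical helper: print pmxcfs "crit:" messages and Corosync start-up
# messages from an exported journal file, in the order they were logged.
# Usage: cluster_startup_lines syslog.txt
cluster_startup_lines() {
    grep -E 'pmxcfs\[[0-9]+\]: \[[a-z]+\] crit: |Corosync Cluster Engine' "$1"
}
```

If the `crit:` lines only appear shortly before a Corosync start-up line and stop afterwards, that would be consistent with the start-order explanation above.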
 
Yes, the system crashes randomly. I was looking for any reason why that might be happening.
 
Looking at the logs again, it seems there are quite a few segfaults and then later page faults the kernel isn't able to handle. I'd start out with running a memory test, e.g. using memtest86+ on the Proxmox VE installer ISO (Advanced options). What does the load on your host look like before a crash happens?
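
A quick way to see how often these faults appear in the exported journal is to count matching lines. This is only a sketch: the function name `fault_summary` is hypothetical, and the two search strings are assumptions based on how recent Linux kernels usually word these messages (`segfault at` for user-space crashes, `unable to handle page fault` for kernel ones), since the actual syslog.txt is not shown in the thread.

```shell
# Hypothetical helper: count segfault and kernel page-fault lines in an
# exported journal file. The search strings are assumed typical kernel
# wording, not taken from the poster's log.
# Usage: fault_summary syslog.txt
fault_summary() {
    printf 'segfaults: %s\n' "$(grep -c -F 'segfault at' "$1")"
    printf 'page faults: %s\n' "$(grep -c -F 'unable to handle page fault' "$1")"
}
```

Running `grep -F 'segfault at' syslog.txt` on its own shows which processes crashed, which can help tell a single faulty program from a system-wide problem.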
 
The memory test passes with no issues. The system loads and runs fine for about a day, then crashes randomly. I have tested the memory every way possible and can never get it to fail. I am using G.Skill RAM at stock speeds, with an i9-13900 CPU on an Asus B760-I motherboard. I have even had the CPU replaced with a new one. So I am not completely ruling out a hardware issue, but it seems unlikely. Running Windows directly on this same system (no Proxmox), the system is stable.

My system is fairly basic: a single VM running Windows 11, which is disabled 90% of the time, and an LXC running Ubuntu with Plex Media Server and GPU passthrough for the iGPU. If I kill the LXC and just run Proxmox with nothing else active, I still get the random crashes after about a day or two.

If the CPU stays active, i.e. there is some kind of load that keeps it doing something, the system doesn't crash. The issue only happens when the system sits idle for long periods of time.
 
Hmm, maybe it has something to do with p-states or c-states?
 
Is that a known issue?
I think there were one or two reports in the forum a few months ago, but I don't know about something concrete, I'm just guessing from the symptom.
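
To follow up on that guess, the idle states the kernel exposes for a core can be listed from sysfs. A minimal sketch, assuming the standard Linux cpuidle sysfs layout; `list_cstates` is a made-up helper name, and nothing in this thread confirms that C-states are the cause:

```shell
# Hypothetical helper: list the idle (C-)states the kernel exposes for one
# core, with the current value of each state's "disable" flag.
# The optional argument overrides the sysfs directory (default: cpu0).
# Usage: list_cstates
list_cstates() {
    base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for d in "$base"/state*; do
        [ -r "$d/name" ] || continue
        printf '%s disable=%s\n' "$(cat "$d/name")" "$(cat "$d/disable" 2>/dev/null)"
    done
}
```

For a test, deeper states are often limited with the kernel parameters `intel_idle.max_cstate=1` and `processor.max_cstate=1`; whether that helps on this particular board is untested here.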
 
Unofficial is fine. I just wanted to make sure I did not have something obvious in my config that I needed to fix. Thank you!
 
Is there a way to disable Corosync? I'm not sure I follow what it does or why I need it.

Do I understand this correctly?
You have only one node and some time ago you unintentionally created a cluster (named: server), but you do not plan to actually use the cluster by adding additional nodes to it?

If so, you might want to follow this: [1] to revert to a pure standalone node.
If I remember correctly, after this the corosync service also no longer starts, because its config file no longer exists.

[1] https://forum.proxmox.com/threads/proxmox-ve-6-removing-cluster-configuration.56259/post-259203
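
For orientation, the sequence below follows the "Separate a Node Without Reinstalling" steps from the Proxmox VE admin guide as I recall them; it is an assumption that the linked post describes the same steps, so check both before running anything. The wrapper names `run` and `remove_cluster_config` and the `APPLY` variable are made up for this sketch, which only prints the commands unless `APPLY=1` is set:

```shell
# Hypothetical dry-run wrapper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Sketch of reverting a single node to standalone mode (destructive when
# APPLY=1; must then be run as root on the node itself).
remove_cluster_config() {
    run systemctl stop pve-cluster corosync   # stop the cluster services
    run pmxcfs -l                             # start pmxcfs in local mode
    run rm /etc/pve/corosync.conf             # delete the cluster config
    run rm -r /etc/corosync/*                 # delete Corosync's own files
    run killall pmxcfs                        # stop the local-mode pmxcfs
    run systemctl start pve-cluster           # start pmxcfs as a service again
}
```

Calling `remove_cluster_config` prints the six commands for review; `APPLY=1 remove_cluster_config` would execute them.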
 
I only have one node. I think I created a cluster because, after some update, it was producing an error that there was no cluster. It pretty much forced Corosync on you.

That worked, thank you!
 
Whatever happens around the 24-hour mark in Proxmox is what causes the machine to crash. It crashes consistently.

Maybe the NVMe or memory is going to sleep? Not sure, but whatever it is, Proxmox always takes a dump.
 
