Corosync is not working and I can't log in to the web console.

efg

Please help. The cluster stopped working. After rebooting the server, the VMs do not start, and I can't start them from the console either.

Code:
pvecm status
Cluster information
-------------------
Name:             asodesk
Config Version:   52
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Apr 29 05:03:51 2022
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.19
Quorate:          No

Votequorum information
----------------------
Expected votes:   14
Highest expected: 14
Total votes:      5
Quorum:           8 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          5 168.119.78.190 (local)
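
For reference: with "Expected votes: 14", quorum is floor(14/2) + 1 = 8 votes, and this node alone only holds 5, so activity stays blocked until more nodes rejoin. A standard way to check whether the knet links to the other nodes are up is corosync-cfgtool (not from the original post, added here as a suggestion):

Code:
corosync-cfgtool -s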


Code:
systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-04-29 04:16:39 CEST; 46min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 24977 (corosync)
      Tasks: 9 (limit: 308854)
     Memory: 197.8M
        CPU: 2h 30min 348ms
     CGroup: /system.slice/corosync.service
             └─24977 /usr/sbin/corosync -f

Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).
Apr 29 05:02:28 x8.asodesk.com corosync[24977]:   [TOTEM ] entering GATHER state from 11(merge during join).

Code:
pvecm e 1
Unable to set expected votes: CS_ERR_INVALID_PARAM

Code:
qm start 100
cluster not ready - no quorum?
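
The start is refused because without quorum the Proxmox cluster filesystem under /etc/pve goes read-only, so guests cannot be started. A quick hedged check (not from the original post):

Code:
# should fail with a permission/read-only error while quorum is lost
touch /etc/pve/test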
 
Code:
pvecm expected 1
 
It did not help. I am getting this error:
Code:
Unable to set expected votes: CS_ERR_INVALID_PARAM
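
A likely explanation, not spelled out in the thread: votequorum refuses to set expected votes below the total number of votes currently present in the partition, and node x8 alone carries 5 votes, so any value below 5 is rejected. Assuming that is the cause, the lowest value this node would accept is shown below (a hedged example; note also the caution later in the thread about lowering expected votes while other nodes are still running):

Code:
pvecm expected 5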
 
I only have 10 nodes


Code:
cat /etc/corosync/corosync.conf
logging {
  debug: on
  to_syslog: yes
}

nodelist {
  node {
    name: staging
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 136.243.39.81
  }
  node {
    name: x3
    nodeid: 10
    quorum_votes: 1
    ring0_addr: 95.216.17.92
  }
  node {
    name: x4
    nodeid: 8
    quorum_votes: 1
    ring0_addr: 65.108.103.217
  }
  node {
    name: x5
    nodeid: 9
    quorum_votes: 1
    ring0_addr: 142.132.128.83
  }
  node {
    name: x8
    nodeid: 1
    quorum_votes: 5
    ring0_addr: 168.119.78.190
  }
  node {
    name: x9
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 162.55.90.37
  }
  node {
    name: z1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 116.202.222.171
  }
  node {
    name: z2
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 136.243.133.76
  }
  node {
    name: z3
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 136.243.132.216
  }
  node {
    name: z4
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 176.9.11.85
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: asodesk
  config_version: 52
  interface {
    linknumber: 0
  }
  ip_version: ipv4
  link_mode: passive
  secauth: on
  version: 2
}
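
This config also explains the numbers above: nine nodes with quorum_votes: 1 plus node x8 with quorum_votes: 5 gives the 14 expected votes, hence a quorum of 8. A hedged sketch of the x8 entry with the usual single vote — on Proxmox the file should be edited as /etc/pve/corosync.conf with config_version bumped, and only once the cluster is healthy again:

Code:
  node {
    name: x8
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 168.119.78.190
  }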
 
I read that in order to be able to log in to the main node and set the expected votes to 1, you need to shut down all the other nodes first, and only then set it to 1 on the main one.
 
I'm afraid to turn off all the nodes, because VMs are currently running on the remaining ones, and if quorum is not reached after the nodes restart, the VMs will not start.
Maybe there is a safer way?
 
Let's wait for a Proxmox Support team member then...
to see what they suggest for you.
@efg maybe we will get @Fabian_E to give us a solution. He is a very excellent support member and has helped me before.
PS: the E in his name is for Excellence ;)
 
Spirog, thank you very much. It's great to have someone here to help. :)
 
@efg can you post # pveversion -v
Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-3-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.4: 6.4-12
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.4.162-1-pve: 5.4.162-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.4-1
proxmox-backup-file-restore: 2.1.4-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
 
Hi,
please don't use commands like pvecm expected 1 if you still have multiple working nodes, because that would allow a single node to change the cluster state on its own.

Please make sure that the nodes can ping each other. Cluster communication needs low network latency, so using a dedicated network for it is highly recommended. You should try to get enough nodes talking to each other to reach quorum again. Please also see here for more information.
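
For example, from node x8 one could check reachability and latency to a few of the other nodes (IPs taken from the corosync.conf posted above; a sketch, not part of the original reply):

Code:
ping -c 10 116.202.222.171   # z1
ping -c 10 95.216.17.92      # x3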

One of your nodes is configured to have 5 votes; if there is no good reason for that, you might want to change the configuration once the cluster is healthy again.
 
@Spirog After Hetzner removed the traffic limit, my problem was solved by turning one of the servers off completely and turning it back on! :)

I would like to report on the probable cause of the cluster breaking down.
When the cluster collapsed, all cluster nodes were up and reachable (ping), but I got the following message from my ISP:

Code:
Unfortunately, Falkenstein servers are currently experiencing very large inbound attacks. Our technicians are already working on a solution.
We apologize for the inconvenience caused.
Thank you for your understanding.

Due to always different directions (IP addresses, ports, packet size) we unfortunately had to restrict this traffic. This affects UDP traffic on port 9000-65535.
We apologize for the inconvenience caused.


@Fabian_E, could the UDP 9000-65535 restriction be the reason the cluster fell apart? As far as I know, corosync works on ports 5404-5405.
 
awesome... I am glad it's working... that was a huge issue with Hetzner then... hopefully they got it under control :)
 
I don't think so, but likely the network wasn't stable enough for cluster communication. The most important thing is having low latency.
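
If you want to verify which UDP port corosync is actually bound to on a node (the knet default is 5405), something like this should work (a hedged check, not from the original thread):

Code:
ss -ulpn | grep corosync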
 
