Problem with Cluster

sharipov.i.i

Feb 22, 2023
Good afternoon,
My problem is that the cluster has fallen apart.
I have a cluster with 5 nodes. They were all visible in the web interface, but a couple of days ago something happened, and now 2 of the nodes are marked with a red cross in the web interface, although I can still see and run virtual machines on them.
If I open the web interface of one of those nodes, it shows itself with a green check mark and marks the other nodes with a red cross.
I have tried a lot of things, but I no longer know what to do:
1. Executed the following commands:
killall -9 corosync
systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd
systemctl restart pve-cluster
systemctl restart corosync
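
After each round of restarts I checked the result with commands along these lines (I am writing them from memory, so the exact invocations may have differed slightly):

systemctl status corosync pve-cluster pvedaemon pveproxy pvestatd
journalctl -u corosync -u pve-cluster --since "10 minutes ago"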
2. Ran pmxcfs -l to force the cluster filesystem into local mode.
3. Edited the files
/etc/pve/corosync.conf
/etc/corosync/corosync.conf
/etc/corosync/authkey
on each node and made them identical everywhere.
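
Roughly, the sequence on a node without quorum was the following (from memory; the scp destination is only an example, using pve2's address):

systemctl stop pve-cluster corosync
killall -9 pmxcfs
pmxcfs -l                               # mount /etc/pve in local mode
nano /etc/pve/corosync.conf             # make the config identical, bump config_version
cp /etc/pve/corosync.conf /etc/corosync/corosync.conf
scp /etc/corosync/authkey root@10.16.1.205:/etc/corosync/authkey
killall pmxcfs
systemctl start pve-cluster corosync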

Here is the content of corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.16.1.204
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.16.1.205
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.16.1.211
  }
  node {
    name: pve4
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.16.3.125
  }
  node {
    name: pve5
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 10.16.1.216
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: CLUSTER
  config_version: 41
  interface {
    linknumber: 0
    transport: udpu
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

4. Next, I rebooted each node and started the services:

service pve-cluster start
service pvedaemon start
service pvestatd start
service pveproxy start
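
After the services came up, I checked membership on each node with something like:

pvecm status
corosync-quorumtool -s

(the pvecm status outputs are pasted at the end of this post).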

5. With the command pvecm add 10.16...... --force I added a node back to the cluster; this often failed with quorum errors or with a message that the node is already part of a cluster.

I also used the command:

pvecm expected 1 (and likewise with 2, 3, ...)
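
As far as I understand, pvecm expected only lowers the number of votes required for quorum, so that the nodes that can currently see each other become quorate again and /etc/pve becomes writable. For example, when only two nodes saw each other I ran something like:

pvecm expected 2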

6. After that, I restarted the services again:

systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd
systemctl restart pve-cluster
systemctl restart corosync

Then I refreshed the web interface. For example, when the cluster already had 3 nodes and I wanted to attach another node to it, instead of joining, that node pulled one of the existing nodes out and formed a separate cluster with it.

The error below on one node appeared after my many attempts to bring everything back into a single cluster. I never managed to assemble all of them: at best 4 nodes were together and the fifth still showed a red cross, or the nodes ended up in groups of 2:

root@pve1:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2023-02-22 15:12:05 MSK; 3min 33s ago
Process: 19813 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)

Feb 22 15:12:03 pve1 systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
Feb 22 15:12:03 pve1 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Feb 22 15:12:05 pve1 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Feb 22 15:12:05 pve1 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Feb 22 15:12:05 pve1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 22 15:12:05 pve1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Feb 22 15:12:07 pve1 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Feb 22 15:12:07 pve1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 22 15:12:07 pve1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.

pvecm status output:

Node 1:

root@pve4:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Feb 22 15:19:47 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 3.2e18e
Quorate: No

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211
0x00000004 1 10.16.3.125 (local)


Node 2:

root@pve2:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Feb 22 15:18:22 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000002
Ring ID: 2.2e20a
Quorate: No

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.205 (local)
0x00000004 1 10.16.1.216

Node 3:

root@pve3:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Feb 22 15:19:11 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000003
Ring ID: 3.2e18e
Quorate: No

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211 (local)
0x00000004 1 10.16.3.125
root@pve3:~#


Node 4:

root@pve4:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Feb 22 15:19:47 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 3.2e18e
Quorate: No

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211
0x00000004 1 10.16.3.125 (local)

Node 5:

root@pve4:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Feb 22 15:19:47 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 3.2e18e
Quorate: No

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211
0x00000004 1 10.16.3.125 (local)
 
I'm sorry if the formatting is incorrect.

And another question: quorum errors have started to appear frequently. How do I configure quorum correctly?
 
