Good afternoon,
The problem is that the cluster has fallen apart.
There is a cluster of 5 nodes. All of them used to be visible in the web interface, but a couple of days ago something happened and now 2 of the nodes are shown there with a red cross, even though I can still see and run virtual machines on them.
If I open the web interface of one of those 2 nodes, it shows itself with a green check mark and marks the other nodes with a red cross.
I have tried a lot of things, but I no longer know what to do:
1. Ran the following commands:
killall -9 corosync
systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd
systemctl restart pve-cluster
systemctl restart corosync
2. Started the cluster filesystem in local mode: pmxcfs -l
3. Edited the following files on each node and brought them to an identical state (a checksum check sketch follows right after the config):
/etc/pve/corosync.conf
/etc/corosync/corosync.conf
/etc/corosync/authkey
Here is the content of corosync.conf:
logging {
debug: off
to_syslog: yes
}
nodelist {
node {
name: pve1
nodeid: 1
quorum_votes: 1
ring0_addr: 10.16.1.204
}
node {
name: pve2
nodeid: 2
quorum_votes: 1
ring0_addr: 10.16.1.205
}
node {
name: pve3
nodeid: 3
quorum_votes: 1
ring0_addr: 10.16.1.211
}
node {
name: pve4
nodeid: 4
quorum_votes: 1
ring0_addr: 10.16.3.125
}
node {
name: pve5
nodeid: 5
quorum_votes: 1
ring0_addr: 10.16.1.216
}
}
quorum {
provider: corosync_votequorum
}
totem {
cluster_name: CLUSTER
config_version: 41
interface {
linknumber: 0
transport: udpu
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
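A quick way to confirm that the files from step 3 really are identical everywhere is to compare their checksums (a rough sketch, assuming root SSH access between the nodes and using the addresses from the nodelist above):

# compare corosync config and authkey checksums across all five nodes
for ip in 10.16.1.204 10.16.1.205 10.16.1.211 10.16.3.125 10.16.1.216; do
    ssh root@$ip "hostname; md5sum /etc/corosync/corosync.conf /etc/corosync/authkey"
done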
4. Next, I rebooted each node and started the services:
service pve-cluster start
service pvedaemon start
service pvestatd start
service pveproxy start
5. Tried to add nodes back to the cluster with pvecm add 10.16...... --force; this often failed with quorum errors or with a message that the node is already a member of a cluster. To get around the quorum errors I also used pvecm expected with various vote counts (1, 2, 3, ...); a short sketch follows below.
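The quorum workaround from step 5 looks roughly like this, run on whichever node is out of quorum (a sketch; the value given to pvecm expected should match the number of votes the partition actually has):

# temporarily lower the expected vote count so the partition becomes quorate
pvecm expected 1
# check the quorum state afterwards
pvecm status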
6. After that, I restarted the services again
systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd
systemctl restart pve-cluster
systemctl restart corosync
After refreshing the web interface I saw, for example, that instead of attaching another node to a partition that already had 3 nodes, the join pulled one of the existing nodes out and formed a separate cluster with it.
The failure on node 1 appeared after many attempts to bring everything back into a single cluster. I never managed to reassemble all 5 nodes: at best 4 nodes were together while one still showed a red cross, or the nodes ended up split into pairs:
root@pve1:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2023-02-22 15:12:05 MSK; 3min 33s ago
Process: 19813 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
Feb 22 15:12:03 pve1 systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
Feb 22 15:12:03 pve1 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Feb 22 15:12:05 pve1 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Feb 22 15:12:05 pve1 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Feb 22 15:12:05 pve1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 22 15:12:05 pve1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Feb 22 15:12:07 pve1 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Feb 22 15:12:07 pve1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 22 15:12:07 pve1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
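For reference, to get at the actual pmxcfs error behind the exit code 255, the start limit can be cleared and pmxcfs run in the foreground with debugging (a rough sketch based on the standard pmxcfs options):

# clear the "start request repeated too quickly" counter
systemctl reset-failed pve-cluster
# run the cluster filesystem in the foreground with debug messages to see why it exits
pmxcfs -f -d
# alternatively, check the journal for the underlying error
journalctl -u pve-cluster -b --no-pager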
pvecm status output:
Node 1:
root@pve4:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Wed Feb 22 15:19:47 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 3.2e18e
Quorate: No
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211
0x00000004 1 10.16.3.125 (local)
Node 2:
root@pve2:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Wed Feb 22 15:18:22 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000002
Ring ID: 2.2e20a
Quorate: No
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.205 (local)
0x00000004 1 10.16.1.216
Node 3:
root@pve3:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Wed Feb 22 15:19:11 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000003
Ring ID: 3.2e18e
Quorate: No
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211 (local)
0x00000004 1 10.16.3.125
root@pve3:~#
Node 4:
root@pve4:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Wed Feb 22 15:19:47 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 3.2e18e
Quorate: No
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211
0x00000004 1 10.16.3.125 (local)
Node 5:
root@pve4:~# pvecm status
Cluster information
-------------------
Name: CLUSTER
Config Version: 42
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Wed Feb 22 15:19:47 2023
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000004
Ring ID: 3.2e18e
Quorate: No
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 2
Quorum: 3 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000003 1 10.16.1.211
0x00000004 1 10.16.3.125 (local)
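For completeness, the corosync-level view from each partition can be cross-checked directly with the standard corosync tools (a sketch; I have not pasted that output here):

# show quorum and membership as corosync itself sees it
corosync-quorumtool -s
# show the state of the knet links to the other nodes
corosync-cfgtool -s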