Cluster not working after moving all nodes to new DataCenter

mehhos

Member
Jan 27, 2021
Hi,
Hope someone can help me here.
We moved a Proxmox cluster with 6 nodes to a new datacenter, and since then the cluster is not working. I restarted the cluster, but it didn't help:
root@vsr-app1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: vsr-app5
    nodeid: 2
    quorum_votes: 1
    ring0_addr: vsr-app5
  }

  node {
    name: vsr-app3
    nodeid: 4
    quorum_votes: 1
    ring0_addr: vsr-app3
  }

  node {
    name: vsr-app6
    nodeid: 1
    quorum_votes: 1
    ring0_addr: vsr-app6
  }

  node {
    name: vsr-app2
    nodeid: 5
    quorum_votes: 1
    ring0_addr: vsr-app2
  }

  node {
    name: vsr-app1
    nodeid: 6
    quorum_votes: 1
    ring0_addr: vsr-app1
  }

  node {
    name: vsr-app4
    nodeid: 3
    quorum_votes: 1
    ring0_addr: vsr-app4
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: vsrappcluster1
  config_version: 6
  ip_version: ipv4
  secauth: on
  version: 2

  interface {
    bindnetaddr: 10.112.65.19
    ringnumber: 0
  }
}

root@vsr-app1:~#

root@vsr-app1:~# pvecm status
Quorum information
------------------
Date: Wed Mar 8 14:10:48 2023
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000006
Ring ID: 6/516
Quorate: No

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 1
Quorum: 4 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000006 1 10.112.65.14 (local)
root@vsr-app1:~#

root@vsr-app1:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: failed (Result: exit-code) since Wed 2023-03-08 13:38:48 CET; 32min ago
Process: 39680 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 29490 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=255)
Main PID: 39678 (code=exited, status=0/SUCCESS)

Mar 08 13:38:38 vsr-app1 pmxcfs[29490]: [main] notice: unable to aquire pmxcfs lock - trying again
Mar 08 13:38:38 vsr-app1 pmxcfs[29490]: [main] notice: unable to aquire pmxcfs lock - trying again
Mar 08 13:38:48 vsr-app1 pmxcfs[29490]: [main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
Mar 08 13:38:48 vsr-app1 pmxcfs[29490]: [main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
Mar 08 13:38:48 vsr-app1 pmxcfs[29490]: [main] notice: exit proxmox configuration filesystem (-1)
Mar 08 13:38:48 vsr-app1 systemd[1]: pve-cluster.service: control process exited, code=exited status=255
Mar 08 13:38:48 vsr-app1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Mar 08 13:38:48 vsr-app1 systemd[1]: Unit pve-cluster.service entered failed state.
root@vsr-app1:~#

root@vsr-app1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: active (running) since Wed 2023-03-08 13:37:23 CET; 34min ago
Process: 29120 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
Process: 29141 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
Main PID: 29152 (corosync)
CGroup: /system.slice/corosync.service
└─29152 corosync

Mar 08 13:40:13 vsr-app1 corosync[29152]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 08 13:40:18 vsr-app1 corosync[29152]: [TOTEM ] A new membership (10.112.65.14:508) was formed. Members
Mar 08 13:40:18 vsr-app1 corosync[29152]: [QUORUM] Members[1]: 6
Mar 08 13:40:18 vsr-app1 corosync[29152]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 08 13:40:23 vsr-app1 corosync[29152]: [TOTEM ] A new membership (10.112.65.14:512) was formed. Members
Mar 08 13:40:23 vsr-app1 corosync[29152]: [QUORUM] Members[1]: 6
Mar 08 13:40:23 vsr-app1 corosync[29152]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 08 13:40:27 vsr-app1 corosync[29152]: [TOTEM ] A new membership (10.112.65.14:516) was formed. Members
Mar 08 13:40:27 vsr-app1 corosync[29152]: [QUORUM] Members[1]: 6
Mar 08 13:40:27 vsr-app1 corosync[29152]: [MAIN ] Completed service synchronization, ready to provide service.
root@vsr-app1:~#


root@vsr-app1:~# systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled)
Active: active (running) since Wed 2023-03-08 06:25:16 CET; 7h ago
Process: 17170 ExecStop=/usr/bin/pveproxy stop (code=exited, status=0/SUCCESS)
Process: 17233 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 17238 (pveproxy)
CGroup: /system.slice/pveproxy.service
├─17238 pveproxy
├─36752 pveproxy worker
├─38228 pveproxy worker
└─46986 pveproxy worker

Mar 08 13:51:22 vsr-app1 pveproxy[17960]: problem with client 10.80.45.53; ssl3_read_bytes: ssl handshake failure
Mar 08 13:51:22 vsr-app1 pveproxy[17960]: Can't call method "timeout_reset" on an undefined value at /usr/share/perl5/PVE/HTTPServer.pm line 227.
Mar 08 13:51:34 vsr-app1 pveproxy[34603]: worker exit
Mar 08 13:51:34 vsr-app1 pveproxy[17238]: worker 34603 finished
Mar 08 13:51:34 vsr-app1 pveproxy[17238]: starting 1 worker(s)
Mar 08 13:51:34 vsr-app1 pveproxy[17238]: worker 38228 started
Mar 08 14:05:00 vsr-app1 pveproxy[17960]: worker exit
Mar 08 14:05:00 vsr-app1 pveproxy[17238]: worker 17960 finished
Mar 08 14:05:00 vsr-app1 pveproxy[17238]: starting 1 worker(s)
Mar 08 14:05:00 vsr-app1 pveproxy[17238]: worker 46986 started
root@vsr-app1:~#
 

Attachments: proxmox_1.jpg
Can you ping between the nodes on the corosync interface?
Did any IP addresses change while moving?
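
For example, purely as a sketch (node names taken from your corosync.conf above; on the older multicast-based corosync setups, omping has to be started on all nodes at roughly the same time):
Code:
# ping another node's corosync address directly
ping -c 3 vsr-app3
# multicast test for older corosync clusters - run simultaneously on every node
omping -c 600 -i 1 -q vsr-app1 vsr-app2 vsr-app3 vsr-app4 vsr-app5 vsr-app6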
 
Expected votes: 6
Highest expected: 6
Total votes: 1
Quorum: 4 Activity blocked
This host is isolated; corosync was not able to find its five neighbours. You need to re-establish network connectivity, in a way that is compatible with how it was before...

Good luck!
 
Thanks for the reply. All nodes are in the same VLAN, and all nodes can SSH to each other.
 
Again: "Total votes: 1 Quorum: 4 Activity blocked" tells you that the cluster network does not work. What is the output of corosync-cfgtool -s?

SSH is not sufficient for this. And in your config there are lines like "ring0_addr: vsr-app3". Do these names resolve?
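
A quick way to check name resolution on each node (hostnames taken from the corosync.conf above) might be:
Code:
getent hosts vsr-app3      # what the resolver actually returns (includes /etc/hosts)
grep vsr-app /etc/hosts    # the static entries the cluster normally relies on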
 
Again: "Total votes: 1 Quorum: 4 Activity blocked" tells you that the cluster network does not work. What is the output of corosync-cfgtool -s?

SSH is not sufficient for this. And in your config there are lines like "ring0_addr: vsr-app3". Do these names resolve?
root@vso-app1:~# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = 10.112.14.163
status = ring 0 active with no faults
root@vso-app1:~#
 
root@vso-app1:~# nslookup vsr-app3
Server: 89.254.64.20
Address: 89.254.64.20#53

Non-authoritative answer:
Name: vsr-app3.dax.net
Address: 10.112.65.16

root@vso-app1:~#
 
This IP looks off in the output of corosync-cfgtool:
Code:
id = 10.112.14.163

The IPs from pvecm and from nslookup seem to be in a different subnet (if this is a /24 subnet):
Code:
10.112.65.16
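
As a sketch, one way to see which interface and source address the node would actually use to reach that address:
Code:
ip route get 10.112.65.16   # shows the outgoing interface and source IP
ip -4 addr show             # lists the configured IPv4 addresses and prefixes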

The output from corosync-cfgtool looks a bit different than on my local cluster. What versions are you running? (pveversion -v)

What do the hosts files look like? Might the IPs there be wrong? Are you trying to ping with the IP or the hostname? Are UDP packets maybe blocked?
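
To rule out blocked UDP, something along these lines could help (assuming the default corosync ports 5404/5405):
Code:
ss -ulpn | grep corosync                            # is corosync listening on its UDP ports?
tcpdump -ni any 'udp port 5404 or udp port 5405'    # do packets from the other nodes arrive?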

I would also recommend using IPs instead of hostnames in the corosync configuration.
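
Purely as an illustration (the address is taken from your pvecm output above, so please verify it, and remember to increase config_version whenever you edit corosync.conf), a node entry with an IP instead of a hostname would look like this:
Code:
node {
  name: vsr-app1
  nodeid: 6
  quorum_votes: 1
  ring0_addr: 10.112.65.14   # IP instead of the hostname vsr-app1
}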

As @UdoB already pointed out, this is certainly a problem with the network configuration somewhere.


From at least two hosts, can you provide the following output?
Code:
cat /etc/network/interfaces
systemctl status networking
ip a
cat /etc/hosts
ping <other_host_ip>
ping <other_host_hostname>
pvecm status
 
