Cluster issue /etc/pve/.members

PretoX

Hi,
please help, I can't figure out how this file is generated.
The .members file picks up the external DNS name and external IP, not the ones provided by corosync,
so the cluster says it's OK, but no migrations or GUI configuration are possible, see the screenshot:
Screenshot 2017-06-14 12.48.46.png
# cat /etc/pve/.members
{
"nodename": "pm2-bne2",
"version": 4,
"cluster": { "name": "ING-FOREST", "version": 6, "nodes": 2, "quorate": 1 },
"nodelist": {
"pm2-bne2": { "id": 1, "online": 1, "ip": "210.xxx.xxx.100"},
"backup-bne2": { "id": 2, "online": 1, "ip": "210.xxx.xxx.6"}
}
}
# pvecm status
Quorum information
------------------
Date: Wed Jun 14 19:15:17 2017
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 2/156
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.153.6
0x00000001 1 192.168.153.100 (local)
# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
2 1 backup-bne2.cluster
1 1 pm2-bne2.cluster (local)
# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: pm2-bne2
nodeid: 1
quorum_votes: 1
ring0_addr: pm2-bne2.cluster
}

node {
name: backup-bne2
nodeid: 2
quorum_votes: 1
ring0_addr: backup-bne2.cluster
}

}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: ING-FOREST
config_version: 6
ip_version: ipv4
secauth: on
version: 2
interface {
bindnetaddr: 192.168.153.100
ringnumber: 0
}

}
#Cluster data!
192.168.153.100 pm2-bne2.cluster
192.168.153.6 backup-bne2.cluster
 
Hi,

restart these two services:
systemctl restart pve-cluster.service
systemctl restart corosync.service
 
root@pm2-bne2:/etc# systemctl restart pve-cluster.service
root@pm2-bne2:/etc# systemctl restart corosync.service
root@pm2-bne2:/etc# cat /etc/pve/.members
{
"nodename": "pm2-bne2",
"version": 4,
"cluster": { "name": "ING-FOREST", "version": 7, "nodes": 2, "quorate": 0 },
"nodelist": {
"pm2-bne2": { "id": 1, "online": 1, "ip": "210.xxx.xxx.100"},
"backup-bne2": { "id": 2, "online": 1, "ip": "210.xxx.xxx.6"}
}
}
I tried changing the corosync config version. Still the same.

# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Wed 2017-06-14 21:25:09 AEST; 1min 22s ago
Process: 8569 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 8565 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
Main PID: 8567 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─8567 /usr/bin/pmxcfs

Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [dcdb] notice: received sync request (epoch 1/8567/00000002)
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [status] notice: received sync request (epoch 1/8567/00000002)
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [dcdb] notice: received all states
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [dcdb] notice: leader is 2/22050
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [dcdb] notice: synced members: 2/22050
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [dcdb] notice: waiting for updates from leader
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [status] notice: received all states
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [status] notice: all data is up to date
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [dcdb] notice: update complete - trying to commit (got 2 inode updates)
Jun 14 21:25:20 pm2-bne2 pmxcfs[8567]: [dcdb] notice: all data is up to date


root@pm2-bne2:/etc# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: active (running) since Wed 2017-06-14 21:25:15 AEST; 1min 49s ago
Process: 8589 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
Process: 8618 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
Main PID: 8639 (corosync)
CGroup: /system.slice/corosync.service
└─8639 corosync

Jun 14 21:25:14 pm2-bne2 corosync[8639]: [QB ] server name: quorum
Jun 14 21:25:14 pm2-bne2 corosync[8639]: [TOTEM ] A new membership (192.168.153.100:168) was formed. Members joined: 1
Jun 14 21:25:14 pm2-bne2 corosync[8639]: [QUORUM] Members[1]: 1
Jun 14 21:25:14 pm2-bne2 corosync[8639]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 14 21:25:14 pm2-bne2 corosync[8639]: [TOTEM ] A new membership (192.168.153.6:172) was formed. Members joined: 2
Jun 14 21:25:14 pm2-bne2 corosync[8639]: [QUORUM] This node is within the primary component and will provide service.
Jun 14 21:25:14 pm2-bne2 corosync[8639]: [QUORUM] Members[2]: 2 1
Jun 14 21:25:14 pm2-bne2 corosync[8639]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 14 21:25:15 pm2-bne2 corosync[8618]: Starting Corosync Cluster Engine (corosync): [ OK ]
Jun 14 21:25:15 pm2-bne2 systemd[1]: Started Corosync Cluster Engine.
 
Can you please send the output of
ps aux | grep pmxcfs
 
# ps aux | grep pmxcfs
root 8567 0.2 0.0 644684 44100 ? Ssl 21:25 0:02 /usr/bin/pmxcfs
root 13342 0.0 0.0 12732 1916 pts/0 S+ 21:37 0:00 grep pmxcfs

I tested creating files in the /etc/pve dir; the files get synced fine.
 
Hi, any help would be appreciated; right now I have to enable SSH on the external interface to keep the cluster working.
 
Can you write to the pmxcfs?

Code:
touch /etc/pve/test
 
Is the .members file on both nodes wrong?
 
Is the .members file on both nodes wrong?
Everything except "version"; I don't know what the difference is between this version and the cluster version.
# cat /etc/pve/.members
{
"nodename": "pm2-bne2",
"version": 6,
"cluster": { "name": "ING-FOREST", "version": 7, "nodes": 2, "quorate": 1 },
"nodelist": {
"pm2-bne2": { "id": 1, "online": 1, "ip": "210.xxx.xxx.100"},
"backup-bne2": { "id": 2, "online": 1, "ip": "210.xxx.xxx.6"}
}
}

# cat /etc/pve/.members
{
"nodename": "backup-bne2",
"version": 15,
"cluster": { "name": "ING-FOREST", "version": 7, "nodes": 2, "quorate": 1 },
"nodelist": {
"pm2-bne2": { "id": 1, "online": 1, "ip": "210.xxx.xxx.100"},
"backup-bne2": { "id": 2, "online": 1, "ip": "210.xxx.xxx.6"}
}
}
 
Do you have HA enabled?
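For reference, one way to check this (assuming the standard PVE HA tooling is installed):

Code:
ha-manager status
cat /etc/pve/ha/resources.cfg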
 
Please disable all HA services temporarily.
If you do not remove the HA services you can be fenced.

Then increase the version to 21 in /etc/pve/corosync.conf.
Now check that this change is copied to /etc/corosync/corosync.conf on both nodes.

Do this on both nodes:
Stop the corosync.service
Stop the pve-cluster.service
Kill the pmxcfs

Now start corosync and pve-cluster.
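A minimal sketch of that sequence on one node, assuming the stock PVE HA service names (pve-ha-lrm / pve-ha-crm) and that no HA resources are left active:

Code:
# stop the HA services first so the node cannot get fenced during the restart
systemctl stop pve-ha-lrm.service pve-ha-crm.service
# tear down the cluster stack
systemctl stop corosync.service
systemctl stop pve-cluster.service
pkill -f pmxcfs                  # kill any leftover pmxcfs process
# bring it back up
systemctl start corosync.service
systemctl start pve-cluster.service
# check that the bumped config_version reached the local copy
grep config_version /etc/pve/corosync.conf /etc/corosync/corosync.conf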
 
Please disable all HA services temporarily.
If you do not remove the HA services you can be fenced.

Then increase the version to 21 in /etc/pve/corosync.conf.
Now check that this change is copied to /etc/corosync/corosync.conf on both nodes.

Do this on both nodes:
Stop the corosync.service
Stop the pve-cluster.service
Kill the pmxcfs

Now start corosync and pve-cluster.

root@pm2-bne2:~# cat /etc/pve/.members
{
"nodename": "pm2-bne2",
"version": 4,
"cluster": { "name": "ING-FOREST", "version": 21, "nodes": 2, "quorate": 1 },
"nodelist": {
"pm2-bne2": { "id": 1, "online": 1, "ip": "210.xxx.xxx.100"},
"backup-bne2": { "id": 2, "online": 1, "ip": "210.xxx.xxx.6"}
}
}
root@pm2-bne2:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: pm2-bne2
nodeid: 1
quorum_votes: 1
ring0_addr: pm2-bne2.cluster
}

node {
name: backup-bne2
nodeid: 2
quorum_votes: 1
ring0_addr: backup-bne2.cluster
}

}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: ING-FOREST
config_version: 21
ip_version: ipv4
secauth: on
version: 2
interface {
bindnetaddr: 192.xxx.xxx.100
ringnumber: 0
}

}
 
Your problem is your /etc/hosts.
You did not put all the needed entries in this file, so what gets resolved is pm2-bne2 and not pm2-bne2.cluster.

I would change the corosync.conf and write the IP instead of the name of the node.
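A quick way to confirm that diagnosis is to check what each name actually resolves to on both nodes (getent hosts goes through the NSS resolver order, normally /etc/hosts first, then DNS):

Code:
getent hosts pm2-bne2
getent hosts pm2-bne2.cluster
getent hosts backup-bne2
getent hosts backup-bne2.cluster

If the short names come back with the 210.xxx addresses, that is presumably where the external IPs in /etc/pve/.members are coming from.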
 
So how should I change that to make it work?
root@pm2-bne2:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
210.xxx.xxx.100 pm2-bne2.xxxxxx.au pm2-bne2 pvelocalhost

#Cluster data!
192.168.153.100 pm2-bne2.cluster
192.168.153.6 backup-bne2.cluster
root@backup-bne2:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
210.xxx.xxx.6 backup-bne2.xxxxxx.au backup-bne2 pvelocalhost

#Cluster data!
192.168.153.100 pm2-bne2.cluster
192.168.153.6 backup-bne2.cluster
 
Change the corosync.conf in /etc/pve/:
set the field ring0_addr to the host IP.
Then you have to restart the daemons.

logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: pm2-bne2
nodeid: 1
quorum_votes: 1
ring0_addr: 192.168.153.100
}

node {
name: backup-bne2
nodeid: 2
quorum_votes: 1
ring0_addr: 192.168.153.6
}

}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: ING-FOREST
config_version: 7
ip_version: ipv4
secauth: on
version: 2
interface {
bindnetaddr: 192.168.153.100
ringnumber: 0
}

}
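One note on the example above: config_version has to end up higher than the version the cluster is already running (it is at 21 in this thread by now), otherwise the change may not be picked up. After saving /etc/pve/corosync.conf, restarting and checking on each node would look roughly like the commands already used earlier in the thread:

Code:
systemctl restart corosync.service
systemctl restart pve-cluster.service
pvecm status
cat /etc/pve/.members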
 
I finally got the opportunity to reboot the nodes - after the reboot I partially lost the connection between the nodes:
Screenshot 2017-06-30 16.34.05.png
Still the external IP in the .members file...
# cat /etc/pve/.members
{
"nodename": "pm2-bne2",
"version": 3,
"cluster": { "name": "ING-FOREST", "version": 25, "nodes": 2, "quorate": 0 },
"nodelist": {
"pm2-bne2": { "id": 1, "online": 1, "ip": "xxx.xxx.153.100"},
"backup-bne2": { "id": 2, "online": 0}
}
}
 
I finally got the opportunity to reboot the nodes - after the reboot I partially lost the connection between the nodes:
View attachment 5424
Still the external IP in the .members file...
# cat /etc/pve/.members

In our case we needed to edit /etc/hosts with each host's own internal IP address, and then we ran

systemctl restart corosync.service
systemctl restart pve-cluster.service
systemctl restart pvedaemon.service
systemctl restart pveproxy.service


without restarting the nodes.
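For reference, a sketch of what that /etc/hosts change could look like on pm2-bne2 (the exact layout is an assumption, addresses taken from earlier in the thread), so that the plain node name resolves to the cluster-network address instead of the external one:

Code:
127.0.0.1        localhost.localdomain localhost
# plain node name now points to the internal cluster address
192.168.153.100  pm2-bne2.cluster pm2-bne2 pvelocalhost
192.168.153.6    backup-bne2.cluster backup-bne2
# external name kept for the public interface
210.xxx.xxx.100  pm2-bne2.xxxxxx.au

and the equivalent on backup-bne2 with its own internal address on the pvelocalhost line.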
 
In our case we needed to edit /etc/hosts with each host's own internal IP address, and then we ran

systemctl restart corosync.service
systemctl restart pve-cluster.service
systemctl restart pvedaemon.service
systemctl restart pveproxy.service


without restarting the nodes.
Yeah, that's what I tried first of all.
 
