Hi all
I have a three-node cluster. In the web GUI of node1 and node2, node3 is shown as offline, while the web GUI of node3 shows node1 and node2 as offline.
daemon.log of node3:
Code:
Jan 8 11:49:00 drax systemd[1]: Starting Proxmox VE replication runner...
Jan 8 11:49:02 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:03 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:04 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:05 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:06 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:07 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:08 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:09 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:10 drax pvesr[12250]: trying to acquire cfs lock 'file-replication_cfg' ...
Jan 8 11:49:11 drax pvesr[12250]: error with cfs lock 'file-replication_cfg': no quorum!
Jan 8 11:49:11 drax systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jan 8 11:49:11 drax systemd[1]: Failed to start Proxmox VE replication runner.
Jan 8 11:49:11 drax systemd[1]: pvesr.service: Unit entered failed state.
Jan 8 11:49:11 drax systemd[1]: pvesr.service: Failed with result 'exit-code'.
This appears every minute; node1 and node2 have no such errors in their logs. I have seen that this could be a multicast (IGMP) problem, but I don't think that is the case here.
node3 - omping:
Code:
omping -c 10000 -i 0.001 -F -q 10.200.1.20 10.200.1.21 10.200.1.22
10.200.1.20 : waiting for response msg
10.200.1.21 : waiting for response msg
10.200.1.20 : joined (S,G) = (*, 232.43.211.234), pinging
10.200.1.21 : joined (S,G) = (*, 232.43.211.234), pinging
10.200.1.20 : given amount of query messages was sent
10.200.1.21 : waiting for response msg
10.200.1.21 : server told us to stop
10.200.1.20 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.087/0.149/1.495/0.045
10.200.1.20 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.087/0.248/6.404/0.750
10.200.1.21 : unicast, xmt/rcv/%loss = 9029/9029/0%, min/avg/max/std-dev = 0.076/0.112/1.469/0.033
10.200.1.21 : multicast, xmt/rcv/%loss = 9029/9029/0%, min/avg/max/std-dev = 0.087/0.121/2.116/0.036
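For reference, the %loss columns can be pulled out of a saved run with a short sed one-liner (a minimal sketch; `omping.log` is a hypothetical capture of the output above, and the helper name is my own):

```shell
# Print "<type> <loss%>" for each summary line of a saved omping run.
# omping.log is a hypothetical file containing output like the block above.
omping_loss() {
  sed -nE 's#.* (unicast|multicast), xmt/rcv/%loss = [0-9]+/[0-9]+/([0-9]+)%.*#\1 \2#p' "$1"
}

# e.g. omping_loss omping.log
```

All four summary lines report 0% loss, both unicast and multicast, which is why I ruled out an IGMP problem.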
node3 - pvecm status:
Code:
pvecm status
Quorum information
------------------
Date: Tue Jan 8 11:56:56 2019
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1/1628
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.200.1.20
0x00000002 1 10.200.1.21
0x00000003 1 10.200.1.22 (local)
corosync:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: antman
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.200.1.21
  }
  node {
    name: drax
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.200.1.22
  }
  node {
    name: rocket
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.200.1.20
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox-cluster
  config_version: 3
  interface {
    bindnetaddr: 10.200.1.20
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
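The config_version from the totem section above can be checked on each node without eyeballing the whole file (a minimal sketch; the helper name is my own, and /etc/pve/corosync.conf is the standard Proxmox location of the cluster-wide copy):

```shell
# Print the totem config_version from a corosync.conf file.
# On Proxmox the cluster-wide copy lives at /etc/pve/corosync.conf.
config_version() {
  awk -F': *' '$1 ~ /config_version/ { print $2 }' "$1"
}

# e.g. config_version /etc/pve/corosync.conf
```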
On all three nodes the config_version is 3. I am not using HA.
pveversion:
node1 - pve-manager/5.2-10/6f892b40 (running kernel: 4.15.18-7-pve)
node2 - pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve)
node3 - pve-manager/5.3-5/97ae681d (running kernel: 4.15.18-9-pve)
I really have no clue what the problem could be. Does anybody know this problem? I would be glad if someone could help or guide me in the right direction.
Best regards
Joel