Hi.
We have a 4-node cluster with Ceph and the subscription repositories, and since the last updates one of the nodes is always in "fence" mode in the cluster. As a consequence, I cannot migrate VMs with HA enabled to or from this node.
I have searched everywhere I can think of and cannot find a reason or a solution to this problem. I have tried service restarts, system restarts, etc., and nothing fixes the issue. I also can't find anything relevant on Google.
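For reference, the restarts I have been trying on noded are along these lines (standard service names on a PVE 5.2 install):
Code:
# HA stack on the affected node
systemctl restart pve-ha-lrm pve-ha-crm
# cluster filesystem and corosync
systemctl restart pve-cluster corosync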
Output from commands on the affected node (nodeD):
ha-manager status --verbose
Code:
quorum OK
master nodec (active, Sat Oct 6 15:21:22 2018)
lrm nodea (idle, Sat Oct 6 15:21:25 2018)
lrm nodeb (active, Sat Oct 6 15:21:19 2018)
lrm nodec (active, Sat Oct 6 15:21:25 2018)
lrm noded (idle, Sat Oct 6 15:21:25 2018)
full cluster state:
{
"lrm_status" : {
"nodea" : {
"mode" : "active",
"results" : {},
"state" : "wait_for_agent_lock",
"timestamp" : 1538835685
},
"nodeb" : {
"mode" : "active",
"results" : {
"WbYc19DhuDGOXgdSDoo7yA" : {
"exit_code" : 7,
"sid" : "vm:147",
"state" : "started"
}
},
"state" : "active",
"timestamp" : 1538835679
},
"nodec" : {
"mode" : "active",
"results" : {
"gacBZedFAdXV2F0OstGfhA" : {
"exit_code" : 0,
"sid" : "vm:147",
"state" : "migrate"
}
},
"state" : "active",
"timestamp" : 1538835685
},
"noded" : {
"mode" : "active",
"results" : {},
"state" : "wait_for_agent_lock",
"timestamp" : 1538835685
}
},
"manager_status" : {
"master_node" : "nodec",
"node_status" : {
"nodea" : "online",
"nodeb" : "online",
"nodec" : "online",
"noded" : "fence"
},
"service_status" : {},
"timestamp" : 1538835682
},
"quorum" : {
"node" : "noded",
"quorate" : "1"
}
}
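In case it helps, this is how I am inspecting the HA configuration and the manager state on the cluster filesystem (default PVE paths):
Code:
# HA resource definitions
cat /etc/pve/ha/resources.cfg
# HA group definitions, if any
cat /etc/pve/ha/groups.cfg
# raw manager view of node and service states
cat /etc/pve/ha/manager_status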
vi /etc/corosync/corosync.conf
Code:
logging {
debug: off
to_syslog: yes
}
nodelist {
node {
name: nodea
nodeid: 1
quorum_votes: 1
ring0_addr: nodea
}
node {
name: nodeb
nodeid: 2
quorum_votes: 1
ring0_addr: nodeb
}
node {
name: nodec
nodeid: 3
quorum_votes: 1
ring0_addr: nodec
}
node {
name: noded
nodeid: 4
quorum_votes: 1
ring0_addr: noded
}
}
quorum {
provider: corosync_votequorum
}
totem {
cluster_name: zwame
config_version: 4
interface {
bindnetaddr: 10.133.10.3
ringnumber: 0
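Quorum itself looks fine from noded; these are the membership checks I am running (standard PVE / corosync tools):
Code:
# Proxmox view of the cluster
pvecm status
# corosync quorum and ring state
corosync-quorumtool -s
corosync-cfgtool -s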
journalctl -u pve-ha-crm
Code:
-- Logs begin at Tue 2018-10-02 21:36:03 WEST, end at Sat 2018-10-06 15:31:01 WEST. --
Oct 02 21:36:12 noded systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Oct 02 21:36:13 noded pve-ha-crm[5197]: starting server
Oct 02 21:36:13 noded pve-ha-crm[5197]: status change startup => wait_for_quorum
Oct 02 21:36:13 noded systemd[1]: Started PVE Cluster Ressource Manager Daemon.
Oct 02 21:44:38 noded pve-ha-crm[5197]: status change wait_for_quorum => slave
Oct 02 22:07:45 noded pve-ha-crm[5197]: status change slave => wait_for_quorum
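To see the CRM and LRM together, I am also pulling both HA units for the current boot, roughly like this:
Code:
journalctl -b -u pve-ha-crm -u pve-ha-lrm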
journalctl -u corosync
Code:
Oct 02 22:26:06 noded corosync[94145]: [SERV ] Service engine loaded: corosync watchdog service [7]
Oct 02 22:26:06 noded corosync[94145]: [QUORUM] Using quorum provider corosync_votequorum
Oct 02 22:26:06 noded corosync[94145]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Oct 02 22:26:06 noded corosync[94145]: [QB ] server name: votequorum
Oct 02 22:26:06 noded corosync[94145]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Oct 02 22:26:06 noded corosync[94145]: [QB ] server name: quorum
Oct 02 22:26:06 noded corosync[94145]: [TOTEM ] A new membership (10.133.10.6:4244) was formed. Members joined: 4
Oct 02 22:26:06 noded corosync[94145]: [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: [QUORUM] Members[1]: 4
Oct 02 22:26:06 noded corosync[94145]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 02 22:26:06 noded corosync[94145]: notice [TOTEM ] A new membership (10.133.10.3:4248) was formed. Members joined: 1 2 3
Oct 02 22:26:06 noded corosync[94145]: [TOTEM ] A new membership (10.133.10.3:4248) was formed. Members joined: 1 2 3
Oct 02 22:26:06 noded corosync[94145]: warning [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: warning [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: warning [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: [CPG ] downlist left_list: 0 received
Oct 02 22:26:06 noded corosync[94145]: [QUORUM] This node is within the primary component and will provide service.
Oct 02 22:26:06 noded corosync[94145]: notice [QUORUM] This node is within the primary component and will provide service.
Oct 02 22:26:06 noded corosync[94145]: notice [QUORUM] Members[4]: 1 2 3 4
Oct 02 22:26:06 noded corosync[94145]: notice [MAIN ] Completed service synchronization, ready to provide service.
Oct 02 22:26:06 noded corosync[94145]: [QUORUM] Members[4]: 1 2 3 4
Oct 02 22:26:06 noded corosync[94145]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 04 09:30:06 noded corosync[94145]: notice [TOTEM ] Retransmit List: e0f99 e0f9a e0f9b e0f9c
Oct 04 09:30:06 noded corosync[94145]: [TOTEM ] Retransmit List: e0f99 e0f9a e0f9b e0f9c
Oct 04 09:30:06 noded corosync[94145]: [TOTEM ] Retransmit List: e0f99 e0f9a e0f9b e0f9c
Oct 04 09:30:06 noded corosync[94145]: notice [TOTEM ] Retransmit List: e0f99 e0f9a e0f9b e0f9c
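The retransmit lists made me wonder about the cluster network, so I am also testing multicast between the nodes along the lines of what the Proxmox docs suggest (run on all four nodes at the same time):
Code:
omping -c 600 -i 1 -q nodea nodeb nodec noded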
Package Versions (equal on the 4 nodes):
Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-5-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-8
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-29
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-27
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-35
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1
If someone can help, I would appreciate it. Thanks.