Corosync stuck in 'activating (start)' and never reaches the running state

Raja Saha

Nov 29, 2018
Hi,

I'm evaluating virtualization platforms for a future project, and Proxmox seems interesting.

I installed it on two nodes and it worked fine for a few days. However, after a switch failover one day, one of my nodes started giving issues. All processes are fine except corosync: it shows as 'Activating' on node 1 and fails after a while. The other (master) node works fine. When corosync fails, I'm not able to migrate VMs, etc. Can some kind soul please advise? My version info is as follows:

---- PVE version from Node #1 ----
proxmox-ve: 5.2-2 (running kernel: 4.15.18-8-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-11
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.4.134-1-pve: 4.4.134-112
pve-kernel-4.4.19-1-pve: 4.4.19-66
ceph: 12.2.9-1~bpo90+1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.12.15-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-9
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve2~bpo1

---- corosync.conf from Node #1 ----
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ils-phy-pri
    nodeid: 1
    quorum_votes: 1
    ring0_addr: ils-phy-pri
  }
  node {
    name: ils-phy-sec
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.24.0.2
  }
}

quorum {
  expected_votes: 2
  last_man_standing: 1
  last_man_standing_window: 1000
  provider: corosync_votequorum
  two_node: 1
}

totem {
  cluster_name: ILS-PHY-CLUSTER
  config_version: 10
  interface {
    bindnetaddr: 172.24.0.2
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

---- corosync.conf from Master Node ----
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ils-phy-pri
    nodeid: 1
    quorum_votes: 1
    ring0_addr: ils-phy-pri
  }
  node {
    name: ils-phy-sec
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.24.0.2
  }
}

quorum {
  expected_votes: 2
  last_man_standing: 1
  last_man_standing_window: 1000
  provider: corosync_votequorum
  two_node: 1
}

totem {
  cluster_name: ILS-PHY-CLUSTER
  config_version: 10
  interface {
    bindnetaddr: 172.24.0.2
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
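
Since both files are pasted in full, a quick way to confirm that two copies really match, without reading them line by line, might look like the sketch below. The helper names are made up for illustration. On Proxmox VE the local copy normally lives at /etc/corosync/corosync.conf and the cluster-wide copy at /etc/pve/corosync.conf.

```shell
# Hypothetical helper: print the config_version value from a corosync.conf-style file.
conf_version() {
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if the two files are byte-identical.
same_conf() {
    cmp -s "$1" "$2"
}

# Usage on a node:
#   conf_version /etc/corosync/corosync.conf
#   same_conf /etc/corosync/corosync.conf /etc/pve/corosync.conf && echo identical
```

If the copies differ, comparing the two config_version values shows which one is newer.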

 
What does systemctl status corosync say?
Anything in the journal?
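
For anyone following along, here is a minimal sketch of how that information could be collected in one go. The helper name corosync_diag is made up for illustration. systemctl status and journalctl -u are standard systemd commands, and -b limits the journal to the current boot.

```shell
# Hypothetical helper: print the unit status and this boot's journal for one unit.
corosync_diag() {
    unit="${1:-corosync}"
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; is this a systemd host?" >&2
        return 1
    fi
    # status exits non-zero for units that are not active, so ignore its exit code
    systemctl status "$unit" --no-pager || true
    # -u filters by unit, -b restricts output to the current boot
    journalctl -u "$unit" -b --no-pager
}

# Usage on the affected node:
#   corosync_diag corosync > /tmp/corosync-diag.txt 2>&1
```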
 
@Dominik: It has said 'Activating' for days. When I reboot the node, the following services do not start: pvedaemon, pveproxy and corosync, even though they are enabled. pvedaemon is loaded but dead, and so is pveproxy. I started pvedaemon manually, but corosync stays at 'Activating'.

Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: activating (start) since Thu 2018-11-29 23:23:34 +08; 5min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Cntrl PID: 66703 (corosync)
Tasks: 2 (limit: 11059)
Memory: 38.6M
CPU: 3.865s
CGroup: /system.slice/corosync.service
└─66703 /usr/sbin/corosync -f

Nov 29 23:23:34 ils-phy-pri corosync[66703]: [CPG ] downlist left_list: 0 received
Nov 29 23:23:34 ils-phy-pri corosync[66703]: [CPG ] downlist left_list: 0 received
Nov 29 23:23:34 ils-phy-pri corosync[66703]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Nov 29 23:23:34 ils-phy-pri corosync[66703]: notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Nov 29 23:23:34 ils-phy-pri corosync[66703]: notice [QUORUM] This node is within the primary component and will provide service.
Nov 29 23:23:34 ils-phy-pri corosync[66703]: notice [QUORUM] Members[2]: 1 2
Nov 29 23:23:34 ils-phy-pri corosync[66703]: notice [MAIN ] Completed service synchronization, ready to provide service.
Nov 29 23:23:34 ils-phy-pri corosync[66703]: [QUORUM] This node is within the primary component and will provide service.
Nov 29 23:23:34 ils-phy-pri corosync[66703]: [QUORUM] Members[2]: 1 2
Nov 29 23:23:34 ils-phy-pri corosync[66703]: [MAIN ] Completed service synchronization, ready to provide service.
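
Worth noting in the output above: systemd reports the unit as activating (start), while corosync itself logs "Completed service synchronization, ready to provide service". A minimal sketch for spotting that mismatch from saved output follows. The function names are hypothetical, and the parsing assumes the usual systemctl status layout with an "Active:" line.

```shell
# Hypothetical helper: print the state word from the "Active:" line of
# `systemctl status` output read on stdin.
active_state() {
    awk '$1 == "Active:" { print $2; exit }'
}

# Hypothetical helper: succeed if the log text on stdin contains corosync's
# "ready to provide service" notice.
corosync_ready() {
    grep -q 'ready to provide service'
}

# Usage on the affected node:
#   systemctl status corosync --no-pager | active_state
#   journalctl -u corosync -b --no-pager | corosync_ready && echo "daemon reports ready"
```

A daemon that reports ready while the unit stays in activating suggests looking at the unit definition itself and not only at the cluster network.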
 
Oh, my bad... Issue resolved: I discovered that the corosync systemd unit file, /lib/systemd/system/corosync.service, was corrupted.
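
For readers who hit the same symptom, one way to check a unit file against what the package shipped is sketched below. It assumes a Debian-based host such as Proxmox VE, uses hypothetical helper names, and looks up the owning package with dpkg -S instead of assuming it. dpkg --verify compares installed files with the checksums recorded in the dpkg database.

```shell
UNIT=/lib/systemd/system/corosync.service

# Hypothetical helper: print the package that ships a file.
# dpkg -S prints lines of the form "pkg: /path".
owning_package() {
    dpkg -S "$1" | head -n 1 | cut -d: -f1
}

# Hypothetical helper: report whether any file of a package differs from its
# recorded checksum. dpkg --verify prints one line per file that fails a check
# and nothing when everything matches.
verify_package() {
    changed="$(dpkg --verify "$1")"
    if [ -n "$changed" ]; then
        printf 'modified files in %s:\n%s\n' "$1" "$changed"
        return 1
    fi
    printf '%s: all files match the package checksums\n' "$1"
}

# Usage on the affected node (as root):
#   pkg="$(owning_package "$UNIT")"
#   verify_package "$pkg" || apt-get install --reinstall "$pkg"
#   systemctl daemon-reload && systemctl restart corosync
```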