Hi,
I have 3 nodes:
Code:
                                                      | shv1               | shv2           | shv3           |
hostname                                              | ok                 | ok             | ok             |
dns                                                   | ok                 | ok             | ok             |
ssh                                                   | ok                 | ok             | ok             |
pvecm status                                          | ok                 | ok             | err            |
systemctl status corosync                             | ok                 | ok             | err            |
systemctl status pve-cluster                          | ok (rw)            | ok (rw)        | err (ro)       |
apt-get update && apt-get dist-upgrade                | ok (not restarted) | ok (restarted) | ok (restarted) |
MULTICAST ADDRESS: netstat -g (vmbr0 1 239.192.2.227) | ok                 | ok             | err            |
MULTICAST PING (239.192.2.227), first test            | ok                 | ok             | err            |
MULTICAST PING (239.192.2.227), second test           | ok                 | ok             | ok             |
date / hwclock                                        | ok                 | ok             | ok             |
ntp installed                                         | ok                 | ok             | ok             |
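For the multicast ping rows I tested connectivity between the nodes with omping (a sketch, assuming the omping package is installed on all three nodes; the node names are the ones from the table above). omping has to be started on every node at roughly the same time, and each instance then reports what it receives from the others. The invocation is echoed here as a dry run so the exact command is visible:

```shell
# Multicast connectivity test between the three cluster nodes.
# Run the same command on shv1, shv2 and shv3 at roughly the same time;
# drop the "echo" to actually run it (requires the omping package).
NODES="shv1 shv2 shv3"
echo omping -c 600 -i 1 -q $NODES
```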
History of my problem:
After one reboot of shv2 there was a problem with quorum (corosync problems), so I updated all nodes. After the reboot shv2 was OK, but after restarting shv3 the problem returned on that node (after the upgrade). What could be the cause? Some context: the logs showed a problem reaching the Debian NTP servers, but I have set my local NTP server in /etc/ntp.conf and commented out ("hashed") all the default Debian servers... ?!?
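A quick way to double-check that the hashed-out servers really are inactive is to list only the uncommented server/pool lines. This is a sketch against a hypothetical ntp.conf written to a temp file (10.0.0.1 stands in for my local NTP server); on a real node you would grep /etc/ntp.conf directly:

```shell
# Hypothetical /etc/ntp.conf: default Debian pool servers commented out
# ("hashed"), one local server active. Written to a temp file here so the
# check is reproducible.
cat > /tmp/ntp.conf.example <<'EOF'
#server 0.debian.pool.ntp.org iburst
#server 1.debian.pool.ntp.org iburst
#server 2.debian.pool.ntp.org iburst
#server 3.debian.pool.ntp.org iburst
server 10.0.0.1 iburst
EOF
# Show only the lines ntpd will actually use (commented lines are skipped):
grep -E '^[[:space:]]*(server|pool)[[:space:]]' /tmp/ntp.conf.example
```

If any uncommented debian.pool.ntp.org line still shows up here, ntpd is still trying the default servers despite the local one.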
VERSIONS
-----------
shv1 was not restarted after the update (live migration is broken and some VMs are still running on it), so shv1 is still on the older kernel.
Code:
Linux shv1 4.2.2-1-pve #1 SMP Mon Oct 5 18:23:31 CEST 2015 x86_64 GNU/Linux
shv1:~# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
openvswitch-switch: 2.3.2-1
Code:
Linux shv2 4.2.3-2-pve #1 SMP Sun Nov 15 16:08:19 CET 2015 x86_64 GNU/Linux
shv2:~# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
openvswitch-switch: 2.3.2-1
Code:
Linux shv3 4.2.3-2-pve #1 SMP Sun Nov 15 16:08:19 CET 2015 x86_64 GNU/Linux
shv3:~# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
openvswitch-switch: 2.3.2-1
SHV3 NODE PROBLEM LOGS
-------------------------------
Code:
systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Fri 2015-12-04 15:42:25 CET; 12min ago
Process: 2152 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0
Process: 2136 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
Main PID: 2150 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─2150 /usr/bin/pmxcfs
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [quorum] crit: quorum_initialize failed: 2
Dec 04 15:54:36 shv3 pmxcfs[2150]: [confdb] crit: cmap_initialize failed: 2
Dec 04 15:54:36 shv3 pmxcfs[2150]: [dcdb] crit: cpg_initialize failed: 2
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_initialize failed: 2
Code:
systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: failed (Result: exit-code) since Fri 2015-12-04 15:43:25 CET; 11min ago
Process: 2658 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAIL
Dec 04 15:42:25 shv3 corosync[2721]: [QB ] server name: cpg
Dec 04 15:42:25 shv3 corosync[2721]: [SERV ] Service engine loaded: corosync profile lo
Dec 04 15:42:25 shv3 corosync[2721]: [QUORUM] Using quorum provider corosync_votequorum
Dec 04 15:42:25 shv3 corosync[2721]: [QUORUM] Quorum provider: corosync_votequorum faile
Dec 04 15:42:25 shv3 corosync[2721]: [SERV ] Service engine 'corosync_quorum' failed to
Dec 04 15:42:25 shv3 corosync[2721]: [MAIN ] Corosync Cluster Engine exiting with statu
Dec 04 15:43:25 shv3 corosync[2658]: Starting Corosync Cluster Engine (corosync): [FAILE
Dec 04 15:43:25 shv3 systemd[1]: corosync.service: control process exited, code=exited s
Dec 04 15:43:25 shv3 systemd[1]: Failed to start Corosync Cluster Engine.
Dec 04 15:43:25 shv3 systemd[1]: Unit corosync.service entered failed state.