Hi,
I have 3 nodes:
[TABLE="width: 800"]
[TR]
[TD][/TD]
[TD]shv1[/TD]
[TD]shv2[/TD]
[TD]shv3[/TD]
[/TR]
[TR]
[TD]hostname[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[/TR]
[TR]
[TD]dns[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[/TR]
[TR]
[TD]ssh[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[/TR]
[TR]
[TD]pvecm status[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]err[/TD]
[/TR]
[TR]
[TD]systemctl status corosync[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]err[/TD]
[/TR]
[TR]
[TD]systemctl status pve-cluster[/TD]
[TD]ok (rw)[/TD]
[TD]ok (rw)[/TD]
[TD]err (ro)[/TD]
[/TR]
[TR]
[TD]apt-get update && apt-get dist-upgrade[/TD]
[TD]ok (not restarted)[/TD]
[TD]ok (restarted)[/TD]
[TD]ok (restarted)[/TD]
[/TR]
[TR]
[TD]MULTICAST ADDRESS: netstat -g (vmbr0 1 239.192.2.227)[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]err[/TD]
[/TR]
[TR]
[TD]MULTICAST PING (239.192.2.227)
Code:
ping 239.192.2.227
PING 239.192.2.227 (239.192.2.227) 56(84) bytes of data.
64 bytes from 10.64.2.1: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 10.64.2.2: icmp_seq=1 ttl=64 time=0.218 ms (DUP!)
[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]err[/TD]
[/TR]
[TR]
[TD]MULTICAST PING (224.0.0.1)
Code:
ping 224.0.0.1
PING 224.0.0.1 (224.0.0.1) 56(84) bytes of data.
64 bytes from 10.64.2.1: icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from 10.64.2.3: icmp_seq=1 ttl=64 time=0.131 ms (DUP!)
64 bytes from 10.64.2.2: icmp_seq=1 ttl=64 time=0.241 ms (DUP!)
[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[/TR]
[TR]
[TD]date / hwclock[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[/TR]
[TR]
[TD]ntp installed[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[TD]ok[/TD]
[/TR]
[/TABLE]
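For reference, the multicast rows above were checked roughly like this on each node (239.192.2.227 is the corosync multicast group that netstat -g reports on vmbr0; with working multicast the pings get (DUP!) replies from the other nodes):

```shell
# Show per-interface multicast group memberships; the corosync group
# (here 239.192.2.227 on vmbr0) should be listed on every node.
netstat -g

# Ping the corosync multicast group; with healthy multicast you see
# (DUP!) replies from the other cluster nodes' IPs.
ping -c 3 239.192.2.227

# Baseline test against the all-hosts multicast group.
ping -c 3 224.0.0.1
```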
History of my problem:
After one reboot of shv2 there was a problem with quorum (corosync problems), so I updated all nodes. After the reboot shv2 was OK, but after restarting node shv3 (post-upgrade) the problem returned on shv3. What could the problem be? Some additional info: in the logs there was a problem reaching the Debian NTP servers, but I have set my local NTP server in /etc/ntp.conf and commented out ("hashed") all the default Debian servers... ?!?
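To show what I mean by "hashed": the relevant part of my /etc/ntp.conf looks roughly like this (the local server name below is just a placeholder):

```
# default Debian pool servers commented out:
#server 0.debian.pool.ntp.org iburst
#server 1.debian.pool.ntp.org iburst
#server 2.debian.pool.ntp.org iburst
#server 3.debian.pool.ntp.org iburst

# local NTP server instead (placeholder hostname):
server ntp.example.local iburst
```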
VERSIONS
-----------
shv1 was not restarted after the update (live migration is broken and some VMs are still online), so shv1 is running the older kernel.
Code:
Linux shv1 4.2.2-1-pve #1 SMP Mon Oct 5 18:23:31 CEST 2015 x86_64 GNU/Linux
shv1:~# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
openvswitch-switch: 2.3.2-1
Code:
Linux shv2 4.2.3-2-pve #1 SMP Sun Nov 15 16:08:19 CET 2015 x86_64 GNU/Linux
shv2:~# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
openvswitch-switch: 2.3.2-1
Code:
Linux shv3 4.2.3-2-pve #1 SMP Sun Nov 15 16:08:19 CET 2015 x86_64 GNU/Linux
shv3:~# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
openvswitch-switch: 2.3.2-1
SHV3 NODE PROBLEM LOGS
-------------------------------
Code:
systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Fri 2015-12-04 15:42:25 CET; 12min ago
Process: 2152 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0
Process: 2136 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
Main PID: 2150 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─2150 /usr/bin/pmxcfs
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_send_message failed: 9
Dec 04 15:54:36 shv3 pmxcfs[2150]: [quorum] crit: quorum_initialize failed: 2
Dec 04 15:54:36 shv3 pmxcfs[2150]: [confdb] crit: cmap_initialize failed: 2
Dec 04 15:54:36 shv3 pmxcfs[2150]: [dcdb] crit: cpg_initialize failed: 2
Dec 04 15:54:36 shv3 pmxcfs[2150]: [status] crit: cpg_initialize failed: 2
Code:
systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: failed (Result: exit-code) since Fri 2015-12-04 15:43:25 CET; 11min ago
Process: 2658 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAIL
Dec 04 15:42:25 shv3 corosync[2721]: [QB ] server name: cpg
Dec 04 15:42:25 shv3 corosync[2721]: [SERV ] Service engine loaded: corosync profile lo
Dec 04 15:42:25 shv3 corosync[2721]: [QUORUM] Using quorum provider corosync_votequorum
Dec 04 15:42:25 shv3 corosync[2721]: [QUORUM] Quorum provider: corosync_votequorum faile
Dec 04 15:42:25 shv3 corosync[2721]: [SERV ] Service engine 'corosync_quorum' failed to
Dec 04 15:42:25 shv3 corosync[2721]: [MAIN ] Corosync Cluster Engine exiting with statu
Dec 04 15:43:25 shv3 corosync[2658]: Starting Corosync Cluster Engine (corosync): [FAILE
Dec 04 15:43:25 shv3 systemd[1]: corosync.service: control process exited, code=exited s
Dec 04 15:43:25 shv3 systemd[1]: Failed to start Corosync Cluster Engine.
Dec 04 15:43:25 shv3 systemd[1]: Unit corosync.service entered failed state.