[SOLVED] Proxmox nodes go red

itvietnam

Hi,

My cluster was running fine, but after I added external Ceph storage the nodes went red. How can I debug this?

upload_2018-2-1_16-55-57.png

My Proxmox version:

Code:
root@cp101:~# pveversion -v
proxmox-ve: 5.0-19 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-19
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-3
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
openvswitch-switch: 2.6.2~pre+git20161223-3
root@cp101:~#

I ran an update, but no luck:

Code:
root@cp101:~# apt-get update && apt-get dist-upgrade
Get:1 http://security.debian.org stretch/updates InRelease [63.0 kB]
Ign:2 http://ftp.debian.org/debian stretch InRelease     
Hit:3 http://ftp.debian.org/debian stretch Release       
Get:5 http://security.debian.org stretch/updates/main amd64 Packages [269 kB]
Get:6 http://security.debian.org stretch/updates/contrib amd64 Packages [1,352 B]
Fetched 333 kB in 2s (134 kB/s)     
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@cp101:~#
 
After the upgrade a new symptom appeared: a question mark in front of every node.

I logged in and found that the two nodes could not ping each other. After I removed one slave from the bonded NIC and rebooted, they can ping each other, but the problem is still there:

upload_2018-2-2_1-17-14.png

Code:
root@cp102:~# pvecm s
Quorum information
------------------
Date:             Fri Feb  2 01:13:26 2018
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1/120
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
   Nodeid      Votes Name
0x00000001          1 10.10.30.1
0x00000002          1 10.10.30.2 (local)
root@cp102:~# pvecm nodes

Membership information
----------------------
   Nodeid      Votes Name
        1          1 10.10.30.1
        2          1 10.10.30.2 (local)
root@cp102:~#
 
I figured out that the problem was caused by MTU 9000: the switch ports were not set to MTU 9000. After setting the switch ports to MTU 9000, the problem is solved.
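
For anyone who hits the same thing, here is a rough sketch of how a jumbo-frame mismatch can be checked from a node. It assumes Linux iputils `ping` and IPv4; the helper names `icmp_payload` and `check_mtu` are made up for this example and are not Proxmox tools. The idea is to ping the other node with the Don't Fragment bit set and a payload that only fits if every hop really passes MTU 9000 (MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Ping a peer with the Don't Fragment bit set (-M do) so that oversized
# packets are rejected instead of being silently fragmented.
# Usage: check_mtu <peer-ip> <mtu>
check_mtu() {
    peer=$1
    mtu=$2
    ping -M do -c 3 -s "$(icmp_payload "$mtu")" "$peer"
}

# Example with the node address from the pvecm output above:
# check_mtu 10.10.30.1 9000   # fails if a switch port is below MTU 9000
# check_mtu 10.10.30.1 1500   # should still work on a standard path
```

If the 1500 test passes and the 9000 test fails, something between the nodes (in my case the switch ports) is probably not passing jumbo frames. The MTU configured on the node's own interfaces can be seen with `ip link show`.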