Hi,
I can't get my cluster to work after a power failure on both nodes (someone in my office thought these machines didn't need to keep running). It is not a production system; I use it only to evaluate Proxmox. So I would like to know how to fix it.
The error message I get:
Code:
Nov 27 18:58:31 pve2 systemd[1]: Starting Corosync Cluster Engine...
Nov 27 18:58:31 pve2 corosync[29910]: [MAIN ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Nov 27 18:58:31 pve2 corosync[29910]: [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Nov 27 18:58:31 pve2 corosync[29910]: notice [MAIN ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Nov 27 18:58:31 pve2 corosync[29910]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Nov 27 18:58:31 pve2 corosync[29910]: warning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Nov 27 18:58:31 pve2 corosync[29910]: warning [MAIN ] Please migrate config file to nodelist.
Nov 27 18:58:31 pve2 corosync[29910]: [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Nov 27 18:58:31 pve2 corosync[29910]: [MAIN ] Please migrate config file to nodelist.
Nov 27 18:58:31 pve2 corosync[29910]: notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Nov 27 18:58:31 pve2 corosync[29910]: notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Nov 27 18:58:31 pve2 corosync[29910]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Nov 27 18:58:31 pve2 corosync[29910]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Nov 27 18:58:31 pve2 corosync[29910]: notice [TOTEM ] The network interface is down.
Nov 27 18:58:31 pve2 corosync[29910]: [TOTEM ] The network interface is down.
Nov 27 18:58:31 pve2 corosync[29910]: notice [SERV ] Service engine loaded: corosync configuration map access [0]
Nov 27 18:58:31 pve2 corosync[29910]: info [QB ] server name: cmap
Nov 27 18:58:31 pve2 corosync[29910]: notice [SERV ] Service engine loaded: corosync configuration service [1]
Nov 27 18:58:31 pve2 corosync[29910]: info [QB ] server name: cfg
Nov 27 18:58:31 pve2 corosync[29910]: notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Nov 27 18:58:31 pve2 corosync[29910]: info [QB ] server name: cpg
Nov 27 18:58:31 pve2 corosync[29910]: notice [SERV ] Service engine loaded: corosync profile loading service [4]
Nov 27 18:58:31 pve2 corosync[29910]: notice [SERV ] Service engine loaded: corosync resource monitoring service [6]
Nov 27 18:58:31 pve2 corosync[29910]: warning [WD ] Watchdog not enabled by configuration
Nov 27 18:58:31 pve2 corosync[29910]: warning [WD ] resource load_15min missing a recovery key.
Nov 27 18:58:31 pve2 corosync[29910]: warning [WD ] resource memory_used missing a recovery key.
Nov 27 18:58:31 pve2 corosync[29910]: info [WD ] no resources configured.
Nov 27 18:58:31 pve2 corosync[29910]: notice [SERV ] Service engine loaded: corosync watchdog service [7]
Nov 27 18:58:31 pve2 corosync[29910]: notice [QUORUM] Using quorum provider corosync_votequorum
Nov 27 18:58:31 pve2 corosync[29910]: crit [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Nov 27 18:58:31 pve2 corosync[29910]: error [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Nov 27 18:58:31 pve2 corosync[29910]: error [MAIN ] Corosync Cluster Engine exiting with status 20 at service.c:356.
Nov 27 18:58:31 pve2 systemd[1]: corosync.service: Main process exited, code=exited, status=20/n/a
Nov 27 18:58:31 pve2 systemd[1]: Failed to start Corosync Cluster Engine.
Nov 27 18:58:31 pve2 systemd[1]: corosync.service: Unit entered failed state.
Nov 27 18:58:31 pve2 systemd[1]: corosync.service: Failed with result 'exit-code'.
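For completeness: the log above is what the corosync unit writes on pve2, and as far as I can tell it is the same on every start attempt (standard systemd commands):
Code:
systemctl status corosync      # unit is in failed state
journalctl -b -u corosync      # the messages quoted above
systemctl restart corosync     # retrying gives the same result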
According to the wiki, this error is caused by a misconfigured /etc/hosts, but I have checked that on both nodes and the entries are correct. The two nodes of the cluster are connected directly, without a switch: one link for the cluster traffic and a bonded interface (two 1 Gb NICs) for Ceph.
Ping is possible between the two nodes on bond0.
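For reference, these are the kinds of checks I mean, shown for pve1 (plain getent/ip/ping commands, nothing Proxmox-specific; the same on pve2 with the addresses swapped):
Code:
# name resolution from /etc/hosts
getent hosts pve1 pve2

# addresses on the two direct links (see the interfaces files below)
ip -4 addr show eno2      # 10.212.212.1 - cluster network used by corosync
ip -4 addr show bond0     # 10.222.222.1 - Ceph network

# connectivity to pve2
ping -c 3 10.222.222.2    # works over bond0
ping -c 3 10.212.212.2    # ring0 address of pve2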
My config is:
Code:
root@pve1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.212.212.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.212.212.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pvCluster
  config_version: 2
  interface {
    bindnetaddr: 10.212.212.1
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
Code:
root@pve2:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.212.212.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.212.212.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pvCluster
  config_version: 2
  interface {
    bindnetaddr: 10.212.212.1
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
Code:
root@pve1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.33.201
    netmask 255.255.255.0
    gateway 192.168.33.254
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0
    up sysctl -w net.ipv4.ip_forward=1
    up iptables -t nat -A POSTROUTING -o $IFACE -j MASQUERADE
    down iptables -t nat -D POSTROUTING -o $IFACE -j MASQUERADE

auto vmbr1
iface vmbr1 inet static
    address 192.168.233.254
    netmask 255.255.255.0
    bridge_ports none
    bridge_stp off
    bridge_fd 0

iface eno2 inet static
    address 10.212.212.1
    netmask 255.255.255.0

iface eno3 inet manual

iface eno4 inet manual

auto bond0
iface bond0 inet static
    address 10.222.222.1
    netmask 255.255.255.0
    bond-slaves eno3 eno4
    bond-mode balance-rr
    bond-miimon 100
Code:
root@pve2:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.33.202
    netmask 255.255.255.0
    gateway 192.168.33.254
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0
    up sysctl -w net.ipv4.ip_forward=1
    up iptables -t nat -A POSTROUTING -o $IFACE -j MASQUERADE
    down iptables -t nat -D POSTROUTING -o $IFACE -j MASQUERADE

auto vmbr1
iface vmbr1 inet static
    address 192.168.233.254
    netmask 255.255.255.0
    bridge_ports none
    bridge_stp off
    bridge_fd 0

iface eno2 inet static
    address 10.212.212.2
    netmask 255.255.255.0

iface eno3 inet manual

iface eno4 inet manual

iface enp6s0f0 inet manual

iface enp6s0f1 inet manual

auto bond0
iface bond0 inet static
    address 10.222.222.2
    netmask 255.255.255.0
    bond-slaves eno3 eno4
    bond-mode balance-rr
    bond-miimon 100