Quorum 2 actively blocked

em.tie · Jun 13, 2018

Hello everyone,

I run a small Proxmox cluster consisting of two nodes, vhost01 and vhost02. To avoid split-brain and quorum problems, I had previously added a Raspberry Pi as a QDevice. That worked very well until the Raspberry Pi, or rather its SD card, died. I then saw the variants in the wiki here that run corosync on the Pi, and I would like to switch to that setup. To do so, I removed the QDevice from corosync.conf. I'm afraid I did something wrong in the process, because since then the nodes report the following status (pvecm status):

Code:

root@vhost02:~# pvecm status
Quorum information
------------------
Date:             Wed Jun 13 15:08:02 2018
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2/84544
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.1.3 (local)
root@vhost02:~#

and the same status on the other node...
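
A note on the numbers in that output: under the default majority rule, votequorum needs more than half of the expected votes. With 2 expected votes the threshold is 2, so a node that only sees its own vote is not quorate. A minimal sketch of that arithmetic, assuming no special quorum options are set (the Flags line above is empty); the function names are made up for illustration:

```python
def votes_needed(expected_votes: int) -> int:
    """Strict majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently visible reach the majority threshold."""
    return total_votes >= votes_needed(expected_votes)


# Values taken from the pvecm status output above.
expected, total = 2, 1
print(votes_needed(expected))        # threshold shown in the "Quorum:" line
print(is_quorate(total, expected))   # matches "Quorate: No"
```

In other words, in a two-node cluster without a third vote, each node is blocked as soon as it cannot see the other one.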

corosync.conf looks like this (identical on both nodes):

Code:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: vhost01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: vhost01
  }

  node {
    name: vhost02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: vhost02
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: vhosts
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.1.2
    ringnumber: 0
  }

}

This is the corosync log of vhost01:

Code:

Jun 13 15:15:48 vhost01 corosync[8917]:  [TOTEM ] A new membership (192.168.1.2:245440) was formed. Members
Jun 13 15:15:48 vhost01 corosync[8917]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jun 13 15:15:48 vhost01 corosync[8917]:  [CPG   ] downlist left_list: 0 received
Jun 13 15:15:48 vhost01 corosync[8917]:  [QUORUM] Members[1]: 1
Jun 13 15:15:48 vhost01 corosync[8917]:  [MAIN  ] Completed service synchronization, ready to provide service.

This is the corosync log of vhost02:

Code:

Jun 12 23:08:08 vhost02 corosync[4066]:  [TOTEM ] A new membership (192.168.1.3:84544) was formed. Members left: 1
Jun 12 23:08:08 vhost02 corosync[4066]: warning [CPG   ] downlist left_list: 1 received
Jun 12 23:08:08 vhost02 corosync[4066]: notice  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jun 12 23:08:08 vhost02 corosync[4066]: notice  [QUORUM] Members[1]: 2
Jun 12 23:08:08 vhost02 corosync[4066]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Jun 12 23:08:08 vhost02 corosync[4066]:  [TOTEM ] Failed to receive the leave message. failed: 1
Jun 12 23:08:08 vhost02 corosync[4066]:  [CPG   ] downlist left_list: 1 received
Jun 12 23:08:08 vhost02 corosync[4066]:  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jun 12 23:08:08 vhost02 corosync[4066]:  [QUORUM] Members[1]: 2
Jun 12 23:08:08 vhost02 corosync[4066]:  [MAIN  ] Completed service synchronization, ready to provide service.

What can I do to get the cluster working again?

Many thanks

em.tie

Alwin · Jun 13, 2018

em.tie said:
config_version: 3

The version has to be incremented with every change. Was that also done after the removal?
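
As an illustration of what "incrementing the version" means for the file quoted above, here is a small sketch that raises config_version by one in the text of a corosync.conf. It only works on a string and writes nothing to disk; the helper name bump_config_version is hypothetical and not part of any Proxmox or corosync tool:

```python
import re


def bump_config_version(conf_text: str) -> str:
    """Return conf_text with the first config_version value increased by one."""
    pattern = re.compile(
        r"^([ \t]*config_version:[ \t]*)(\d+)[ \t]*$", re.MULTILINE
    )

    def repl(match: re.Match) -> str:
        # Keep the original indentation and key, replace only the number.
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, count = pattern.subn(repl, conf_text, count=1)
    if count != 1:
        raise ValueError("no config_version line found")
    return new_text


sample = "totem {\n  cluster_name: vhosts\n  config_version: 3\n  ip_version: ipv4\n}\n"
print(bump_config_version(sample))
```

Editing a copy and reviewing it before it replaces the live file is the safer order of operations.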

em.tie · Jun 13, 2018

Hello Alwin,

Not at first, but then yes... (in the meantime I had stupidly rebooted the machines, and only then incremented the version number). I made that change in /etc/corosync and in /etc/pve...

What can I do now?

cu emtie

Alwin · Jun 13, 2018

Is the version the same on all nodes? And were the services then restarted on all nodes?
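
One way to check the first question is to read config_version from each node's copy of the file and compare the values. A rough sketch, assuming the file contents have already been collected into a dict keyed by node name; both function names are hypothetical:

```python
import re
from typing import Dict, Optional


def read_config_version(conf_text: str) -> Optional[int]:
    """Return the config_version found in conf_text, or None if absent."""
    match = re.search(
        r"^[ \t]*config_version:[ \t]*(\d+)", conf_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def versions_match(conf_by_node: Dict[str, str]) -> bool:
    """True if every node's file has a config_version and all are equal."""
    versions = [read_config_version(text) for text in conf_by_node.values()]
    return None not in versions and len(set(versions)) == 1


# Illustrative contents only, not the real files from this thread.
configs = {
    "vhost01": "totem {\n  config_version: 4\n}\n",
    "vhost02": "totem {\n  config_version: 3\n}\n",
}
print(versions_match(configs))
```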

em.tie · Jun 13, 2018

Hello Alwin,

Yes, exactly the same, and the nodes were rebooted several times. At first pvecm status was OK for a short while, and then shortly afterwards it no longer was :-(

cu em.tie

em.tie · Jun 14, 2018

Is there perhaps a way to take a node out of the cluster and rejoin it without losing the VMs on that node? If so, how is that done?
