[SOLVED] Small problem after corosync upgrade

r4a5a88

Renowned Member
Jun 15, 2016
Hi Proxmox community

I'm currently preparing a cluster to be upgraded to Proxmox 6.
Today I performed the upgrade of Corosync from 2 to 3.
One of my servers is now no longer in sync.

# pvecm status
Quorum information
------------------
Date: Mon Jun 22 09:24:21 2020
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.76f
Quorate: No

Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 1
Quorum: 4 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 IP-Address (local)

Is there a way to bring the server back into the cluster without a reboot?
I have already tried pvecm expected 1 on the server and restarting corosync.

On one of the servers I get the message

[status] crit: cpg_send_message failed: 6
 
(screenshot attachment: 1592890154187.png)
On node pro-07 it looks like the other way around.

This is the current situation.
I'm running Proxmox Virtual Environment 5.4-15 with, as you can see, 7 nodes.
Has anyone run into something like this before and knows what to do?
I'm trying to repair it without a reboot.
 
Is HA active?
What do the corosync and pve-cluster logs say on all nodes ('systemctl status corosync pve-cluster' and 'journalctl --since 2020-06-21 -u pve-cluster -u corosync')?
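The two diagnostic commands above can be wrapped in a small helper that writes everything to one file per node (a sketch, assuming it is run as root on a Proxmox node; the `--since` date is taken from this thread and would need adjusting):

```shell
# Sketch: collect corosync/pve-cluster diagnostics into one per-node file.
# Assumes root on a Proxmox VE node; date taken from the thread.
collect_diag() {
  out="/tmp/$(hostname)-cluster-diag.log"
  {
    systemctl status corosync pve-cluster
    journalctl --since 2020-06-21 -u pve-cluster -u corosync
  } > "$out" 2>&1
  echo "$out"
}
```

Calling `collect_diag` on each node then leaves a single attachable log file in /tmp.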
 
HA is not active. I can't post the logs here, since the files are too large and the messages would otherwise get too long.

Code:
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 08:52:57 CEST; 1 day 4h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 17484 (corosync)
    Tasks: 9 (limit: 4915)
   Memory: 5.0G
      CPU: 3h 29min 15.965s
   CGroup: /system.slice/corosync.service
           └─17484 /usr/sbin/corosync -f

Jun 23 13:12:35 pro-04-dmed corosync[17484]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 23 13:12:40 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba2f) was formed. Members
Jun 23 13:12:46 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba33) was formed. Members
Jun 23 13:12:51 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba37) was formed. Members
Jun 23 13:12:56 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba3b) was formed. Members
Jun 23 13:13:01 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba3f) was formed. Members
Jun 23 13:13:06 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba43) was formed. Members
Jun 23 13:13:11 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba47) was formed. Members
Jun 23 13:13:17 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba4b) was formed. Members
Jun 23 13:13:22 pro-04-dmed corosync[17484]:   [TOTEM ] A new membership (2.2ba4f) was formed. Members

● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 08:52:56 CEST; 1 day 4h ago
 Main PID: 17451 (pmxcfs)
    Tasks: 9 (limit: 4915)
   Memory: 53.7M
      CPU: 1min 21.607s
   CGroup: /system.slice/pve-cluster.service
           └─17451 /usr/bin/pmxcfs

Jun 23 13:13:16 pro-04-dmed pmxcfs[17451]: [status] crit: cpg_send_message failed: 6
Jun 23 13:13:17 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 10
Jun 23 13:13:18 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 20
Jun 23 13:13:19 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 30
Jun 23 13:13:20 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 40
Jun 23 13:13:21 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 50
Jun 23 13:13:22 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 60
Jun 23 13:13:23 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 70
Jun 23 13:13:24 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 80
Jun 23 13:13:25 pro-04-dmed pmxcfs[17451]: [status] notice: cpg_send_message retry 90


pro-06-dmed:~# systemctl status corosync pve-cluster
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 14:43:39 CEST; 22h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 10603 (corosync)
    Tasks: 9 (limit: 6144)
   Memory: 2.7G
      CPU: 1h 15min 54.729s
   CGroup: /system.slice/corosync.service
           └─10603 /usr/sbin/corosync -f

Jun 23 13:14:08 pro-06-dmed corosync[10603]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:14:08 pro-06-dmed corosync[10603]:   [QUORUM] Members[4]: 1 3 5 7
Jun 23 13:14:08 pro-06-dmed corosync[10603]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 23 13:14:13 pro-06-dmed corosync[10603]:   [TOTEM ] A new membership (1.2ba77) was formed. Members
Jun 23 13:14:13 pro-06-dmed corosync[10603]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:14:13 pro-06-dmed corosync[10603]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:14:13 pro-06-dmed corosync[10603]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:14:13 pro-06-dmed corosync[10603]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:14:13 pro-06-dmed corosync[10603]:   [QUORUM] Members[4]: 1 3 5 7
Jun 23 13:14:13 pro-06-dmed corosync[10603]:   [MAIN  ] Completed service synchronization, ready to provide service.

● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 08:55:41 CEST; 1 day 4h ago
 Main PID: 21167 (pmxcfs)
    Tasks: 9 (limit: 6144)
   Memory: 72.6M
      CPU: 2min 50.782s
   CGroup: /system.slice/pve-cluster.service
           └─21167 /usr/bin/pmxcfs

Jun 23 13:11:02 pro-06-dmed pmxcfs[21167]: [dcdb] notice: all data is up to date
Jun 23 13:11:02 pro-06-dmed pmxcfs[21167]: [dcdb] notice: dfsm_deliver_queue: queue length 4
Jun 23 13:11:07 pro-06-dmed pmxcfs[21167]: [status] notice: received all states
Jun 23 13:11:07 pro-06-dmed pmxcfs[21167]: [status] notice: all data is up to date
Jun 23 13:11:07 pro-06-dmed pmxcfs[21167]: [status] notice: dfsm_deliver_queue: queue length 255
Jun 23 13:11:18 pro-06-dmed pmxcfs[21167]: [status] notice: cpg_send_message retried 6 times
Jun 23 13:12:38 pro-06-dmed pmxcfs[21167]: [status] notice: cpg_send_message retry 10
Jun 23 13:12:39 pro-06-dmed pmxcfs[21167]: [status] notice: cpg_send_message retry 20
Jun 23 13:12:40 pro-06-dmed pmxcfs[21167]: [status] notice: cpg_send_message retry 30
Jun 23 13:12:40 pro-06-dmed pmxcfs[21167]: [status] notice: cpg_send_message retried 39 times

pro-07-dmed:~#  systemctl status corosync pve-cluster
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 16:00:39 CEST; 21h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 31442 (corosync)
    Tasks: 9 (limit: 6144)
   Memory: 5.7G
      CPU: 46min 27.871s
   CGroup: /system.slice/corosync.service
           └─31442 /usr/sbin/corosync -f

Jun 23 13:14:18 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba7b) was formed. Members
Jun 23 13:14:24 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba7f) was formed. Members
Jun 23 13:14:29 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba83) was formed. Members
Jun 23 13:14:34 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba87) was formed. Members
Jun 23 13:14:39 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba8b) was formed. Members
Jun 23 13:14:44 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba8f) was formed. Members
Jun 23 13:14:49 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba93) was formed. Members
Jun 23 13:14:55 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba97) was formed. Members
Jun 23 13:15:00 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba9b) was formed. Members
Jun 23 13:15:05 pro-07-dmed corosync[31442]:   [TOTEM ] A new membership (2.2ba9f) was formed. Members

● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 14:51:32 CEST; 22h ago
  Process: 1056 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
  Process: 1015 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
 Main PID: 1034 (pmxcfs)
    Tasks: 10 (limit: 6144)
   Memory: 58.9M
      CPU: 2min 43.226s
   CGroup: /system.slice/pve-cluster.service
           └─1034 /usr/bin/pmxcfs

Jun 23 13:15:00 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 40
Jun 23 13:15:01 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 50
Jun 23 13:15:02 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 60
Jun 23 13:15:03 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 70
Jun 23 13:15:04 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 80
Jun 23 13:15:05 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 90
Jun 23 13:15:06 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 100
Jun 23 13:15:06 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retried 100 times
Jun 23 13:15:06 pro-07-dmed pmxcfs[1034]: [status] crit: cpg_send_message failed: 6
Jun 23 13:15:07 pro-07-dmed pmxcfs[1034]: [status] notice: cpg_send_message retry 10
 
Code:
pro-08-dmed:~# systemctl status corosync pve-cluster
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 15:08:36 CEST; 22h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 4596 (corosync)
    Tasks: 9 (limit: 7372)
   Memory: 2.4G
      CPU: 36min 26.281s
   CGroup: /system.slice/corosync.service
           └─4596 /usr/sbin/corosync -f

Jun 23 13:16:17 pro-08-dmed corosync[4596]:   [TOTEM ] Failed to receive the leave message. failed: 1
Jun 23 13:16:17 pro-08-dmed corosync[4596]:   [TOTEM ] Retransmit List: 1
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [TOTEM ] A new membership (1.2bb4b) was formed. Members joined: 1 3 5 left: 1 3 5
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [TOTEM ] Failed to receive the leave message. failed: 1 3 5
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [TOTEM ] A new membership (1.2bb4f) was formed. Members joined: 1 left: 1
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [TOTEM ] Failed to receive the leave message. failed: 1
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [CPG   ] downlist left_list: 1 received
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [CPG   ] downlist left_list: 1 received
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [TOTEM ] A new membership (1.2bb57) was formed. Members joined: 1 3 left: 1 3
Jun 23 13:16:18 pro-08-dmed corosync[4596]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
Jun 23 13:16:23 pro-08-dmed corosync[4596]:   [TOTEM ] A new membership (1.2bb5b) was formed. Members
Jun 23 13:16:23 pro-08-dmed corosync[4596]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:16:23 pro-08-dmed corosync[4596]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:16:23 pro-08-dmed corosync[4596]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:16:23 pro-08-dmed corosync[4596]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:16:23 pro-08-dmed corosync[4596]:   [QUORUM] Members[4]: 1 3 5 7
Jun 23 13:16:23 pro-08-dmed corosync[4596]:   [MAIN  ] Completed service synchronization, ready to provide service.

● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 08:56:27 CEST; 1 day 4h ago
 Main PID: 895 (pmxcfs)
    Tasks: 10 (limit: 7372)
   Memory: 95.3M
      CPU: 3min 29.748s
   CGroup: /system.slice/pve-cluster.service
           └─895 /usr/bin/pmxcfs

Jun 23 13:15:49 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 20
Jun 23 13:15:50 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 30
Jun 23 13:15:51 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 40
Jun 23 13:15:52 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 50
Jun 23 13:15:52 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retried 51 times
Jun 23 13:15:58 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 10
Jun 23 13:15:59 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 20
Jun 23 13:16:00 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 30
Jun 23 13:16:01 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retry 40
Jun 23 13:16:02 pro-08-dmed pmxcfs[895]: [status] notice: cpg_send_message retried 49 times

root@pro-01-dmed:~# systemctl status corosync pve-cluster
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-06-23 11:59:04 CEST; 1h 18min ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 28222 (corosync)
    Tasks: 9 (limit: 9830)
   Memory: 416.6M
      CPU: 3min 41.713s
   CGroup: /system.slice/corosync.service
           └─28222 /usr/sbin/corosync -f

Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [QUORUM] Members[3]: 2 4 6
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [TOTEM ] A new membership (2.2bc4f) was formed. Members
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [CPG   ] downlist left_list: 0 received
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [QUORUM] Members[3]: 2 4 6
Jun 23 13:17:15 pro-01-dmed corosync[28222]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 23 13:17:20 pro-01-dmed corosync[28222]:   [TOTEM ] A new membership (2.2bc53) was formed. Members

● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-06-22 08:51:35 CEST; 1 day 4h ago
 Main PID: 26221 (pmxcfs)
    Tasks: 8 (limit: 9830)
   Memory: 21.6M
      CPU: 1min 55.734s
   CGroup: /system.slice/pve-cluster.service
           └─26221 /usr/bin/pmxcfs

Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [status] notice: received sync request (epoch 2/17451/00000235)
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [status] crit: ignore sync request from wrong member 4/26221
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [status] notice: received sync request (epoch 4/26221/000001CB)
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [dcdb] notice: received all states
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [dcdb] notice: leader is 2/17451
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [dcdb] notice: synced members: 2/17451, 4/26221, 6/1034
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [dcdb] notice: all data is up to date
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [status] notice: received all states
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [status] notice: all data is up to date
Jun 23 13:17:15 pro-01-dmed pmxcfs[26221]: [status] notice: dfsm_deliver_queue: queue length 191
 
Can you also post your corosync.conf?
 
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pro-01-dmed
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pro-01-dmed
  }
  node {
    name: pro-03-dmed
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pro-03-dmed
  }
  node {
    name: pro-04-dmed
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pro-04-dmed
  }
  node {
    name: pro-05-dmed
    nodeid: 5
    quorum_votes: 1
    ring0_addr: pro-05-dmed
  }
  node {
    name: pro-06-dmed
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pro-06-dmed
  }
  node {
    name: pro-07-dmed
    nodeid: 6
    quorum_votes: 1
    ring0_addr: pro-07-dmed
  }
  node {
    name: pro-08-dmed
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 129.206.229.186
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: vm-cluster-02
  config_version: 49
  interface {
    bindnetaddr: 129.206.229.185
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
 
I suggest starting journalctl -f -u pve-cluster -u corosync > $(hostname).log on all nodes, then systemctl restart corosync and wait a few minutes, then post the generated log files.
 
What does corosync-cfgtool -sb say on all nodes?
 
root@pro-01-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 4

pro-03-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 1
LINK ID 0
addr = 129.206.229.185
status = 3333313

pro-07-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 6

pro-06-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 3
LINK ID 0
addr = 129.206.229.173
status = 3333333


pro-08-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 7

root@pro-04-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 2
LINK ID 0
addr = 129.206.229.164
status = 3333333

root@pro-05-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 5
LINK ID 0
addr = 129.206.229.178
status = 3333333
 
Okay, something seems to have gone quite wrong there. Could you collect logs from all nodes again and run the following commands:

systemctl stop corosync pve-cluster

Wait until the services are stopped on all nodes. Then, node by node:
systemctl start corosync pve-cluster
and after each node, verify that all nodes started so far can see each other (pvecm status / corosync-cfgtool -sb / logs)
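The rolling restart described above can be sketched as a loop. The node list and passwordless root ssh are assumptions here; in practice you would check membership manually after each start, as suggested:

```shell
# Sketch of the suggested rolling restart. NODES is a hypothetical list
# matching the hostnames in this thread; assumes passwordless root ssh.
NODES="pro-01-dmed pro-03-dmed pro-04-dmed pro-05-dmed pro-06-dmed pro-07-dmed pro-08-dmed"

rolling_restart() {
  # first stop corosync and pve-cluster everywhere
  for n in $NODES; do
    ssh "root@$n" systemctl stop corosync pve-cluster
  done
  # then start node by node, checking membership after each one
  for n in $NODES; do
    ssh "root@$n" systemctl start corosync pve-cluster
    sleep 10
    ssh "root@$n" pvecm status
    ssh "root@$n" corosync-cfgtool -sb
  done
}
```

The point of starting node by node is that a node which fails to join becomes visible immediately, instead of being lost in the noise of a full-cluster restart.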
 
The good news is that 6 of 7 nodes are back in the cluster. One is not. I have attached the logs from 3 of the servers since this morning.
 


And what does 'pvecm status' say on all 7 nodes?
 
root@pro-01-dmed:~# pvecm status
Quorum information
------------------
Date: Wed Jun 24 09:43:01 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000004
Ring ID: 2.5a612
Quorate: No

Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 3
Quorum: 4 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 129.206.229.164
0x00000004 1 129.206.229.187 (local)
0x00000006 1 129.206.229.168

pro-08-dmed:~# pvecm status
Quorum information
------------------
Date: Wed Jun 24 09:47:52 2020
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000007
Ring ID: 1.5a99a
Quorate: Yes

Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 4
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 129.206.229.185
0x00000003 1 129.206.229.173
0x00000005 1 129.206.229.178
0x00000007 1 129.206.229.186 (local)

That is from 2 of the nodes.
 
pro-06-dmed:~# pvecm status
Quorum information
------------------
Date: Wed Jun 24 09:49:01 2020
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000003
Ring ID: 1.5a9d2
Quorate: Yes

Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 4
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 129.206.229.185
0x00000003 1 129.206.229.173 (local)
0x00000005 1 129.206.229.178
0x00000007 1 129.206.229.186
 
Okay - so now the cluster has split into two partitions. Could you post corosync-cfgtool -sb from ALL nodes again? What does the network look like? Are all nodes connected directly to the same switch?
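To compare the two partitions quickly, the relevant fields of the pvecm status output can be pulled out with a one-liner. This is a sketch fed with a captured sample from the thread; on a live node you would pipe pvecm status into it instead:

```shell
# Sketch: extract Ring ID and Quorate from `pvecm status` output, so the
# values can be compared across nodes at a glance.
# Live usage would be:  pvecm status | parse_quorum
parse_quorum() {
  awk -F': *' '/^Ring ID:/ {r=$2} /^Quorate:/ {q=$2} END {print r, q}'
}

# captured sample from this thread (node pro-08-dmed)
sample='Ring ID: 1.5a99a
Quorate: Yes'

printf '%s\n' "$sample" | parse_quorum   # -> 1.5a99a Yes
```

Nodes in the same partition share a Ring ID; here the two groups (1.5a99a vs 2.5aaa6) make the split visible immediately.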
 
pro-07-dmed:~# pvecm status
Quorum information
------------------
Date: Wed Jun 24 09:52:30 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000006
Ring ID: 2.5aaa6
Quorate: No

Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 3
Quorum: 4 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 129.206.229.164
0x00000004 1 129.206.229.187
0x00000006 1 129.206.229.168 (local)


I'm considering going back to corosync 2 until I do the upgrade to Buster.
 
All nodes are in the same subnet and on the same switch. With corosync 2 it still worked.
 
pro-06-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 3
LINK ID 0
addr = 129.206.229.173
status = 3333333

root@pro-01-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 4

pro-03-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 1
LINK ID 0
addr = 129.206.229.185
status = 3333313

root@pro-04-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 2

root@pro-05-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 5
LINK ID 0
addr = 129.206.229.178
status = 3333333

pro-07-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 6

pro-08-dmed:~# corosync-cfgtool -sb
Printing link status.
Local node ID 7