Hi,
I've just encountered a problem tonight. I have a cluster of 4 nodes running Proxmox 4, up to date. An HA group has been created with 2 of the nodes, and these two nodes are located in different datacenters. Tonight the nodes restarted themselves, about 3 seconds apart; my whole HA strategy went down and the VMs were stopped. The syslog from each node just before the restart is below.
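For context, the HA group is defined along these lines in /etc/pve/ha/groups.cfg (the group name here is just a placeholder, not my exact config):

group: ha-dc-group
        nodes hostA1,hostB1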
Node A1:
Mar 20 23:17:24 hostA1 kernel: [874541.408572] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.408620] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.408632] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.410499] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 kernel: [874541.410547] mport: dropped over-mtu packet: 1572 > 1500
Mar 20 23:17:24 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:24 hostA1 corosync[1908]: [TOTEM ] FAILED TO RECEIVE
Mar 20 23:17:24 hostA1 corosync[1908]: [TOTEM ] A new membership (10.65.1.99:732) was formed. Members left: 3 2 1
Mar 20 23:17:24 hostA1 corosync[1908]: [TOTEM ] Failed to receive the leave message. failed: 3 2 1
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] notice: members: 4/1814
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [status] notice: members: 4/1814
Mar 20 23:17:24 hostA1 corosync[1908]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Mar 20 23:17:24 hostA1 corosync[1908]: [QUORUM] Members[1]: 4
Mar 20 23:17:24 hostA1 corosync[1908]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [status] notice: node lost quorum
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] crit: received write while not quorate - trigger resync
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] crit: leaving CPG group
Mar 20 23:17:24 hostA1 pmxcfs[1814]: [dcdb] notice: start cluster connection
Mar 20 23:17:24 hostA1 pve-ha-lrm[2058]: lost lock 'ha_agent_hostA1_lock - cfs lock update failed - Device or resource busy
Mar 20 23:17:24 hostA1 pve-ha-crm[1952]: status change slave => wait_for_quorum
Mar 20 23:17:26 hostA1 pve-ha-lrm[2058]: status change active => lost_agent_lock
Mar 20 23:17:34 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:44 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:44 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:45 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:47 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:48 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:50 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:51 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:53 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:54 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:17:54 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:56 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:57 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:17:59 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:00 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:01 hostA1 CRON[6908]: (root) CMD (/usr/local/rtm/bin/rtm 42 > /dev/null 2> /dev/null)
Mar 20 23:18:02 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:03 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:04 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:18:05 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:06 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:08 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:09 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:11 hostA1 watchdog-mux[1694]: client watchdog expired - disable watchdog updates
Mar 20 23:18:11 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:12 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:14 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:14 hostA1 pvestatd[1912]: storage 'BACKUP_POC1' is not online
Mar 20 23:18:15 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:18:17 hostA1 corosync[1908]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Mar 20 23:20:27 hostA1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1807" x-info="http://www.rsyslog.com"] start
Node B1:
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 6 7 8 9
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] A new membership (10.65.1.1:736) was formed. Members joined: 3 left: 3 2 4
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Failed to receive the leave message. failed: 3 2 4
Mar 20 23:17:24 hostB1 corosync[1965]: [TOTEM ] Retransmit List: 1
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885, 3/1902
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: starting data syncronisation
Mar 20 23:17:24 hostB1 corosync[1965]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Mar 20 23:17:24 hostB1 corosync[1965]: [QUORUM] Members[2]: 3 1
Mar 20 23:17:24 hostB1 corosync[1965]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: cpg_send_message retried 1 times
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: members: 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: node lost quorum
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: members: 1/1885, 3/1902
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: starting data syncronisation
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: received sync request (epoch 1/1885/00000007)
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: received sync request (epoch 1/1885/00000007)
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: received all states
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: leader is 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: synced members: 1/1885, 3/1902
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: start sending inode updates
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: sent all (0) updates
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: all data is up to date
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: dfsm_deliver_queue: queue length 5
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] crit: received write while not quorate - trigger resync
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] crit: leaving CPG group
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: received all states
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: all data is up to date
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [status] notice: dfsm_deliver_queue: queue length 27
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: start cluster connection
Mar 20 23:17:24 hostB1 pve-ha-lrm[2022]: lost lock 'ha_agent_hostB1_lock - cfs lock update failed - Device or resource busy
Mar 20 23:17:24 hostB1 pve-ha-crm[2010]: status change slave => wait_for_quorum
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885
Mar 20 23:17:24 hostB1 pmxcfs[1885]: [dcdb] notice: all data is up to date
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: members: 1/1885, 3/1902
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: starting data syncronisation
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: received sync request (epoch 1/1885/0000000A)
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: received all states
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: leader is 1/1885
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: synced members: 1/1885, 3/1902
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: start sending inode updates
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: sent all (0) updates
Mar 20 23:17:25 hostB1 pmxcfs[1885]: [dcdb] notice: all data is up to date
Mar 20 23:17:27 hostB1 pve-ha-lrm[2022]: status change active => lost_agent_lock
Mar 20 23:18:01 hostB1 CRON[27196]: (root) CMD (/usr/local/rtm/bin/rtm 58 > /dev/null 2> /dev/null)
Mar 20 23:18:12 hostB1 watchdog-mux[1752]: client watchdog expired - disable watchdog updates
Mar 20 23:20:24 hostB1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1778" x-info="http://www.rsyslog.com"] start
Please help.
Best regards