Proxmox VE 6.3 restarts when GW not available

juniper

Renowned Member
Oct 21, 2013
Hi,

I don't know if it's an issue or my fault:

We have a 5-node Proxmox VE 6.3 cluster, and when we run some update on our firewall/GW (for example, when the gateway is unavailable for some time), the Proxmox VE nodes reboot (all nodes are on the same subnet, obviously).

Is this normal behavior? Do we need to add a ring link to our cluster?
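I mean something roughly like a second link in /etc/pve/corosync.conf, e.g. (node name and the addresses are only placeholders):

    nodelist {
      node {
        name: pve1
        nodeid: 1
        quorum_votes: 1
        # existing cluster network
        ring0_addr: 192.168.1.11
        # additional redundant link on a separate network
        ring1_addr: 10.10.10.11
      }
      # the other node entries would get a ring1_addr as well
    }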

With the old version (multicast), everything worked fine when the GW wasn't available.

Thanks in advance.
 
How is your network and PVE cluster set up? Is it possible that the network for the PVE cluster (/etc/pve/corosync.conf) is down while you reboot the gateway?
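To see what the cluster currently looks like, you can run the following on one of the nodes, for example:

    # quorum and membership information
    pvecm status
    # state of each corosync link to the other nodes
    corosync-cfgtool -s
    # the cluster network definition
    cat /etc/pve/corosync.conf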

Do you have HA enabled? If you do have HA enabled, and if a reboot of the GW means that the corosync network is not available for some time, then this is expected behavior.
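You can check whether any HA resources are configured with:

    # shows the HA manager state and all configured HA services
    ha-manager status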

If the corosync communication from one node to the major part of the cluster is broken for about 2 minutes (IIRC), the node will fence itself (hard reset) to make sure that the HA guests on it are definitely off, so that the remaining nodes can safely start the HA guests without corrupting the VMs' disks. If the corosync network does not work for all nodes, all nodes with HA guests will behave like that -> many nodes reboot.
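The fencing is handled by the HA services via a watchdog; you can see them on each node with:

    # HA local/cluster resource managers and the watchdog multiplexer
    systemctl status pve-ha-lrm pve-ha-crm watchdog-mux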

If you don't use HA, or if the corosync network is still functioning during a GW reboot, we will have to investigate further what is causing this.
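In that case, a good starting point would be the corosync and cluster filesystem logs from before the unexpected reboot, for example:

    # corosync and pmxcfs messages from the previous boot
    journalctl -b -1 -u corosync -u pve-cluster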
 
No resources are added to HA, but we have two groups configured (left over from an old test).

GW and cluster are on different switches, and I think the cluster network works fine during a GW reboot (but I have to test...).

Testing is difficult...
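Maybe I can just watch the corosync link state on one node while the GW reboots, something like:

    # refresh the corosync link status every 2 seconds during the GW maintenance
    watch -n 2 corosync-cfgtool -s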
 
