Proxmox sometimes restarts

3r1c

Member
Jul 15, 2021
Hello

I have two Proxmox 6.4-10 servers running ZFS with 18GB of system memory.
The virtual machines are synchronized between the two servers.
Memory usage does not usually go above 14GB.

The problem is that sometimes both servers restart at the same time. When this happens, CPU usage jumps to 45-50% just before the restart, while memory usage stays flat.

I wonder if you have experienced this before? If so, how was it solved?

Thanks in advance.
 
Do you have HA enabled? If so, it is possible that for some reason, Corosync (used for the Proxmox VE Cluster) has problems to keep a stable communication between the nodes. Once they lose contact long enough, they will fence themselves to make sure that the HA guests are off in order to be started on one of the remaining nodes in the cluster. In your situation, with only 2 nodes, that means that there is no node remaining to still form a quorum (majority).
 
Hi,

Yes, HA is enabled; I use two nodes.
 
Do you have a QDevice set up to have a 3rd vote in the cluster?
Do you have a dedicated network between the nodes, just for the Proxmox VE cluster traffic (Corosync)? If not, and that network is used for other things like storage, backups, etc., any service that takes up all the bandwidth can push the latency of the Corosync packets up so much that Corosync considers the link unusable. If that situation persists for too long (1 or 2 min) and there is no other link for Corosync to switch to, the HA stack will kick in and the node fences itself (hard reset) to make sure that the HA guests are off, so they can be started on the remaining nodes.

Since you only have 2 nodes, neither will be part of a majority in the cluster anymore, and therefore both will fence themselves. Ideally, you have a dedicated network for Corosync alone so that no other services can use up all the bandwidth. Secondly, add a QDevice to the cluster so that you have 3 votes; I highly recommend this, as it also helps if you need to shut down a node for maintenance. If a node is down, the other still gets the vote from the QDevice -> majority is reached -> no fencing.
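To see how the vote math works out on your own cluster, you can check the current vote counts and quorum state with `pvecm status` (a sketch; the exact output fields vary slightly between versions):

```shell
# On either node: show cluster membership, vote counts and quorum state
pvecm status

# The "Votequorum information" section shows, among others:
#   Expected votes: 2
#   Total votes:    2
#   Quorum:         2
# With a QDevice added, Expected votes becomes 3, so one node
# can be lost while the survivor still reaches quorum (2 of 3).
```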
 
There is currently no QDevice set up in the cluster, unfortunately for financial reasons. I have a test environment running with these same settings and I don't experience any crashes there. Corosync is configured on a separate network interface directly between the two servers in an active-backup bond, so nothing else uses it.

Can the hard reset be turned off somehow?

Thanks in advance.
 
Don't use the integrated HA stack. That is the only option in that situation. Should a node fail, and you do have a recent copy of the disk on the other node (replication), you could manually move the VM over to the remaining node by moving the config file:

Code:
mv /etc/pve/nodes/<not working node>/qemu-server/<VMID>.conf /etc/pve/nodes/<still working node>/qemu-server

If you want to manually move an LXC container, use lxc instead of qemu-server.
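For completeness, the equivalent move for an LXC container would look like this (same node placeholders as above; `<CTID>` stands for the container's numeric ID):

```shell
# Move a container's config from the failed node to the surviving one;
# /etc/pve is the cluster filesystem, so this reassigns the guest.
mv /etc/pve/nodes/<not working node>/lxc/<CTID>.conf \
   /etc/pve/nodes/<still working node>/lxc/
```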
 

Thank you very much, Aaron.
 
You will also have to play around with expected votes and such (search for it) as otherwise, such actions won't work if the remaining node does not have the majority of the votes. Therefore, a Raspberry Pi or something like it to run the QDevice on is something worth considering.
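One way to do that (a sketch; verify the exact behavior against the docs for your version) is to temporarily lower the number of expected votes on the surviving node so it becomes quorate again and `/etc/pve` becomes writable:

```shell
# On the remaining node, tell the cluster to expect only 1 vote
# so this single node regains quorum:
pvecm expected 1

# Now the guest config files can be moved and the guests started.
# This is a temporary override; it is reset when corosync restarts
# or the failed node rejoins the cluster.
```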
 
Good idea. Can Proxmox be installed on an RPi? Or is it just worth installing Corosync on it?
 
Do you think it would be logically correct to create one VM on each of the two physical Proxmox hosts, install Proxmox in them, and join them to the HA cluster, so that if one of the physical servers fails, you still have a majority?
 
Take a look at the documentation: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support All you need on the external machine is Linux, the corosync-qnetd package, and root SSH access (initially with password login, to set it all up).
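As a rough sketch of the steps from that documentation (package names and commands as documented for PVE 6.x; verify against the linked guide):

```shell
# On the external machine (e.g. a Raspberry Pi running Debian):
apt install corosync-qnetd

# On both Proxmox VE nodes:
apt install corosync-qdevice

# Then, on one cluster node, register the QDevice
# (requires root SSH access to the external machine):
pvecm qdevice setup <IP of the external machine>
```

Afterwards, `pvecm status` should show 3 expected votes, so a single surviving node plus the QDevice still forms a majority.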


I don't really understand what you mean with `1-1 VM`, but all those concepts usually have an issue if any one of the nodes is allowed to be down. An external vote is the safe way to do it.
 
