Proxmox network crashes during heavy network usage

Robert0

Apr 1, 2021
We have a Proxmox set-up of three nodes sharing some 15 VMs. The nodes and VMs are behind an actual router.

This has been happily running for a couple of years now without much trouble, until recently. I wanted to update a box from Debian 9 to 10, and during the `apt upgrade` (which would download about 1 GB) I lost my remote connection. All the other VMs and nodes also became unreachable. After about 30 minutes the whole system came back online.

I tried the upgrade a day later with the exact same result. Then a few days later I wanted to download a backup of a VM to do some local tests, which was also quickly interrupted by a lost connection, again making all the VMs and nodes unreachable.

I've attached sections of the syslog for the second and third crash (from 22-03, ~19:04 and 30-03, ~18:43). In both cases the systems had been running fine for days.

The first actual error appears to be:

Mar 30 18:42:56 bismuth corosync[1318]: error [TOTEM ] FAILED TO RECEIVE
Mar 30 18:42:56 bismuth corosync[1318]: [TOTEM ] FAILED TO RECEIVE
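
To check whether the same error precedes each outage, the matching lines can be pulled out of a syslog excerpt. This is only a minimal Python sketch, assuming the line format shown above; the function name `find_totem_failures` and the third sample line are made up for illustration.

```python
import re

# Matches syslog lines such as:
#   Mar 30 18:42:56 bismuth corosync[1318]: [TOTEM ] FAILED TO RECEIVE
TOTEM_FAILURE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+corosync\[\d+\]:.*FAILED TO RECEIVE"
)


def find_totem_failures(lines):
    """Return (timestamp, host) pairs for corosync 'FAILED TO RECEIVE' lines."""
    hits = []
    for line in lines:
        match = TOTEM_FAILURE.match(line)
        if match:
            hits.append((match.group("stamp"), match.group("host")))
    return hits


# The first two lines are from the log above; the third is a made-up
# non-matching line to show that unrelated entries are skipped.
sample = [
    "Mar 30 18:42:56 bismuth corosync[1318]: error [TOTEM ] FAILED TO RECEIVE",
    "Mar 30 18:42:56 bismuth corosync[1318]: [TOTEM ] FAILED TO RECEIVE",
    "Mar 30 18:42:57 bismuth pvedaemon[2000]: unrelated line",
]

for stamp, host in find_totem_failures(sample):
    print(stamp, host)
```

On a node, the same function could be fed the lines of the syslog file itself (assuming it lives at `/var/log/syslog`, the usual Debian location) instead of the sample list.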

I'm at a loss as to what's happening here. I know how to use Proxmox, but I know little about the underlying mechanics.
This thread may be related: https://forum.proxmox.com/threads/new-cluster-totem-failed-to-receive-after-4mins.58935/. The error described there is in my logs too.

We're using Proxmox VE 5.2-1. (I know, embarrassingly old. With limited physical access to servers we're having a hard time upgrading.)

I would much appreciate any thoughts on what the problem could be!
 
Thank you for your reply!
What does it mean to put Corosync on a separate physical network? Should I install an additional network card in each of my nodes and connect those secondary Ethernet ports together with a switch or something?

Note that the nodes already have a physical network pretty much to themselves.
 
We have updated to Proxmox 6.2 (using a clean update). We also replaced the SSD of one of the nodes because it gave some errors.
We hoped this would be enough, but the issue of crashing during a download still persists...

We currently lack the hardware to set up such a dedicated network.

It's still strange, though: we have used this Proxmox network for over a year without any such problems.

I'll make sure to post once I know more.