Hosts randomly isolate themselves from the network

satiel

Member
Jan 13, 2016
Hi,
I'm using Proxmox 4.2.6-1-pve and I think I have a serious problem.
I have 3 nodes configured as a cluster, and sometimes under heavy load, such as cloning a VM, I lose one node.
The VMs on that node keep responding, but they don't accept new connections. For example, an RDP session that is already open keeps working, but if I wasn't connected beforehand it's impossible to connect.

On the host side, it's impossible to SSH into it or ping it. If I log in at the physical console, I can't ping anything from the host either.
The only fix I have found is to reboot the host.

The logs don't show anything significant: nothing appears until the host becomes isolated and all network-related services (Ceph, NFS and others) start to fail.

What can it be?
I'm very desperate!

Thank you
 
I disagree, the problem happens on any of the three hosts, and there is no kernel panic, just network isolation.
 
I've seen similar problems that turned out to be a memory issue, also without anything in the logs (in my case, simply adding more memory to the system resolved it). Did you ever run "free -m" when it occurs? However, whether it's a memory issue or not, I'm pretty sure it's hardware related.
 
Actually I didn't. Free memory is very low right now, less than 1G, but I think that's normal because of the cache.
In the Proxmox UI, the actual memory usage is below 50%.
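
For anyone wanting to check the "low free memory is just cache" assumption, here is a minimal sketch that separates truly free memory from cache using the contents of /proc/meminfo. The function names parse_meminfo and memory_summary_mib are made up for this example, and it assumes the usual "Key: value kB" layout of that file. MemAvailable is the kernel's own estimate of reclaimable memory and may be missing on old kernels, so the code falls back to free + buffers + cache, which is only a rough approximation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_summary_mib(info):
    """Return (free, cache, available) in MiB from a parsed meminfo dict."""
    free_kb = info["MemFree"]
    cache_kb = info.get("Buffers", 0) + info.get("Cached", 0)
    # Assumption: if the kernel does not report MemAvailable,
    # free + buffers + cache is used as a rough stand-in.
    available_kb = info.get("MemAvailable", free_kb + cache_kb)
    return free_kb // 1024, cache_kb // 1024, available_kb // 1024
```

On the host itself this could be called as memory_summary_mib(parse_meminfo(open("/proc/meminfo").read())). If the available figure stays high while the free figure is low, the cache explanation is plausible. Logging the three numbers periodically to local disk would also leave a record to look at after the next isolation.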