Network problem on cluster node when I start a VM or CT

mjg

Jul 24, 2021
Hello all,

I have a 6-node cluster running Proxmox 6.4-13.
The other 5 nodes run perfectly, but the newest node has network problems whenever I start a VM or CT with a network device.
While the network is hanging, the syslog of the node shows the message "cfs-lock 'file-replication_cfg' error: no quorum!".
The node and the VM or CT are then unreachable over the network for 3-4 minutes. After that, both are reachable again for another 3-4 minutes, and so on.
If I don't start any VM or CT, the node is always reachable over the network. The same is true if I start a VM without a network device.

Any ideas?

Thank you for helping
Martin
 
Hi, thank you for your answer.
I don't think so. The problem also occurs if I start the VM from any live ISO image without any network configuration. And DHCP is not running in this network.
 
Could you share /etc/network/interfaces from your cluster? Ideally from both a working node and the problematic one.
 
Yes, of course.
The working node:
auto lo
iface lo inet loopback

iface enp3s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 10.14.33.162/20
    gateway 10.14.47.254
    bridge_ports enp3s0
    bridge_stp off
    bridge_fd 0

iface enp4s0 inet manual

The problem node:
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 10.14.33.166/20
    gateway 10.14.47.254
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0

iface eno2 inet manual
 
Then I would consider this an expected side effect. Corosync latency on the sixth node is no longer low enough once a guest with a network device is started.
You should definitely define a separate Corosync network on a separate interface.
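As a rough sketch of what that could look like on the problem node: this assumes the unused port eno2 is cabled to its own switch or VLAN and that 10.99.0.0/24 is a free subnet. Both the subnet and the address are hypothetical, so adjust them to your environment.

```
# Dedicated Corosync link on the spare port (hypothetical subnet 10.99.0.0/24).
# No gateway here: cluster traffic should stay on this network.
auto eno2
iface eno2 inet static
    address 10.99.0.166/24
```

Each node would need a matching stanza with its own address. Corosync then has to be pointed at the new addresses, which on Proxmox VE means changing ring0_addr for every node in /etc/pve/corosync.conf and incrementing config_version.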
 
OK, thank you for your help. Then I will set up a dedicated network and report back on whether it fixes the problem.
 
Hi,
I now have a dedicated Corosync network. The error message "cfs-lock 'file-replication_cfg' error: no quorum!" is gone, but the problem with the network connection still exists.
I have noticed that the interface eno1 goes down periodically, but only when I start a VM on this machine.

Do you have any idea why this machine is causing this trouble?

Best
 
I'm more of a logical network guy than a physical one. Are there any hints in the kernel logs or dmesg?
What does the networking service report at the time of an outage?
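If it helps, a small shell helper along these lines can count link-down messages for one interface in the kernel log. It is only a sketch: the function name count_link_down is made up, and matching on "link is down" is an assumption, since the exact wording differs between NIC drivers.

```shell
# Hypothetical helper: count "link is down" kernel messages for one interface.
# Reads log text on stdin. The case-insensitive pattern is an assumption,
# because the exact wording differs between NIC drivers.
count_link_down() {
    grep -i 'link is down' | grep -cw -- "$1"
}
```

On the problem node it could be fed from the kernel log, for example `dmesg | count_link_down eno1` or `journalctl -k | count_link_down eno1`. A count that keeps growing while a guest is running would point to the physical link rather than to Proxmox itself.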
 
I have been thinking about this periodic link-down.
I think the problem comes from our network side: the switch detects a second MAC address on the port and cuts the link, after a while the link comes back, and so on. I will ask our network guys tomorrow.
Thank you for your help.
It was a good occasion to set up the dedicated Corosync network anyway.
Thank you very much.
Best
 
Just for information:
This was exactly the problem. The switch detects multiple MAC addresses on the port and takes the link down.
Thank you
 
