Noob with weird network issue

crawrj

New Member
Feb 26, 2026
Solved thanks to bbgeek17. LACP misconfiguration on my end.

I am not new to technology, but I have never used Proxmox before. I have been setting up my first cluster: two Dell servers and a QDevice. The cluster is active. The weird issue is that everything works fine until I restart one of the hosts. After that, I can ping both hosts from my machine and reach each host's web interface at its direct management IP, but the two hosts can't ping each other. If I make a minor change, like enabling or disabling VLAN Aware on either host, and then apply it, the issue resolves and everything works again. So I feel like whatever is happening is a minor thing that I am not aware of, but it is driving me crazy.

Each host has eight NICs. My network is a Cisco switch stack at the core connected to a Cisco switch in the server cabinet. Both switches have all of the VLANs Proxmox is using. The Proxmox gateway sits at the core switch. But again, everything works until a host is rebooted, and then it is fixed again after applying a simple config change. So I can't see how it could be anything other than a Proxmox setting or issue. Below is the Proxmox interface setup. Any help would be GREATLY appreciated!

Linux PVE-Host2 7.0.2-6-pve #1 SMP PREEMPT_DYNAMIC PMX 7.0.2-6 (2026-05-20T08:55Z) x86_64

Last login: Wed May 27 10:27:14 2026 from 192.168.205.51
root@PVE-Host2:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto management2
iface management2 inet manual

auto cluster2
iface cluster2 inet static
    address 192.168.20.152/24

auto management1
iface management1 inet manual

auto cluster1
iface cluster1 inet static
    address 192.168.20.52/24

auto vmnetwork2
iface vmnetwork2 inet manual

auto migration2
iface migration2 inet manual

auto vmnetwork1
iface vmnetwork1 inet manual

auto migration1
iface migration1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves management1 management2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#Management Bond

auto bond1
iface bond1 inet manual
    bond-slaves vmnetwork1 vmnetwork2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#VMNetwork Bond

auto bond2
iface bond2 inet static
    address 192.168.21.52/24
    bond-slaves migration1 migration2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#Migration Network1

auto vmbr0
iface vmbr0 inet static
    address 192.168.205.52/24
    gateway 192.168.205.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#Management Bridge

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#VMNetwork Bridge

source /etc/network/interfaces.d/*
 
You have two interfaces sitting on the same subnet; it's very likely that this is what is tripping you up. I would recommend avoiding this configuration when possible.
If you are curious, you should examine routes and ARP tables during the "outage".
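A minimal sketch of what to capture on each host while the two hosts cannot reach each other, assuming the iproute2 `ip` tool is available (it normally is on a Debian-based install). The function name `net_snapshot` and its argument are placeholders and not something from this thread; pass the other host's IP address.

```shell
# net_snapshot PEER_IP: print the local addresses, the route the kernel
# would use to reach PEER_IP, and the neighbour (ARP) table.
# Hypothetical helper name, for illustration only.
net_snapshot() {
    peer="$1"
    if [ -z "$peer" ]; then
        echo "usage: net_snapshot PEER_IP" >&2
        return 2
    fi
    echo "== addresses =="
    ip -br addr show
    echo "== route used to reach $peer =="
    ip route get "$peer"
    echo "== neighbour (ARP) table =="
    ip neigh show
}
```

With two NICs in the same subnet, the `dev` and `src` fields printed by `ip route get` show which interface and source address the kernel actually picked, and the neighbour table shows whether the peer's MAC address was resolved at all.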

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
The only two interfaces on the same subnet are the cluster's direct NICs, since the documentation said not to use a bond for the cluster. How would this be fixed without just using one NIC? Also, why would that only affect the two servers communicating with each other after a reboot, and be fixed after a config application?
 
To test that theory, I removed the extra cluster NIC from both hosts and rebooted a host, and it is still doing the same thing. I put it back the way it was, hit apply configuration, and everything is working again. I could make a simple change like changing the comment text, then hit apply configuration, and everything would work. Just doesn't make sense to me.
 
Check journalctl -e for systemd messages and dmesg -T for kernel messages.
For a live view, use journalctl -ef and dmesg -wT.
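For a bonding problem, the interesting kernel lines are the ones written by the bonding driver. A small sketch that filters them out of whatever log text is piped in; `bond_lines` is a made-up helper name, and the pattern only assumes that such messages contain the bond name followed by a colon, or the string 802.3ad.

```shell
# bond_lines: keep only log lines that mention a bondN interface or 802.3ad.
# Reads log text on stdin. Hypothetical helper name.
bond_lines() {
    grep -E 'bond[0-9]+:|802\.3ad'
}
```

Typical use would be `dmesg -T | bond_lines`, or `journalctl -k -b | bond_lines` for the kernel messages of the current boot.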

Please also post the network configuration for PVE-Host1.
 
[Wed May 27 12:54:51 2026] bond2: Warning: No 802.3ad response from the link partner for any adapters in the bond

I would recommend that you reduce your network complexity to the bare minimum. Then start adding it back piece by piece, making sure that things are working correctly one after another.

There is some randomness in how things come up, when NICs are ready vs bridges and bonds. It seems that your basic L2 config is not consistent across all pieces.


Since the cluster is up, how do I modify the settings to change the NIC? I can't find that in the GUI. If I start over with just the management NIC and build from there, the cluster will fail. Also, not sure what you mean by "There is some randomness in how things come up, when NICs are ready vs bridges and bonds. It seems that your basic L2 config is not consistent across all pieces." Both hosts and their network configurations are identical. Down to the exact NICs being used. I do appreciate the feedback, just trying to understand.
 
I stripped it down to just this, and it is still happening. I don't understand. And it is only interconnectivity between the hosts. I can access and ping both hosts from my PC, just not from host1 to host2.

auto lo
iface lo inet loopback

auto management1
iface management1 inet manual

auto management2
iface management2 inet manual

iface cluster2 inet manual

iface migration1 inet manual

iface migration2 inet manual

iface vmnetwork1 inet manual

iface vmnetwork2 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves management1 management2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#Management Bond

auto vmbr0
iface vmbr0 inet static
    address 192.168.205.52/24
    gateway 192.168.205.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#Management Bridge

source /etc/network/interfaces.d/*
 
To be on the same page, none of this is PVE specific. You are dealing with basic Linux network management.

At one point in the thread you mentioned that there is a "direct" connectivity between the clients, is there an actual direct cable?

Your log showed that while the client is configured for LACP, the switch was not replying to LACP packets, which indicates a misconfiguration.

Your stripped down configuration still includes an LACP bond with non-default hashing. There are VLANs sitting on top of it.

I mean no disrespect, but as a self-identified "noob" you are still dealing with a more than basic configuration, which, we know, was not matching between two end-points just a few posts above.

This is not a simple "change this one line" type of engagement. Besides all the complexity, you also have not shared the running configuration of your system: ip a, ip route, arp, etc.

Nor do we know the state of the second server. There are many tools available today, besides a back and forth in the Proxmox-oriented forum, that could assist you with methodical identification of missing pieces in your network.

"I can access and ping both hosts from my PC, just not from host1 to host2."
Very likely your bond/LACP is not configured properly. I'd recommend drawing a diagram on a napkin and thinking about the cabling, IPs, and hashing implications.
Even easier, get rid of the LACP and start with a simple single link.
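One way to check whether the switch is answering LACP at all is to look at the partner information the bonding driver exposes under /proc/net/bonding/. A rough sketch, assuming the usual layout of that file, where an 802.3ad bond reports a "Partner Mac Address:" line and an all-zero value means no LACPDUs have been received from the link partner. The helper name `lacp_partner` is made up for illustration, and the file may need to be read as root to show these details.

```shell
# lacp_partner: read the text of /proc/net/bonding/<bond> on stdin and
# report whether an LACP partner was seen.
# Exit status: 0 = partner present, 1 = all-zero partner MAC,
#              2 = no partner line found (probably not an 802.3ad bond).
lacp_partner() {
    # Take the first "Partner Mac Address:" value found in the input.
    mac=$(sed -n 's/^[[:space:]]*Partner Mac Address:[[:space:]]*//p' | head -n 1)
    case "$mac" in
        "")
            echo "no partner information found"
            return 2
            ;;
        00:00:00:00:00:00)
            echo "no LACP partner seen"
            return 1
            ;;
        *)
            echo "LACP partner: $mac"
            return 0
            ;;
    esac
}
```

Usage would be along the lines of `lacp_partner < /proc/net/bonding/bond0`, run once per bond. If it reports no partner on the host, the switch ports for that bond are the next place to look.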


First of all, I don't take offense. I know you are trying to help. I am a noob to raw Linux and have never used Proxmox. I have used VMware, Cisco, and a ton of other technologies for more than 20 years, so I'm not at all new on the technology side. But I know almost nothing about Linux and nothing about Proxmox.

"At one point in the thread you mentioned that there is a "direct" connectivity between the clients, is there an actual direct cable?" No, I was talking about communication between the hosts. From host1 I can ping host2 until a reboot, then I can't. If I do something as simple as change the comment on vmbr0 and apply the configuration, it starts working.

"Your log showed that while the client is configured for LACP, the switch was not replying to LACP packets. Indicating a misconfiguration." The switch is configured the same way I would do for VMware and anything else, really, so not sure why it is different here.

"Your stripped down configuration still includes an LACP bond with non-default hashing. There are VLANs sitting on top of it." I will test removing this next.

"I mean no disrespect, but as a self-identified "noob" you are still dealing with a more than basic configuration, which, we know, was not matching between two end-points just a few posts above." The two systems are matching in every way. If I said something to imply otherwise, I misspoke. The servers are identical down to the exact NIC models in the exact same positions. The Proxmox install is identical. The /etc/network/interfaces are configured exactly the same on both systems, minus the IP addresses of course.

The two biggest things I don't understand are: 1. Why does everything work, network-wise, from my PC but not between the hosts if there is a misconfiguration on the network? 2. Why does changing the comment on vmbr0 and applying the configuration fix it? If it were a network issue, how could that possibly fix it?
 
"The switch is configured the same way I would do for VMware and anything else, really, so not sure why it is different here."
[Wed May 27 12:54:51 2026] bond2: Warning: No 802.3ad response from the link partner for any adapters in the bond
The system reported the above message, and we have no reason to doubt it. If something has changed since then, we have no evidence of it.

An LACP bond that is misconfigured between two network nodes leads to unpredictable behavior. Trying to explain it is fruitless. The results are essentially undefined.

In VMware, the best practice was NOT to use LACP and to use a multi-subnet configuration instead. If your switch is configured to cater to VMware-type installs, that further supports that the Linux log report is correct.

"The two systems are matching in every way."
The two systems being the server and the network switch.


OK, maybe I am completely misremembering how I last set up VMware, since it has been about 6 years since then. My current VMware install was done before me and only has one NIC.

So, what is the proper way to use Proxmox with multiple NICs? Can you share how you would set this up?
 
Also, I can confirm it is LACP-related. I removed all LACP on both ends, and it is working. Right now, I just have one management NIC and two cluster NICs active. I rebooted host1, and it is working.

auto lo
iface lo inet loopback

auto management1
iface management1 inet manual

auto management2
iface management2 inet manual

auto cluster1
iface cluster1 inet static
    address 192.168.20.51/24

auto cluster2
iface cluster2 inet static
    address 192.168.20.151/24

auto migration1
iface migration1 inet manual

auto migration2
iface migration2 inet manual

iface vmnetwork1 inet manual

iface vmnetwork2 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.205.51/24
    gateway 192.168.205.1
    bridge-ports management1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#Management Bridge
 
OK, I think I see my mistake. I will do some additional testing tomorrow and report back. Thanks for your help!!!!!!
 
I couldn't wait until tomorrow :). It is working now. Stupid mistake on my end. Thank you so much for getting me in the right place. I probably would have never gotten it. Much appreciated.
 