Hi All,
I'm seeking assistance with an odd networking issue on Proxmox VE 8.2 (non-production repos). We are currently running kernel 6.5, even though kernel 6.8 is available, because of specific compatibility and stability requirements. Our setup uses Linux bridges for LAN and storage networking, and guests can no longer communicate across hosts on the storage network. This started after the cluster was down for about 24 hours with the machines powered off; everything worked before that. Note: I don't use VLANs or any special routing. These are flat networks, and I've made no changes to routes. Because much of my storage uses NFS across nodes, this brings down most of my environment. On the network that's having issues there is no switch between the nodes, just a direct 40 Gb fiber interconnect. The issue affects both LXC containers and VMs.
**Environment:**
- **Kernel Version:** Anchored at 6.5 for stability and compatibility reasons, despite newer versions being available.
- **Networking Setup:** Utilizes Linux bridges for managing LAN and storage networking.
- **Bridges Configured:**
  - **vmbr0:** Handles all LAN communications, with seamless communication between hosts and guests.
  - **vmbr1:** Dedicated to the storage network.
- **Migration network:** Remains separate and is not bridged, as it's unnecessary for guest communication.
**Issue:**
- Hosts communicate across all networks without issues.
- Guests on separate hosts can only communicate over the LAN network.
- Guests on the same host can communicate across all networks between themselves and the host they're running on.
- On the storage network, a guest cannot reach the other host's adapter or any guest on the other host.
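Since same-host traffic works and only cross-host guest traffic on the storage network fails, one thing that might be worth confirming is which ports are actually attached to vmbr1 on each host. The following is a minimal sketch that reads the bridge's port list from sysfs. The name prefixes used to tell guest ports from the physical uplink (`tap`, `veth`, `fwpr`) are an assumption based on common Proxmox naming, and `bridge_ports` and `split_ports` are hypothetical helper names.

```python
import os

# Assumption: guest-side ports usually carry these prefixes on Proxmox
# (for example tap100i0, veth100i0, fwpr100p0). Adjust if yours differ.
GUEST_PORT_PREFIXES = ("tap", "veth", "fwpr")


def bridge_ports(bridge, sysfs_root="/sys/class/net"):
    """Return the sorted port names attached to `bridge`, or None if it is not a bridge."""
    brif = os.path.join(sysfs_root, bridge, "brif")
    if not os.path.isdir(brif):
        return None
    return sorted(os.listdir(brif))


def split_ports(ports):
    """Split port names into (uplinks, guest_ports) using the prefix assumption above."""
    guests = [p for p in ports if p.startswith(GUEST_PORT_PREFIXES)]
    uplinks = [p for p in ports if not p.startswith(GUEST_PORT_PREFIXES)]
    return uplinks, guests


if __name__ == "__main__":
    ports = bridge_ports("vmbr1")
    if ports is None:
        print("vmbr1 not found or not a bridge")
    else:
        uplinks, guests = split_ports(ports)
        print("uplink ports:", uplinks or "NONE")
        print("guest ports:", guests or "NONE")
```

If the uplink list came back empty on either host, guests on vmbr1 would have no path to the fiber link even while same-host traffic keeps working. That is only one possible explanation, not a diagnosis.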
**Troubleshooting Steps Taken:**
- Several reboots.
- Networking configurations have been reviewed and appear correct.
- Firewalls are disabled at all levels. Stopping the firewall service on both hosts did not resolve the issue; note that I only stopped proxmox-firewall, assuming that would be enough.
- Recreated vmbr1, with no change in behavior.
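Stopping a firewall service does not by itself show whether bridged frames are still being handed to netfilter. As a hedged sketch, the script below reads the `bridge-nf-call-*` sysctls, which exist under `/proc/sys/net/bridge` only when the `br_netfilter` module is loaded. `read_bridge_nf` and `filtering_enabled` are hypothetical helper names, and a value of 1 only means bridged traffic can be filtered, not that it is being dropped.

```python
import os

BRIDGE_NF_KEYS = (
    "bridge-nf-call-iptables",
    "bridge-nf-call-ip6tables",
    "bridge-nf-call-arptables",
)


def read_bridge_nf(proc_root="/proc/sys/net/bridge"):
    """Return {key: int or None}. None means the file is absent or unreadable,
    which is what you would see when br_netfilter is not loaded."""
    result = {}
    for key in BRIDGE_NF_KEYS:
        path = os.path.join(proc_root, key)
        try:
            with open(path) as fh:
                result[key] = int(fh.read().strip())
        except (OSError, ValueError):
            result[key] = None
    return result


def filtering_enabled(values):
    """List the keys whose value is 1, i.e. bridged frames traverse that hook."""
    return [key for key, value in values.items() if value == 1]


if __name__ == "__main__":
    values = read_bridge_nf()
    for key, value in values.items():
        print(f"{key} = {value}")
    print("hooks enabled:", filtering_enabled(values) or "none")
```

Running this on both hosts and comparing the output would at least show whether the two nodes differ in how bridged traffic is treated.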
The problem appears to involve routing or network isolation, even though the routes look correct. It emerged after the cluster had been powered off, with no other changes made.
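One way to narrow down whether routing is involved at all: on a flat network, two addresses in the same subnet talk directly at layer 2 and never consult a gateway. The sketch below checks that with Python's standard `ipaddress` module. The addresses in the usage lines are hypothetical placeholders, not values from this cluster, and `same_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_subnet(local_cidr, remote_ip):
    """Return True if `remote_ip` lies inside the network of `local_cidr`.

    `local_cidr` is an interface address with prefix length, e.g. "10.10.10.11/24".
    """
    iface = ipaddress.ip_interface(local_cidr)
    return ipaddress.ip_address(remote_ip) in iface.network


if __name__ == "__main__":
    # Hypothetical storage-network addresses for a guest and the remote host.
    print(same_subnet("10.10.10.11/24", "10.10.10.22"))
    print(same_subnet("10.10.10.11/24", "10.10.11.22"))
```

If the guest and the remote adapter do share a subnet, the failure is more likely at the bridge or link layer than in the routing table. A mismatched prefix length on one guest would point the other way.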
**Seeking Insights On:**
- Potential configuration oversights with Linux bridges that might lead to such issues.
- Specific routing issues that might not be immediately apparent.
- Experiences with similar issues and potential resolutions.
**Seeking Advice on Logs:**
To aid in diagnosing the issue, I am open to providing logs that might shed light on the situation. I am considering sharing excerpts from syslog, dmesg, network service logs, firewall logs, etc. However, I am unsure which would be most relevant to this specific networking challenge. Suggestions on which logs might offer the most insight would be greatly appreciated.
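To keep any shared logs short, a small filter like the following could pull out only the lines that mention the storage bridge or a link or port state change. This is a sketch under assumptions: the patterns are guesses at typical kernel message wording (for example "entered forwarding state"), driver messages vary by NIC vendor, and `relevant_lines` is a hypothetical helper name.

```python
import re


def relevant_lines(lines, interfaces=("vmbr1",)):
    """Return the lines that mention one of `interfaces` or a port/link state change."""
    patterns = [re.escape(name) for name in interfaces]
    patterns += [
        r"entered \w+ state",        # bridge port state changes
        r"link (?:is )?(?:up|down)", # NIC link messages, wording varies by driver
        r"br_netfilter",
    ]
    rx = re.compile("|".join(patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if rx.search(line)]
```

For example, the output of `journalctl -k -b` could be saved to a file and passed through `relevant_lines(open(path))`, adding the physical storage NIC's name to `interfaces`.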
Thank you in advance for your help and insights!
Best,
Keith