When I swapped out the cable, I also ended up moving it to a different port on the switch, which made me think the cable was the issue. But when I tested the cable, it was still OK. The next step was to test the network card in the Proxmox server, which was also OK. Finally, I found that the port on the switch had died. Since the switch was over 15 years old, I decided it was easier to replace it.
While debugging this issue, I did some research that I would like to share. These are the steps I followed.
===
Proxmox 8.4 Network Troubleshooting Guide (Debian 12)
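When your Proxmox server loses network connectivity but boots to the splash screen, follow these systematic command-line debugging steps to isolate and resolve the issue. Commands are shown with sudo; on a default Proxmox install you normally work as root and sudo is not installed, so drop the prefix there.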
Phase 1: Basic Interface Diagnostics
Check Network Interface Status
First, verify if your network interfaces are physically up and have IP addresses assigned:
Look for your primary interface (typically enpXs0, eth0, or similar) and bridge (vmbr0). The interface should show UP state and have an IP address assigned.
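The command for this step seems to have dropped out of the post; ip addr (which the guide refers to later) is the likely candidate. As a minimal sketch, assuming the brief output format of iproute2 (ip -br link show), the helper below lists the interfaces reported as DOWN. The name down_ifaces is my own invention, not part of any tool.

```shell
# Print the names of interfaces whose operational state is DOWN.
# Reads "ip -br link show" output on stdin (columns: name, state, MAC, flags).
down_ifaces() {
    awk '$2 == "DOWN" { print $1 }'
}

# Typical use on the server:
#   ip -br addr show                 # names, state and addresses at a glance
#   ip -br link show | down_ifaces   # only the interfaces that are down
```

An empty result means no interface is administratively or operationally down, which moves suspicion to addressing, routing, or the physical link.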
If the interface shows as DOWN, bring it up manually:
Bash:
sudo ip link set dev <interface_name> up
Verify Link Layer Connectivity
Check if the physical link is established:
Look for "Link detected: yes" in the output. If it shows "no," check your cable and switch port.
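The command is missing here as well. "Link detected" is a field printed by ethtool, so ethtool <interface_name> is presumably what was meant. A small sketch that pulls out just that field; link_detected is a hypothetical helper name.

```shell
# Extract the "Link detected" value (yes/no) from ethtool output on stdin.
link_detected() {
    awk -F': ' '/Link detected/ { print $2 }'
}

# Typical use (enp3s0 is a placeholder interface name):
#   ethtool enp3s0 | link_detected
```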
Phase 2: IP Configuration Verification
Examine Network Configuration File
Review your network configuration for errors:
Bash:
cat /etc/network/interfaces
Ensure your configuration includes:
- Correct interface names (verify they match ip addr output)
- Proper static IP address and subnet (e.g., 192.168.1.10/24)
- Correct gateway address
- Proper bridge configuration for vmbr0
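For the first and last points, it can help to compare the ports named in the bridge stanza with the interfaces the kernel actually has. A sketch, assuming the ifupdown-style bridge-ports keyword (the older bridge_ports spelling is accepted too); the helper name bridge_ports is hypothetical.

```shell
# Print every interface listed on a bridge-ports line of an
# ifupdown-style config file (default: /etc/network/interfaces).
bridge_ports() {
    awk '$1 == "bridge-ports" || $1 == "bridge_ports" {
             for (i = 2; i <= NF; i++) print $i
         }' "${1:-/etc/network/interfaces}"
}

# Typical use: report configured ports that do not exist on this host.
#   for p in $(bridge_ports); do
#       [ "$p" = none ] && continue   # "none" means a bridge without a port
#       [ -e "/sys/class/net/$p" ] || echo "missing interface: $p"
#   done
```

A port reported as missing usually means the NIC was renamed (for example after a hardware or kernel change) and the config still uses the old name.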
Check Cluster Configuration (If Applicable)
If this is a cluster node, verify the cluster network settings:
Bash:
cat /etc/corosync/corosync.conf
Ensure the ring0_addr entry for this node matches your current IP address.
Phase 3: Routing and Connectivity Testing
Test Local Stack
Verify the TCP/IP stack is working locally:
Bash:
ping -c 4 127.0.0.1
ping -c 4 <your_server_ip>
Check Routing Table
Examine if the default route is configured correctly:
Bash:
ip route show
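# or, if the net-tools package is installed
route -n
You should see a default route (shown as "default" by ip route, or as destination 0.0.0.0 by route -n) pointing to your gateway.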
Test Gateway Connectivity
Try pinging your default gateway:
If this fails, you may have a subnet mismatch or ARP issue.
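The ping command itself seems to be missing from this step. A sketch that pulls the gateway out of the routing table instead of typing it by hand, assuming the usual "default via <ip> dev <iface>" line format printed by ip route; default_gw is a hypothetical helper name.

```shell
# Print the gateway address from the first "default via <ip> ..." line
# of "ip route show" output read on stdin.
default_gw() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Typical use on the server:
#   gw=$(ip route show default | default_gw)
#   ping -c 4 "$gw"
```

If the helper prints nothing, there is no default route at all, and the gateway line in /etc/network/interfaces is the first thing to check.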
Check ARP Table
Verify ARP resolution is working:
If the gateway MAC address shows (incomplete), there's a Layer 2 connectivity problem.
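The lookup command is missing here too. The "(incomplete)" marker is what arp -n prints (net-tools); the iproute2 equivalent, ip neigh show, reports the same condition as INCOMPLETE or FAILED. A sketch for the iproute2 variant; unresolved_neighbors is a hypothetical helper name.

```shell
# Print neighbor IPs whose MAC address could not be resolved.
# Reads "ip neigh show" output on stdin; the state is the last field.
unresolved_neighbors() {
    awk '$NF == "INCOMPLETE" || $NF == "FAILED" { print $1 }'
}

# Typical use on the server:
#   ip neigh show | unresolved_neighbors
```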
Phase 4: DNS and Name Resolution
Check DNS Configuration
Verify your DNS settings:
Ensure you have valid nameservers listed. For testing, you can temporarily add:
Bash:
# -a appends; without it tee would overwrite the existing file
echo "nameserver 8.8.8.8" | sudo tee -a /etc/resolv.conf
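The command that displays the current settings appears to be missing above; cat /etc/resolv.conf is the obvious candidate. A sketch that lists only the configured servers; nameservers is a hypothetical helper name.

```shell
# Print the address on each "nameserver" line of a resolv.conf-style file
# (default: /etc/resolv.conf). Commented-out lines are skipped because
# their first field starts with "#" rather than being "nameserver".
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Typical use on the server:
#   nameservers
```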
Test DNS Resolution
Try pinging an external host by IP first, then by name:
Bash:
ping -c 4 8.8.8.8
ping -c 4 google.com
Phase 5: Firewall and Security
Check iptables Rules
Examine firewall rules that might be blocking traffic:
Bash:
sudo iptables -L -v -n
sudo iptables -t nat -L -v -n
Look for chains with a DROP policy, and for rules that drop packets in an invalid connection-tracking state.
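Chain policies are easier to pick out of the rule-spec output of iptables -S than out of the -L listing. A sketch under that assumption; drop_policy_chains is a hypothetical helper name.

```shell
# Print the built-in chains whose default policy is DROP.
# Reads "iptables -S" output on stdin; policy lines look like "-P INPUT DROP".
drop_policy_chains() {
    awk '$1 == "-P" && $3 == "DROP" { print $2 }'
}

# Typical use on the server (as root):
#   iptables -S | drop_policy_chains
```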
Check Proxmox Firewall Status
If Proxmox firewall is enabled, check its rules:
Bash:
sudo iptables -L PVEFW-FORWARD -v -n
A common issue is RST packets being dropped. If you see ctstate INVALID drops, you may need to adjust firewall settings.
Temporarily Disable Firewall (For Testing)
If you suspect firewall issues, disable it temporarily:
Bash:
sudo systemctl stop pve-firewall
# or flush all rules
sudo iptables -F
sudo iptables -t nat -F
Important: Re-enable the firewall after testing (sudo systemctl start pve-firewall).
Phase 6: Network Services and Configuration
Restart Networking Service
Apply configuration changes and restart networking:
Bash:
sudo systemctl restart networking
# or use ifreload for live changes
sudo ifreload -a
Check Network Service Status
Verify the networking service is active:
Bash:
sudo systemctl status networking
If it shows failed, check the logs:
Bash:
sudo journalctl -u networking
Check NetworkManager Interference
Ensure NetworkManager isn't interfering with Proxmox networking:
Bash:
sudo systemctl status NetworkManager
sudo systemctl stop NetworkManager
sudo systemctl disable NetworkManager
Phase 7: Hardware and Driver Issues
Check Network Device Recognition
Verify your NIC is detected:
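The detection command is missing from this step; lspci | grep -i ethernet is a common choice. As an alternative sketch that also shows which driver is bound, the helper below walks sysfs. It assumes the standard /sys/class/net/<iface>/device/driver symlink layout; nic_drivers is a hypothetical name, and the optional directory argument exists only so the function can be pointed at a test tree.

```shell
# For each network interface backed by a hardware device, print
# "<interface> <driver>". Virtual interfaces (lo, bridges, taps) have no
# device/driver link and are skipped.
nic_drivers() {
    base="${1:-/sys/class/net}"
    for dev in "$base"/*; do
        [ -L "$dev/device/driver" ] || continue
        printf '%s %s\n' "$(basename "$dev")" \
            "$(basename "$(readlink "$dev/device/driver")")"
    done
}

# Typical use on the server:
#   nic_drivers
```

A physical NIC that shows up in lspci but not in this list has no driver bound, which points at the driver and firmware checks in the next steps.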
Check Driver Loading
Look for driver-related messages:
Bash:
dmesg | grep -i network
dmesg | grep -i ethernet
Search for firmware loading failures or driver errors.
Check for Missing Firmware
If dmesg shows firmware errors, install missing firmware:
Bash:
sudo apt update
# linux-firmware is the Ubuntu package name; Proxmox VE ships firmware in pve-firmware
sudo apt install pve-firmware
Phase 8: Proxmox-Specific Issues
Check Bridge Configuration
Verify your bridge is properly configured:
Ensure vmbr0 shows your physical interface as a port.
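The command for this check is missing as well. A sketch using bridge link show from iproute2, assuming its usual "<index>: <port>: <flags> ... master <bridge> ..." line format; bridge_members is a hypothetical helper name.

```shell
# Print the ports attached to a given bridge (default: vmbr0).
# Reads "bridge link show" output on stdin.
bridge_members() {
    awk -v br="${1:-vmbr0}" '{
        for (i = 3; i < NF; i++)
            if ($i == "master" && $(i + 1) == br) {
                port = $2
                sub(/:$/, "", port)    # strip the trailing colon
                sub(/@.*$/, "", port)  # strip "@ifN" peer suffixes
                print port
            }
    }'
}

# Typical use on the server:
#   bridge link show | bridge_members vmbr0
```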
Check Proxmox Services
Restart Proxmox services:
Bash:
sudo systemctl restart pve-cluster
sudo systemctl restart pveproxy
sudo systemctl restart pvedaemon
Check Hosts File
Ensure /etc/hosts is correctly configured:
The file should contain your correct IP and hostname.
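The command is missing here too; cat /etc/hosts shows the file. As far as I know, what matters for Proxmox is that the node's hostname maps to its real address rather than a loopback one. A sketch that looks the name up in the file; host_ip is a hypothetical helper name.

```shell
# Print the first address that a hosts-style file (default: /etc/hosts)
# maps to the given name. Comment lines are ignored.
host_ip() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Typical use on the server:
#   host_ip "$(hostname)"    # should print the server IP, not 127.0.1.1
```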
Phase 9: Recovery Steps
If the above steps don't resolve the issue:
Reconfigure Network Interfaces
Edit the interfaces file carefully:
Bash:
sudo nano /etc/network/interfaces
Make necessary corrections, then:
Bash:
sudo ifdown <interface_name> && sudo ifup <interface_name>
Reboot and Check Logs
If issues persist, reboot and monitor boot logs:
Bash:
sudo reboot
# After reboot, check:
sudo journalctl -b | grep -i network
Check for Recent Updates
If the problem started after an update, check for known issues:
Bash:
cat /var/log/apt/history.log
Quick Troubleshooting Checklist
- Physical Layer: Cable, switch port, link lights
- Interface Status: ip link shows UP?
- IP Address: ip addr shows correct IP?
- Routing: ip route shows default gateway?
- Gateway Ping: Can ping gateway?
- DNS: /etc/resolv.conf has valid servers?
- Firewall: iptables rules blocking traffic?
- Services: networking service running?
- Hardware: NIC recognized and driver loaded?
- Logs: dmesg or journalctl shows errors?
Follow these steps methodically to isolate whether the issue is physical, configuration-related, or service-related.