[SOLVED] Previously working Proxmox host largely lost network access--dropping pings out, unable to ping, etc

Rktd

New Member
Dec 30, 2018
I've scoured the forums, but haven't had any luck finding a solution that works. Let me first try and explain the problem.

I'm currently running Proxmox 5.2.2. Recently the host seemed to just drop off, and I could no longer connect to it over the network (I'm currently using IPMI to do so). I don't think it coincided with any upgrade, but it's now been a couple of weeks and I don't remember exactly when it stopped working. I've tried booting older kernel versions, but that hasn't made a difference.

I thought it might be hardware and ended up switching to a new server (same hardware, just swapped the hard drives) but that changed nothing. Switch ports and ethernet cables don't seem to be an issue. Everything else on the network works fine, but I'm unable to ping the host (gives a "Request timeout") and have some odd ping behavior when pinging from the host to other boxes.

The only change I made to /etc/network/interfaces was renaming the Ethernet ports from eth0 and eth1 to enp1s0f0 and enp1s0f1, after noticing that Debian seemed to have changed the naming convention.

Code:
pveversion -v
Screen Shot 2018-12-30 at 1.07.17 am.png
Screen Shot 2018-12-30 at 1.07.52 am.png

Code:
ifconfig
Screen Shot 2018-12-30 at 1.37.09 am.png
Screen Shot 2018-12-30 at 1.37.43 am.png

Code:
/etc/network/interfaces
Screen Shot 2018-12-30 at 1.14.46 am.png

Code:
route -n
Screen Shot 2018-12-30 at 1.26.27 am.png

Pinging the router looks as follows, with consistently ~90% packet loss:
Screen Shot 2018-12-30 at 12.51.59 am.png

The iptables ruleset is empty.

No other device is using that IP address, and the same behavior occurs if I change the IP address.

Any ideas or help would be greatly appreciated.
 
Did you already check the arp table?

But normally Debian wouldn't change the interface naming automatically. What did you change recently? Did you check whether the udev rules are okay?
 
I'll be honest, I'm not sure what I'm looking for in the ARP table or in the udev rules, but I've included screenshots. I do note that the MAC address in the udev rule doesn't match the actual MAC address, I assume because the hardware has changed. I'm not sure if that makes a difference.

In terms of changing things, I didn't change anything (that I am aware of). I changed the ethernet devices from eth0 to enp1s0f0 after things stopped working, but that has neither made it worse nor better.

Code:
arp -a
Screen Shot 2018-12-30 at 1.03.49 pm.png

Code:
cat /etc/udev/rules.d/70-persistent-net.rules
Screen Shot 2018-12-30 at 1.05.31 pm.png
 
In the ARP table you will see where the packets are sent to. Can you confirm that the MAC addresses are correct and match the connected interfaces?

Let udev recreate the rules. They might be the reason for the network problems.
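
For anyone unsure how to read the neighbour (ARP) table, here is a minimal sketch. The helper name `neigh_mac` is made up for illustration, and it assumes the usual `ip neigh show` line layout (`<ip> dev <iface> lladdr <mac> <state>`). It only prints the MAC address the host has recorded for one IP, so you can compare it by eye with the MAC printed on the router or switch.

```shell
# Hypothetical helper: read `ip neigh show` output on stdin and print the
# link-layer (MAC) address recorded for the IP address given as $1.
# Entries without an lladdr (e.g. FAILED or INCOMPLETE) print nothing.
neigh_mac() {
    awk -v ip="$1" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") print $(i + 1)
        }
    '
}
```

Possible usage, with 192.168.1.1 standing in for the router's address (a placeholder, not taken from this thread): `ip neigh show | neigh_mac 192.168.1.1`. The host's own MAC for comparison on the other side can be read with `cat /sys/class/net/enp1s0f0/address`.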
 
Sorry, I'm not that familiar with ARP. Is there any command in particular to get that info?

Also, I've deleted the udev rules file, but it isn't recreated on reboot, and I can't get it to regenerate manually either. I've tried multiple commands; the one I think should work, and which does show the interfaces when -v is added, is
Code:
udevadm trigger --subsystem-match=net --action=add

Manually editing /etc/udev/rules.d/70-persistent-net.rules doesn't seem to make any difference either.
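
Since the MAC address in the old rule no longer matched the swapped hardware, a quick way to spot such stale entries is sketched below. This is only an illustration: the function name `stale_udev_rules` is hypothetical, and it assumes the rules use the classic `ATTR{address}=="..."` and `NAME="..."` fields and that current MACs are readable under /sys/class/net.

```shell
# Hypothetical helper: list interface names pinned in a persistent-net rules
# file whose MAC address is not present on any NIC the kernel reports.
# $1: rules file (default /etc/udev/rules.d/70-persistent-net.rules)
# $2: sysfs net directory (default /sys/class/net)
# The body runs in a subshell so its variables do not leak into the caller.
stale_udev_rules() (
    rules="${1:-/etc/udev/rules.d/70-persistent-net.rules}"
    sysnet="${2:-/sys/class/net}"
    # MAC addresses of the NICs currently present, lower-cased.
    present="$(cat "$sysnet"/*/address 2>/dev/null | tr 'A-F' 'a-f')"
    # Extract "MAC NAME" pairs from every non-comment rule line.
    sed -n 's/^[^#]*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' "$rules" |
    while read -r mac name; do
        mac="$(printf '%s' "$mac" | tr 'A-F' 'a-f')"
        # Report the rule if no present NIC carries this MAC.
        if ! printf '%s\n' "$present" | grep -qxF "$mac"; then
            echo "$name $mac"
        fi
    done
)
```

Running `stale_udev_rules` with no arguments checks the default file; each output line is an interface name followed by a MAC that no installed NIC has.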
 
The symptom of having a few pings work and then a few time out points in the following possible directions:
* duplicate IP on the network (try sniffing with tcpdump)
* duplicate MAC on the network (unlikely since you said you changed the server hardware)
* problems with the NIC - check `dmesg` and `journalctl -r` for potential issues
* depending on the switch - a look at its logs or forwarding-db/mac-table might help

As a side note, you could try the iproute2 commands (ip link, ip addr, ip neigh, ip route) instead of the net-tools ones - they provide the information in a more readable way IMHO
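
To make the tcpdump suggestion a bit more concrete, here is a rough sketch that scans captured ARP replies for an IP address claimed by more than one MAC. The function name `find_arp_conflicts` is invented for this example, and it assumes tcpdump's usual `Reply <ip> is-at <mac>, length N` wording for ARP replies.

```shell
# Hypothetical helper: read `tcpdump -n -e arp` style lines on stdin and
# print every IP address that was claimed ("is-at") by more than one MAC.
find_arp_conflicts() {
    awk '
        {
            for (i = 2; i < NF; i++) {
                if ($i == "is-at") {
                    ip = $(i - 1)
                    mac = $(i + 1)
                    sub(/,$/, "", mac)      # strip the trailing comma
                    key = ip SUBSEP mac
                    if (!(key in seen)) {   # count each IP/MAC pair once
                        seen[key] = 1
                        count[ip]++
                        if (ip in macs)
                            macs[ip] = macs[ip] " " mac
                        else
                            macs[ip] = mac
                    }
                }
            }
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip ": " macs[ip]
        }
    '
}
```

Possible usage on the host (interface name taken from the first post, packet count arbitrary): `tcpdump -n -e -i enp1s0f0 -c 200 arp | find_arp_conflicts`. No output means no IP was seen with two different MACs during the capture, which does not rule out a conflict outside that window.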
 
Thanks. I agree they do seem a bit more friendly to the eye.

I’ve managed to locate the problem, though I’m still not entirely sure who is at fault so to speak.

The server was plugged into an HP ProCurve 1800-24G (J9028B) switch.

When I plugged the server directly into the router, networking started working again. I had an old DD-WRT router lying around, so I turned it into a switch, plugged the Proxmox box into it, and plugged that into the ProCurve, and again things worked as normal. I also tested the previous box (same hardware--Supermicro X8DTU-F) with Ubuntu Server 18.04 and saw the same behavior. I haven't had the time to try an OS without systemd, but given that Proxmox worked previously (it was installed with a pre-systemd version), I suspect the culprit lies somewhere between the drivers or netplan/systemd and the HP ProCurve. Neither the switch firmware nor its settings have changed.

Anyway, I'm glad to have figured it out and will update with any more specific findings.
 
Glad you resolved your issue! Please mark the thread as SOLVED, so that others know what to expect. Thanks!

Hm - maybe the switch needs a reboot? (I've had that experience every now and then with HP ProCurves...)
You could also check the switch logs - link flapping etc. usually shows up there.
 
Thanks. I had already tried rebooting the ProCurve switch, without result. I'll look into the logs.
 
