Please help with strange network problem

gooselk

Sep 26, 2024
Hello community.

I'm haunted by a strange networking problem that I have at least been able to track down to my Proxmox machine. I'm running PVE 8.2.7 on a Lenovo M710s with several LXCs, including Nextcloud, Jellyfin, some of the arr apps and SABnzbd. All but Nextcloud were set up via the Helper-Scripts. At first everything worked great. After some days my internet connection suddenly dropped out. I couldn't even ping my router; the whole network became unresponsive. After a few days of fiddling around (I'm not exactly a Linux expert) I found that when I rebooted my PVE node, the problems suddenly disappeared.

The network then ran fine for a little while (an hour or so) and then dropped out again. Searching the forum, I found out that the machine had a Realtek NIC. I tried the Realtek fix, to no avail. Then I decided to buy an Intel-based NIC. Also to no avail...

Then I noticed that my TrueNAS machine fired some alerts whenever the error occurred, something like this:
Code:
28804 SSH login failures in the last 24 hours:
... first 28800 messages skipped ...
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37432 ssh2
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37444 ssh2
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37450 ssh2
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 39844 ssh2
2024-09-27 21:20:18 (Europe/Berlin)
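Tallying an alert like this by source IP makes it easier to see which hosts are behind the failures. Below is a minimal Python sketch, assuming the alert text keeps the `Failed password for <user> from <ip> port <port>` wording shown above; the function name `count_failed_logins` is hypothetical. It only counts matches and says nothing about why a host is connecting.

```python
import re
from collections import Counter

# Matches sshd "Failed password" entries like those quoted in the TrueNAS alert.
# Assumption: the alert keeps OpenSSH's wording; "invalid user" is optional.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from "
    r"(\d{1,3}(?:\.\d{1,3}){3}) port (\d+)"
)


def count_failed_logins(text: str) -> Counter:
    """Count failed password attempts per (source IP, target user).

    Uses finditer over the whole text, so it works whether the alert
    is split into lines or flattened into a single line.
    """
    counts: Counter = Counter()
    for match in FAILED_RE.finditer(text):
        user, ip, _port = match.groups()
        counts[(ip, user)] += 1
    return counts


if __name__ == "__main__":
    # Excerpt from the alert in this post; paste the full alert text instead.
    alert = (
        "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37432 ssh2 "
        "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37444 ssh2 "
        "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37450 ssh2 "
        "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 39844 ssh2"
    )
    for (ip, user), n in count_failed_logins(alert).most_common():
        print(f"{ip}\t{user}\t{n}")
```

Running it over a full 24-hour alert would show whether the failures come from one container address or are spread across all of them.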

The ports and IP addresses changed all the time, but every listed address belonged to one of my LXCs. So I ran journalctl -f in my containers, and sure enough, as soon as the problem occurred, some of them showed something like this:
Code:
Sep 27 21:00:59 sabnzbd sshd[556]: error: kex_exchange_identification: client sent invalid protocol identifier "GET / HTTP/1.1"
Sep 27 21:00:59 sabnzbd sshd[556]: error: send_error: write: Broken pipe
Sep 27 21:00:59 sabnzbd sshd[556]: banner exchange: Connection from 192.168.2.210 port 42464: invalid format
Sep 27 21:00:59 sabnzbd sshd[557]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.2.210  user=root
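The `invalid protocol identifier` line carries no address, so the sshd PID in brackets is what ties it to the `Connection from ...` line. The following Python sketch (hypothetical helper `summarize`, assuming journal lines in the format quoted above) groups lines by sshd PID and reports, per source IP, whether that client sent non-SSH traffic such as an HTTP request or failed PAM authentication.

```python
import re
from collections import defaultdict

# Assumption: lines look like the sshd journal output quoted in this post.
PID_RE = re.compile(r"sshd\[(\d+)\]:")
IP_RE = re.compile(r"(?:Connection from |rhost=)(\d{1,3}(?:\.\d{1,3}){3})")
PROTO_RE = re.compile(r'invalid protocol identifier "([^"]*)"')


def summarize(lines):
    """Group sshd journal lines by PID and return {source IP: set of findings}."""
    ip_by_pid = {}
    findings_by_pid = defaultdict(set)
    for line in lines:
        pid_m = PID_RE.search(line)
        if not pid_m:
            continue  # not an sshd line
        pid = pid_m.group(1)
        ip_m = IP_RE.search(line)
        if ip_m:
            ip_by_pid[pid] = ip_m.group(1)
        proto_m = PROTO_RE.search(line)
        if proto_m:
            # The client spoke something other than SSH (e.g. HTTP) to sshd.
            findings_by_pid[pid].add(f"non-SSH client: {proto_m.group(1)}")
        if "authentication failure" in line:
            findings_by_pid[pid].add("auth failure")
    result = defaultdict(set)
    for pid, findings in findings_by_pid.items():
        result[ip_by_pid.get(pid, "unknown")] |= findings
    return dict(result)


if __name__ == "__main__":
    sample = [
        'Sep 27 21:00:59 sabnzbd sshd[556]: error: kex_exchange_identification: '
        'client sent invalid protocol identifier "GET / HTTP/1.1"',
        "Sep 27 21:00:59 sabnzbd sshd[556]: banner exchange: Connection from "
        "192.168.2.210 port 42464: invalid format",
        "Sep 27 21:00:59 sabnzbd sshd[557]: pam_unix(sshd:auth): authentication "
        "failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.2.210  user=root",
    ]
    for ip, findings in sorted(summarize(sample).items()):
        print(ip, sorted(findings))
```

On the lines from this post, both findings land on 192.168.2.210: one connection sent an HTTP request to sshd, and a separate one failed a root password login.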

As far as I can tell, the affected addresses all have a connection to an NFS share, with the TrueNAS as the server and the containers as clients. This makes me suspect that the problem might be related to the NFS mounts or the NFS service on the NAS, yet the problem does not go away when I restart the NAS. The only way to get some relief is to reboot the PVE node altogether; shutting down the containers does not help. Which brings me back to Proxmox. I have no idea how to start troubleshooting this...

Please help! Thanks :)
 
