Hello community.
I'm haunted by a strange networking problem that I have at least been able to trace to my Proxmox machine. I'm running PVE 8.2.7 on a Lenovo M710s with several LXCs, including Nextcloud, Jellyfin, some of the arr apps and SABnzbd. All but Nextcloud were set up via the Helper-Scripts. At first everything worked great. After a few days, my internet connection suddenly dropped out. I couldn't even ping my router; the whole network became unresponsive. After a few days of fiddling around (I'm not exactly a Linux expert), I found that when I rebooted my PVE node, the problems suddenly disappeared.
The network then ran fine for a little while (an hour or so) and then dropped out again. Searching the forum, I found out that the machine has a Realtek NIC. I tried the Realtek fix, to no avail. Then I decided to buy an Intel-based NIC. Also to no avail...
Then I noticed that my TrueNAS machine fired some alerts whenever the error occurred, something like this:
Code:
28804 SSH login failures in the last 24 hours:
... first 28800 messages skipped ...
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37432 ssh2
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37444 ssh2
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37450 ssh2
27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 39844 ssh2
2024-09-27 21:20:18 (Europe/Berlin)
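With tens of thousands of failures per day, it can help to tally which source IPs appear in such an alert before comparing them with the container addresses. The following is a minimal sketch, assuming the alert reuses the standard OpenSSH "Failed password for USER from IP port N" wording; the names FAILED_LOGIN, tally_failed_logins and SAMPLE_ALERT are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Matches sshd "Failed password" entries like those quoted in the alert, e.g.
# "Failed password for root from 192.168.2.213 port 37432 ssh2".
# Assumption: the alert reuses the standard OpenSSH wording.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) port (?P<port>\d+)"
)


def tally_failed_logins(alert_text: str) -> Counter:
    """Count failed SSH logins per source IP in a pasted alert or log excerpt.

    finditer() is used instead of line-by-line matching, so this also works
    when the alert arrives as one long run-together line.
    """
    return Counter(m.group("ip") for m in FAILED_LOGIN.finditer(alert_text))


# Sample input: the four entries shown in the alert above.
SAMPLE_ALERT = (
    "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37432 ssh2 "
    "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37444 ssh2 "
    "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 37450 ssh2 "
    "27 Sep 21:20:16: Failed password for root from 192.168.2.213 port 39844 ssh2"
)

if __name__ == "__main__":
    # Print each source IP with its number of failed logins, most frequent first.
    for ip, count in tally_failed_logins(SAMPLE_ALERT).most_common():
        print(f"{ip}\t{count}")
```

Pasting the full alert text (or the NAS's auth log) into the function instead of the sample would show at a glance whether the failures come from one container or from several.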
The ports and IP addresses change every time, but all of the listed addresses belong to my LXCs. So I ran journalctl -f in my containers and, sure enough, as soon as the problem occurred, some of them showed something like this:
Code:
Sep 27 21:00:59 sabnzbd sshd[556]: error: kex_exchange_identification: client sent invalid protocol identifier "GET / HTTP/1.1"
Sep 27 21:00:59 sabnzbd sshd[556]: error: send_error: write: Broken pipe
Sep 27 21:00:59 sabnzbd sshd[556]: banner exchange: Connection from 192.168.2.210 port 42464: invalid format
Sep 27 21:00:59 sabnzbd sshd[557]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.2.210 user=root
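These sshd messages record two useful details: which host connected ("Connection from ..." and "rhost=...") and what it sent instead of an SSH banner (here an HTTP request, "GET / HTTP/1.1"). A minimal sketch for summarizing both from a saved journal excerpt follows; the regular expressions are assumptions based only on the lines quoted above (sshd wording can differ between OpenSSH versions), and summarize_sshd, SAMPLE_JOURNAL and the sshd.log file name are hypothetical. An excerpt could be saved with something like journalctl -t sshd --since today > sshd.log.

```python
import re
import sys
from collections import Counter
from pathlib import Path

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
# Patterns for the sshd messages quoted above; wording may differ between
# OpenSSH versions, so treat these as assumptions to adjust.
CONN_FROM = re.compile(rf"Connection from (?P<ip>{IPV4}) port \d+")
RHOST = re.compile(rf"rhost=(?P<ip>{IPV4})")
BAD_IDENT = re.compile(r'invalid protocol identifier "(?P<ident>[^"]*)"')


def summarize_sshd(journal_text: str) -> tuple[Counter, Counter]:
    """Return (mentions per source IP, count per non-SSH identifier sent to sshd)."""
    sources: Counter = Counter()
    bad_idents: Counter = Counter()
    for line in journal_text.splitlines():
        # A line may name the peer as "Connection from" or "rhost="; count each match.
        for pattern in (CONN_FROM, RHOST):
            match = pattern.search(line)
            if match:
                sources[match.group("ip")] += 1
        match = BAD_IDENT.search(line)
        if match:
            bad_idents[match.group("ident")] += 1
    return sources, bad_idents


SAMPLE_JOURNAL = """\
Sep 27 21:00:59 sabnzbd sshd[556]: error: kex_exchange_identification: client sent invalid protocol identifier "GET / HTTP/1.1"
Sep 27 21:00:59 sabnzbd sshd[556]: error: send_error: write: Broken pipe
Sep 27 21:00:59 sabnzbd sshd[556]: banner exchange: Connection from 192.168.2.210 port 42464: invalid format
Sep 27 21:00:59 sabnzbd sshd[557]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.2.210 user=root
"""

if __name__ == "__main__":
    # Optional argument: path to a saved journal excerpt (hypothetical, e.g. sshd.log).
    text = Path(sys.argv[1]).read_text() if len(sys.argv) > 1 else SAMPLE_JOURNAL
    sources, bad_idents = summarize_sshd(text)
    print("Source IPs:", dict(sources.most_common()))
    print("Non-SSH identifiers:", dict(bad_idents.most_common()))
```

Running it once per container would show whether the same few addresses keep showing up as the origin of these connections.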
As far as I can tell, all of the affected addresses are connected to an NFS share, with the TrueNAS as the server and the containers as the clients. This makes me suspect that the problem might be related to the NFS mounts or the NFS service on the NAS, yet the problem does not go away when I restart the NAS. The only way to get some relief is to reboot the PVE node altogether; shutting down the containers does not help. Which brings me back to Proxmox. I have no idea how to start troubleshooting this...
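One way to check the observation that every affected address is an NFS client is to list the NFS mounts each container actually has. The sketch below assumes only the standard Linux /proc/mounts layout (device, mountpoint, fstype, options, dump, pass); nfs_mounts and NFS_TYPES are hypothetical names. If a share is mounted on the PVE host and passed into a container as a bind mount, running it on the host as well is probably worthwhile.

```python
from pathlib import Path

# Filesystem types that indicate an NFS client mount.
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mounts(mounts_text: str) -> list[tuple[str, str, str]]:
    """Return (source, mountpoint, fstype) for each NFS entry in /proc/mounts-style text.

    /proc/mounts fields are: device, mountpoint, fstype, options, dump, pass.
    The kernel octal-escapes spaces inside paths, so str.split() is safe here.
    """
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


if __name__ == "__main__":
    # Run inside a container (or on the PVE node) to print its NFS mounts, if any.
    mounts_file = Path("/proc/mounts")
    if mounts_file.exists():
        for source, target, fstype in nfs_mounts(mounts_file.read_text()):
            print(f"{fstype}\t{source} -> {target}")
```

Comparing this list against the addresses from the alerts would confirm or rule out the NFS connection rather than relying on memory of which containers mount the share.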
Please help! Thanks