SOLVED: Very strange network problem. Please help.

starob

New Member
Jun 24, 2024
Hi,
I have a very strange network issue which I cannot solve. I'm posting in this forum in the hope that somebody has an idea what the cause could be. I'm neither a Proxmox expert nor a networking expert, but I have some networking knowledge and Proxmox experience.

My problem is that I have a shell script running that pings a specified IP address and logs errors if something is wrong. I can run multiple instances of this script in the background to check multiple IP addresses. The script has 2 parameters: IP address and ping wait time. I run it with "nohup ./watch_ip.sh IP_ADDRESS WAIT_TIME &". So for example I run "nohup ./watch_ip.sh 192.168.1.1 1 &". So far so good.

Now the strange issue: When I run this script on my Proxmox 8.2.4 host with specific IP addresses (192.168.1.250, 192.168.1.204, 192.168.1.205), the log file fills up with errors. When I run the same script with the same IP addresses from an LXC container on the same host, no errors are logged! What???
Furthermore, when I use other IP addresses on the PM host (e.g. 192.168.1.1), no error is logged. Strange, isn't it?

The problematic IP addresses belong to a Fritzbox 7490 with 2 IP cameras connected. But again, running the script from a container is no problem; the errors only occur on the PM host. Running the same script on another host on the same network is also OK.

Notes:
- I disabled PCIe power management with kernel parameter "pcie_port_pm=off". This didn't help to solve the issue but I still left it active.
- I changed the PM host LAN cable which didn't help.
- dmesg logs several errors that might be related but I don't understand them.

System info below and attached:

PM Host HW: Beelink Mini PC, 12th Gen Intel Alder Lake-N100 Processor (up to 3.40GHz), EQ12 Office Mini Computer, 8GB DDR5 500GB SSD Mini Desktop PC, Dual 2.5G Ethernet/Dual HDMI/WiFi 6/WOL/Auto Power On

ethtool output:

root@pve:~# ethtool enp2s0
Settings for enp2s0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        MDI-X: off (auto)
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes


This is the script:

Bash:
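#!/bin/bash
# Check whether a host is reachable by ping and log state changes.
# Usage: ./watch_ip.sh IP_ADDRESS WAIT_TIME

# Abort early if an argument is missing
if [ $# -ne 2 ]; then
  echo "Usage: $0 IP_ADDRESS WAIT_TIME" >&2
  exit 1
fi

#CONFIGURATION

#IP of host
WATCH_IP=$1
# Time to wait for response in sec
WAIT_TIME=$2
#path to logfile
LOGFILE="/var/log/watchip-$WATCH_IP.log"
#duration between pings in sec
PAUSE=1
#how many consecutive failed pings before logging
TESTS=1

#SCRIPT

#initialize
MISSED=0
touch "$LOGFILE"

while true; do
  if ! ping -c 1 -W "$WAIT_TIME" "$WATCH_IP" > /dev/null; then
    MISSED=$((MISSED + 1))
  else
    # Log the recovery only if the outage was logged before
    if [ "$MISSED" -ge "$TESTS" ]; then
      echo "$(date) - $WATCH_IP is up again." >> "$LOGFILE"
    fi
    MISSED=0
  fi
  # Log the outage once, when the threshold is reached
  if [ "$MISSED" -eq "$TESTS" ]; then
    echo "$(date) - $WATCH_IP is down." >> "$LOGFILE"
  fi
  sleep "$PAUSE"
done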
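
To compare how often each address drops out on the PM host versus the container, the "is down." entries in each log can be counted. This is only a minimal sketch: `count_outages` is a hypothetical helper name, and it assumes the log line format written by the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# count the "is down." entries in one watch_ip.sh log file.
count_outages() {
  local logfile=$1
  # grep -c prints the number of matching lines. It exits non-zero when
  # nothing matches, so "|| true" keeps the function's exit status clean.
  grep -c 'is down\.$' "$logfile" || true
}

# Example call, assuming the log path used by the script above:
# count_outages /var/log/watchip-192.168.1.250.log
```

Running it against the logs from the host and from the container gives a quick side-by-side number instead of eyeballing the files.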

PM host dmesg output attached.
 


The symptoms did not point to this, but in the end it turned out that a flaky network cable caused these issues.
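
For anyone chasing a similar problem: one check that may hint at a physical-layer fault is looking at the kernel's per-interface error counters. The sketch below is not how the cable was found here. It assumes the Linux sysfs counters under `/sys/class/net/<iface>/statistics`, and `show_nic_errors` is a hypothetical helper name. A bad cable will not necessarily show up in these counters, so treat an empty result as inconclusive.

```shell
#!/bin/bash
# Hypothetical helper: print every non-zero error/drop counter found in
# a statistics directory such as /sys/class/net/enp2s0/statistics.
show_nic_errors() {
  local stats_dir=$1 f value
  for f in "$stats_dir"/*errors* "$stats_dir"/*dropped*; do
    [ -r "$f" ] || continue        # skip patterns that matched nothing
    value=$(cat "$f")
    if [ "$value" -ne 0 ]; then
      echo "$(basename "$f"): $value"
    fi
  done
}

# Example call (interface name taken from the ethtool output above):
# show_nic_errors /sys/class/net/enp2s0/statistics
```

Counters that keep growing between two runs, for example `rx_crc_errors`, would be a reason to swap cables on every hop between the host and the affected devices, not only the one plugged into the host.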