"Network not reachable"

otter90 · Jun 16, 2021

Dear community,
I have a strange network issue with containers and no solution so far. In my Proxmox installation I have various VMs and containers for different purposes (iobroker, zoneminder, fhem, weewx, CupsPrint, Raspberrymatic...). For two of my containers (used for iobroker and zoneminder, both Debian), the network connection breaks down exactly 10 days after a reboot, and the containers are no longer reachable over the network. They are up and running, and I can access them via the Proxmox GUI/console. Every network-related command (e.g. ping xyz) run in the console fails with the error message "Network not reachable". When I stop and restart the containers, everything works fine again, for exactly 10 days...
Of course I can post some more specifications of the containers if required for analysis.
Would be glad if anyone had an idea.
Thanks + Regards
otter90

[EDIT] I just made an additional observation. If I reboot before the network breaks down, e.g. 5 days after the last failure, the next breakdown still occurs 10 days after the first reboot (and not 10 days after the second). This means that only a reboot immediately after a breakdown starts a new 10-day uptime period. Example:
[01.01.] network breakdown, manual reboot
[05.01.] manual reboot
[11.01.] network breakdown, manual reboot
[21.01.] network breakdown, manual reboot
and so on ...
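
The schedule in the example can be written down as simple date arithmetic. This is only a minimal sketch, assuming a fixed 10-day period that an intermediate reboot does not reset; the function name `next_breakdowns` and the year are made up for illustration, since the example gives only day and month.

```python
from datetime import date, timedelta
from typing import List

# Interval observed between breakdowns (from the description above).
PERIOD = timedelta(days=10)


def next_breakdowns(first: date, count: int) -> List[date]:
    """Return `count` breakdown dates starting at `first`.

    Hypothetical helper: assumes the period is fixed and that a manual
    reboot between two breakdowns does not restart the countdown.
    """
    return [first + PERIOD * i for i in range(count)]


# The year 2021 is an arbitrary choice for illustration only.
for day in next_breakdowns(date(2021, 1, 1), 3):
    print(day.strftime("[%d.%m.]"), "network breakdown")
```

The manual reboot on 05.01. does not appear in the calculation at all, which is exactly the observed behaviour.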

Dominic · Jun 30, 2021

Is there anything relevant in the syslog from the day when the network breaks down?

otter90 · Jul 6, 2021

Thanks for asking. Hope this is the right part of the logs. At least it says something about network, at the point in time of the last network crash:
Jul 6 08:27:31 IOBrokerLXC ntpd[201]: Deleting interface #3 eth0, 192.168.200.93#123, interface stats: received=9284, sent=9284, dropped=0, active_time=863703 secs
Jul 6 08:27:31 IOBrokerLXC ntpd[201]: 192.168.200.1 local addr 192.168.200.93 -> <null>
Jul 6 08:27:34 IOBrokerLXC bash[193]: Error: connect ENETUNREACH 192.168.200.21:8999 - Local (0.0.0.0:0)
Jul 6 08:27:34 IOBrokerLXC bash[193]: at internalConnect (net.js:923:16)
Jul 6 08:27:34 IOBrokerLXC bash[193]: at defaultTriggerAsyncIdScope (internal/async_hooks.js:313:12)
Jul 6 08:27:34 IOBrokerLXC bash[193]: at net.js:1011:9
Jul 6 08:27:34 IOBrokerLXC bash[193]: at processTicksAndRejections (internal/process/task_queues.js:79:11) {
Jul 6 08:27:34 IOBrokerLXC bash[193]: errno: 'ENETUNREACH',
Jul 6 08:27:34 IOBrokerLXC bash[193]: code: 'ENETUNREACH',
Jul 6 08:27:34 IOBrokerLXC bash[193]: syscall: 'connect',
Jul 6 08:27:34 IOBrokerLXC bash[193]: address: '192.168.200.21',
Jul 6 08:27:34 IOBrokerLXC bash[193]: port: 8999
Jul 6 08:27:34 IOBrokerLXC bash[193]: }
Jul 6 08:27:34 IOBrokerLXC bash[193]: Messages Error: Error: connect ENETUNREACH 192.168.200.21:8999 - Local (0.0.0.0:0)
Jul 6 08:27:34 IOBrokerLXC bash[193]: at internalConnect (net.js:923:16)
Jul 6 08:27:34 IOBrokerLXC bash[193]: at defaultTriggerAsyncIdScope (internal/async_hooks.js:313:12)
Jul 6 08:27:34 IOBrokerLXC bash[193]: at net.js:1011:9
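
As a quick cross-check of the 10-day interval, the `active_time` value from the ntpd line above can be converted into days. A minimal Python sketch; the helper name `active_time` is hypothetical, and the regular expression only assumes the `active_time=<n> secs` format visible in this one log line.

```python
import re
from datetime import timedelta

# ntpd line copied from the syslog excerpt above.
LINE = (
    "Jul 6 08:27:31 IOBrokerLXC ntpd[201]: Deleting interface #3 eth0, "
    "192.168.200.93#123, interface stats: received=9284, sent=9284, "
    "dropped=0, active_time=863703 secs"
)


def active_time(line: str) -> timedelta:
    """Extract the active_time field of an ntpd log line as a timedelta."""
    match = re.search(r"active_time=(\d+) secs", line)
    if match is None:
        raise ValueError("no active_time field found")
    return timedelta(seconds=int(match.group(1)))


uptime = active_time(LINE)
print(uptime)                       # 9 days, 23:55:03
print(timedelta(days=10) - uptime)  # 0:04:57
```

So the interface was active for just under 10 days when ntpd dropped it, which matches what I see from the outside.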

t.lamprecht · Jul 6, 2021

otter90 said:
[EDIT] I just made an additional observation. If I reboot before the network breaks down, e.g. 5 days after the last failure, the next breakdown still occurs 10 days after the first reboot (and not 10 days after the second). This means that only a reboot immediately after a breakdown starts a new 10-day uptime period. Example:

I.e., the network goes down roughly every ten days, independent of any reboot?

otter90 said:
Thanks for asking. Hope this is the right part of the logs. At least it says something about network, at the point in time of the last network crash:

Anything in the host's syslog around that time? You can use the syslog panel in the Proxmox VE web interface (Node -> Syslog) or the journalctl CLI tool.

Any firewall configured or the like?

otter90 · Jul 6, 2021

It's not "roughly" every 10 days, it's exactly 10 days, as if someone had scheduled a countdown.
I have set up a FHEM routine on another client that pings this machine continuously, so I know precisely when it crashes.
No firewall is configured (to my knowledge). This is the host's log from the same time:
Jul 06 08:27:26 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:26 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:26 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:26 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:27 pve pvedaemon[1931]: got timeout
Jul 06 08:27:27 pve pvedaemon[14022]: got timeout
Jul 06 08:27:27 pve pvestatd[1019]: got timeout
Jul 06 08:27:28 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:28 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:28 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:28 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:28 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:28 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:28 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:29 pve pvestatd[1019]: status update time (5.067 seconds)
Jul 06 08:27:31 pve kernel: rpc_check_timeout: 22 callbacks suppressed
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: call_decode: 37 callbacks suppressed
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 not responding, still trying
Jul 06 08:27:31 pve kernel: nfs: server 192.168.200.91 OK
Jul 06 08:27:36 pve kernel: rpc_check_timeout: 43 callbacks suppressed

Vengance · Sep 7, 2021

Same issue here: the VMs frequently lose their IPv6 connection, and the host itself lost its IPv4 (DHCP) connection a few hours ago.

Code:

ntpd[1064]: Deleting interface #3 eth0, 198.xxx.147.87#123, interface stats: received=972, sent=942, dropped=0, active_time=86396 secs

arbocomve · Nov 4, 2021

Hi @otter90 / @Vengance

Have you found a solution to this problem yet? I have been restarting my server every day for a year and a half, and only now decided to look into the cause, which is how I came across this post.

Vengance · Nov 4, 2021

No, but my guess is that it was somehow related to OVH's network.
