No network after a few minutes on a new installation

PNL

Mar 27, 2023
Hi,
I am trying to build a Proxmox cluster with 3 nodes, using the latest version, 7.4.
The 3 machines are similar HP EliteDesk 800 G2 Minis, each with an onboard network interface, 32 GB of RAM, a 240 GB SSD and a 1 TB M.2 disk.
The only hardware difference is the optional card: a DisplayPort card in node 1 and a serial port card in nodes 2 and 3.
The firmware is the latest version and is configured the same way on all three.
On node 1 everything is OK.
On node 2 and node 3 the network works for a few minutes after boot, then ping and SSH stop working.
I removed the serial port card from node 2 and the problem remains.
I searched for similar reports without success.
dmesg and /var/log/messages show nothing new when the network becomes unreachable.
I tested several network cables on 2 different switches, with the same result.
I tested with only one node (2 or 3) powered on and node 1 off, with the same result.
I don't know where to look. Something makes the network drop after 3 to 4 minutes.

Thanks for your help.

Pat
 


Is it possible that the IPs you used for nodes 2 and 3 are already in use by something else? Add a secondary IP on an unused private subnet to each host. Use arp-scan to find the duplicates. The logs you posted are not very helpful because they are not well labeled.
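A rough sketch of the duplicate check, in case it helps. It assumes arp-scan is installed and that each host line of its output starts with the IP address followed by the MAC address; find_dup_ips is a made-up helper name, not part of arp-scan.

```shell
#!/bin/sh
# find_dup_ips (hypothetical helper): read "IP MAC ..." lines on stdin and
# print every IPv4 address that answered from more than one distinct MAC.
find_dup_ips() {
    awk '
        # keep only lines whose first field looks like an IPv4 address
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            key = $1 " " tolower($2)
            if (!(key in seen)) { seen[key] = 1; count[$1]++ }
        }
        END { for (ip in count) if (count[ip] > 1) print ip }
    ' | sort
}

# Typical use on a node (needs root; vmbr0 is the bridge shown in this thread):
#   arp-scan --interface=vmbr0 --localnet | find_dup_ips
```

An empty result means no address on the scanned subnet was claimed by two different MAC addresses at the time of the scan.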


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks for the answer. I'm sure the IPs for nodes 2 and 3 are not used by other hosts. Can you explain what you mean by logs that are not well labeled? I'm new to Proxmox. What information would be relevant to help find the problem?
 
The information that would help is the "ip a" output from each host when it is in a good state, as well as after it transitions to the "bad state".
Also the output of "arp -an" before and after.
In the bad state, can node2 ping/communicate with node3 and vice versa? What about the router? You may need to get on the console to get the results.

Remember, PVE is a software suite that is based on Debian Linux. The networking is basic Linux networking.
If attaching logs or text output, make sure to prefix it with the hostname. You can also use CODE tags in the forum. Also, you can save the output to a file and transfer it from the "unreachable" node to your workstation via USB or similar.
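One possible way to do the labeling, as a sketch only. The helper name snapshot, the SNAP_HOST override and the file /root/net-state.txt are invented for illustration.

```shell
#!/bin/sh
# snapshot (hypothetical helper): run a command and prefix every output line
# with the hostname and a state tag, so pasted logs stay attributable.
# Arguments: <state-tag> <command...>
snapshot() {
    tag=$1; shift
    # SNAP_HOST is an optional override; by default the real hostname is used
    host=${SNAP_HOST:-$(hostname)}
    "$@" 2>&1 | sed "s|^|$host [$tag] |"
}

# Example on the console of an affected node:
#   snapshot good ip a     >> /root/net-state.txt   # right after boot
#   snapshot bad  ip a     >> /root/net-state.txt   # once ping has stopped
#   snapshot bad  ip neigh >> /root/net-state.txt   # neighbour (ARP) table
```

The resulting file can be copied to a USB stick and posted as a single attachment.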


 
officially refurbished? got guarantee on it?

i have 2x "HP ED 800 G2 i5-6500t". network/NIC is fine on both.
one of them got a mechanically broken VGA port. option for send back with refund/replacement.

if the NIC is broken, just replace it with a working unit. refurbishment comes with hidden costs sometimes.

upgraded to latest v61 bios??
 
Yes, it's refurbished, with a guarantee.

The NIC is not broken: it works right after boot, but after a few minutes I lose my SSH session and cannot ping to or from the host.

When I received the PC, it was installed with Windows 10 and the network worked without any problem.
It's not the switch or the network cables, and the switch shows no alerts on the ports.

The BIOS: HP EliteDesk 800 G2 DM 35W/8055, BIOS N21 Ver. 02.60 12/15/2022. It's N21 but version 02.60; I did not find a more recent version.
 
Hi,

Here are some files to help find the problem. I could not find the apr command.

In the bad state node2 can't ping node3 and vice versa. There is no router; all nodes are on the same switch, and the switch reports no errors.

 


There is only so much that can be done remotely via a forum when there is no obvious error or misconfiguration. If you want to solve this, you need to reduce the number of variables and invest some time in researching network troubleshooting guides.

Tools like arp and arp-scan can be installed with:
apt install net-tools arp-scan

A tcpdump may also be useful (apt install tcpdump): when the connection stops working, start tcpdump on the interface to monitor for any traffic.

Remove your switch from the equation: connect any two nodes with a direct cable and monitor the connectivity with ping.
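To pin down the exact moment the link stops passing traffic, something like the sketch below could run on the console during the direct-cable test. watch_link is an invented helper, and the peer address is simply the prox3 address that appears later in this thread.

```shell
#!/bin/sh
# watch_link (hypothetical helper): run a probe command repeatedly and print a
# timestamped line only when the result flips between "up" and "down".
# Arguments: <count> <interval-seconds> <probe command...>
watch_link() {
    count=$1; interval=$2; shift 2
    prev=""
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$@" >/dev/null 2>&1; then state=up; else state=down; fi
        if [ "$state" != "$prev" ]; then
            printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$state"
            prev=$state
        fi
        i=$((i + 1))
        # no need to wait after the final probe
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
    return 0
}

# Example: 600 probes, one per second, against the directly cabled peer
#   watch_link 600 1 ping -c 1 -W 1 192.168.50.13 | tee /root/link.log
```

The timestamp of the first "down" line tells you when to start looking in tcpdump and in the logs.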


 
Hi,
I saved the BIOS configuration of node1 (which works) and loaded it on node2 to be sure both have the same configuration.
I tried kernel 6.1 and have the same problem.
I tried with only node2 connected to the switch and have the same problem.
I tried with only node2 connected to the router and have the same problem.
I tried a Debian 11.6 live CD and have no problem with the network card. After one night, ping to the gateway still works.
I tried to modify the interfaces file to disable the bridge vmbr0, but it doesn't work: I have no network.

Code:
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual
address 192.168.50.12/24
gateway 192.168.50.1

# auto vmbr0
# iface vmbr0 inet static
# address 192.168.50.12/24
# gateway 192.168.50.1
# bridge-ports eno1
# bridge-stp off
# bridge-fd 0
How should I configure the eno1 interface to get network access, just to test that the network card is not the problem?
Which component could disable network access after a few minutes?
Thank you for your help, I am a bit lost.
 
auto eno1
iface eno1 inet manual
Code:
inet static – Defines a static IP address.

inet manual – Does not define an IP address for an interface. Generally used by interfaces that are bridge or aggregation members, interfaces that need to operate in promiscuous mode (e.g. port mirroring or network TAPs), or have a VLAN device configured on them. It's a way to keep the interface up without an IP address.


https://unix.stackexchange.com/ques...-explanation-of-etc-network-interfaces-syntax

It's just a guess, since you did not provide any state information after boot.
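For a quick test without the bridge, the stanza would presumably look like the sketch below. It reuses the address and gateway from the post above and only swaps manual for static; the indentation is just for readability.

```
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet static
        address 192.168.50.12/24
        gateway 192.168.50.1
```

A reboot applies it. If ifupdown2 is installed on the node (an assumption about this setup), ifreload -a should apply it without rebooting.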


 
I have corrected my mistake in the interfaces file.
After a reboot I have network access, but still only for a few minutes.
Can IPv6 cause this kind of problem?
Is there a Proxmox component, not present in plain Debian, that could cause this type of problem?
 
Are you sure this is based on an Ubuntu version?
The wiki https://pve.proxmox.com/wiki/Installation specifies Debian. I don't think this is a kernel problem: of 3 identical nodes, one works without problems and the other 2 have the same problem. The network card is an I219-LM.

There must be a difference between these 3 PCs that explains the problem.
They don't have the same RAM configuration: node1 has a single 16 GB module, node2 has two 8 GB modules. Node1 has an additional DisplayPort card, node2 a serial card.

The question is: why does node2 work without problems with a Windows installation or a Debian 11.6 live CD, but not with Proxmox?
In vino veritas, I'm going to have a drink, maybe it'll give me an idea :)
 
Yes, the Proxmox kernel (and therefore the driver) is based on Ubuntu's kernel, see for example this post from a Proxmox staff member.
Yes, Proxmox VE is based on the GNU/Linux distribution Debian but the Linux kernel of Proxmox is based on Ubuntu's kernel.
You probably already checked the various BIOS settings of the nodes, but maybe the Above 4G Decoding setting is different? Testing the latest Ubuntu Live DVD might give an additional data point (since the Proxmox kernel is based on Ubuntu).
 
Hi,
I tried an Ubuntu 22.10 live CD and got the same error: after 2-3 minutes the network no longer works.
I installed Ubuntu 22.10 on node2 to be able to debug and retrieve configuration files and logs. I will ask the Ubuntu community whether this problem is known.
Why does Proxmox use an Ubuntu kernel on Debian?
Is it for the integration of non-free drivers?
 
Hi,

I found a workaround for Ubuntu 22.10

modprobe -rv e1000e
modprobe -v e1000e
systemctl restart networking

After this the network is OK and I can ping the gateway without packet loss.

I tested the same thing on Proxmox, but traffic still does not pass: no ping to the gateway gets through.
Only a reboot brings the network back, for a few minutes.

Does anyone have an idea?

Thanks

Pat
 
Check which kernel module your network interface uses:
lsmod

Your host probably does not use the e1000e module.
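lsmod only lists what is loaded, not which module a given interface is bound to. A small sketch that reads the binding from sysfs is below; nic_driver is an invented helper name, and eno1 is the interface name used in this thread.

```shell
#!/bin/sh
# nic_driver (hypothetical helper): print the kernel driver bound to a network
# interface by resolving the sysfs "driver" symlink of its underlying device.
# Arguments: <interface> [sysfs net directory, default /sys/class/net]
nic_driver() {
    link="${2:-/sys/class/net}/$1/device/driver"
    [ -L "$link" ] || { echo "no driver link for $1" >&2; return 1; }
    basename "$(readlink "$link")"
}

# On the node:
#   nic_driver eno1
```

Virtual interfaces such as a bridge have no underlying device, so the helper reports an error for them instead of a driver name.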

 
Hi,
I would like to compile the e1000e driver from source.
I downloaded the sources and installed pve-headers, but when I start the compilation I get the following messages:
common.mk:85: *** Kernel header files not in any of the expected locations.
common.mk:86: *** Install the appropriate kernel development package, e.g.
common.mk:87: *** kernel-devel, for building kernel modules and try again. Stop.

Something must be missing, but I can't find it.
What should I check?
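One thing that may be worth checking is whether the headers match the running kernel exactly, since out-of-tree module builds usually look for /lib/modules/<release>/build. The sketch below is only that check; headers_status is an invented name, and the suggested package name assumes the pve-headers-<release> naming.

```shell
#!/bin/sh
# headers_status (hypothetical helper): report whether the kernel build tree
# that out-of-tree module Makefiles look for exists for a kernel release.
# Arguments: <kernel-release> [modules root, default /lib/modules]
headers_status() {
    release=$1
    root=${2:-/lib/modules}
    if [ -e "$root/$release/build/Makefile" ]; then
        echo "headers present for $release"
    else
        # package name is an assumption based on the pve-headers naming scheme
        echo "headers missing for $release: try apt install pve-headers-$release"
        return 1
    fi
}

# On the node:
#   headers_status "$(uname -r)"
```

If the headers are reported missing for the running release, the generic pve-headers package was probably installed for a different kernel version than the one booted.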
 
Hi,
I ran some new tests to find a solution.
To recap the basic problem: on server prox1 everything works correctly; on the other 2, prox2 and prox3, the network cuts out after 2-3 minutes.
prox1, 2 and 3 have the same hardware configuration. I don't see any apparent difference, but there is surely one I haven't found yet.
From a Debian 11.6 live CD, prox2 and 3 are OK.
From an Ubuntu 22.10 live CD, prox2 and 3 have the same problem as under Proxmox 7.3.
I installed Ubuntu 22.10 on prox2: if I unload and reload the e1000e driver with modprobe and then restart networking, the problem is gone and the eno1 interface no longer cuts out.
If I do the same thing under Proxmox 7.3, only one or 2 pings pass before the eno1 interface cuts out.
There is no trace in the logs (messages, syslog, kernel, dmesg ...) when the eno1 interface goes down, or I didn't find it.

Some files for debugging:
ip a prox1

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether ec:8e:b5:70:66:4d brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ec:8e:b5:70:66:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.50.11/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::ee8e:b5ff:fe70:664d/64 scope link
       valid_lft forever preferred_lft forever
ip a prox3
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether ec:8e:b5:7c:12:40 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ec:8e:b5:7c:12:40 brd ff:ff:ff:ff:ff:ff
    inet 192.168.50.13/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::ee8e:b5ff:fe7c:1240/64 scope link
       valid_lft forever preferred_lft forever

ip link prox1
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether ec:8e:b5:70:66:4d brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ec:8e:b5:70:66:4d brd ff:ff:ff:ff:ff:ff

ip link prox3
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether ec:8e:b5:7c:12:40 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ec:8e:b5:7c:12:40 brd ff:ff:ff:ff:ff:ff

lshw -c network prox1
Code:
  *-network
       description: Ethernet interface
       product: Ethernet Connection (2) I219-LM
       vendor: Intel Corporation
       physical id: 1f.6
       bus info: pci@0000:00:1f.6
       logical name: eno1
       version: 31
       serial: ec:8e:b5:70:66:4d
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=5.15.74-1-pve duplex=full firmware=0.8-4 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
       resources: irq:125 memory:e1100000-e111ffff
  *-network
       description: Ethernet interface
       physical id: 2
       logical name: vmbr0
       serial: ec:8e:b5:70:66:4d
       size: 1Gbit/s
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=192.168.50.11 link=yes multicast=yes speed=1Gbit/s

lshw -c network prox3
Code:
  *-network
       description: Ethernet interface
       product: Ethernet Connection (2) I219-LM
       vendor: Intel Corporation
       physical id: 1f.6
       bus info: pci@0000:00:1f.6
       logical name: eno1
       version: 31
       serial: ec:8e:b5:7c:12:40
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=5.15.74-1-pve duplex=full firmware=0.8-4 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
       resources: irq:126 memory:e1100000-e111ffff
  *-network
       description: Ethernet interface
       physical id: 2
       logical name: vmbr0
       serial: ec:8e:b5:7c:12:40
       size: 1Gbit/s
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=192.168.50.13 link=yes multicast=yes speed=1Gbit/s

After all these tests I think that the e1000e driver of kernel 5.15.74-1-pve is the problem on prox2 and prox3.
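Comparing the outputs by eye is error-prone, so here is a sketch that diffs two saved outputs after masking the values that are expected to differ per host (MAC and IPv4 addresses). diff_hw and the file names in the example are made up. Applied to the two lshw outputs posted above, the only line that should remain is the IRQ number (125 on prox1, 126 on prox3).

```shell
#!/bin/sh
# diff_hw (hypothetical helper): compare two saved command outputs and show
# only the lines that differ once per-host values are masked.
# Arguments: <file-a> <file-b>
diff_hw() {
    # mask lowercase MAC addresses and dotted-quad IPv4 addresses
    mask='s/[0-9a-f][0-9a-f]\(:[0-9a-f][0-9a-f]\)\{5\}/<mac>/g; s/[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}/<ip>/g'
    a=$(mktemp) && b=$(mktemp) || return 2
    sed "$mask" "$1" > "$a"
    sed "$mask" "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Example, after saving the output on each node and copying both files over:
#   lshw -c network > lshw-prox1.txt    # on prox1
#   lshw -c network > lshw-prox3.txt    # on prox3
#   diff_hw lshw-prox1.txt lshw-prox3.txt
```

The same helper works for other captures, for example the output of ethtool -k eno1 or lspci -vv from a good and a bad node.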

I tried to compile the Intel driver from the e1000e-3.8.7 sources.
The make fails with errors:
Code:
error  CC [M]  /usr/local/src/e1000e-3.8.7/src/ethtool.o
/usr/local/src/e1000e-3.8.7/src/ethtool.c:2838:19: error: initialization of ‘int (*)(struct net_device *, struct ethtool_coalesce *, struct kernel_ethtool_coalesce *, struct netlink_ext_ack *)’ from incompatible pointer type ‘int (*)(struct net_device *, struct ethtool_coalesce *)’ [-Werror=incompatible-pointer-types]
 2838 |  .get_coalesce  = e1000_get_coalesce,

I think that these sources are no longer compatible with recent kernels.

How can I get the e1000e source used in Debian 11.6 to recompile it on Proxmox 7.3?

Thanks for your help.
 
