One Node Down, Network Issues

stsinc

Hi,
For some reason I still do not understand, one of the three nodes in my cluster did not come back online yesterday and is still down.
After some research, I ended up narrowing down the issue to a network issue.
So:
  • I have ifupdown2 installed
  • I checked the connection to the switch > OK
  • I did a dmesg | grep eth and found the name of the Ethernet card in the node to be enp1s0f0
  • I checked the content of /etc/network/interfaces and it seems to match what I have on my other nodes:
Code:
auto lo
iface lo inet loopback
iface enp1s0f0 inet manual
        mtu 9000
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.69/24
        gateway 192.168.1.1
        bridge-ports enp1s0f0
        bridge-stp off
        bridge-fd 0
        mtu 9000
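
With ifupdown2 installed, the configuration can also be checked against the running state directly from the console; something like this (just a sketch):
Code:
# compare the running state against /etc/network/interfaces (ifupdown2)
ifquery -a --check
# reapply the whole configuration without restarting the node
ifreload -a
# brief overview of all interfaces and their state
ip -br addr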

Here is what happens:
  1. I have attached a screen and a keyboard to the node: this way, I can check the console without being interrupted by network issues
  2. I reboot the node from its console
  3. I check the switch, the LED lights up
  4. I do an ip a -- the Ethernet card enp1s0f0 is DOWN
  5. as a result:
    1. the bridge vmbr0 does not appear in the list
    2. and the Web interface of Proxmox cannot be displayed
I attach a screen copy of the console:

IMG_20210419_092330654.png
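
For completeness, the link can also be brought up by hand from the console, to see whether the NIC itself reacts; roughly this (a sketch, the exact driver messages will differ):
Code:
# try to bring the NIC up manually
ip link set enp1s0f0 up
ip a show enp1s0f0
# look for driver/link messages concerning this NIC
dmesg | grep -i enp1s0f0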

What is happening, and how can I get out of this issue that is blocking our work?
Best,
Stephen
 
Did the networking ever work with that Docker bridge? Such a bridge does a lot of voodoo with your networking; it wouldn't be a wonder if it broke the "normal" interface.
That's why it is recommended to run Docker inside a VM.
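
If you want to see what that bridge has added on the host, something like this (just a sketch) usually tells the story:
Code:
# is there a docker0 bridge on the host itself?
ip link show docker0
# Docker also injects iptables rules; DOCKER chains here are a telltale sign
iptables-save | grep -i docker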
 
Oh yes, it was working yesterday morning.
You are right, there are several Docker stacks installed on this node, but they are installed:
  • either in Turnkey Linux Core CTs
  • or in VMs
Do you think it would be useful to uninstall Docker from the node itself?
 
I just checked, and Docker is NOT installed on the node itself: apt remove docker tells me docker is not installed.
 
I also just did a pct list to check if any of the containers/VMs with Docker embedded were still active > all are stopped.
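(Note that pct list only covers the containers; the VMs are listed separately with qm list:)
Code:
# LXC containers and QEMU VMs are listed by separate tools
pct list
qm list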
 
If Docker is not installed, I would suggest removing the docker0 bridge that can be seen in your screenshot, just to rule out any influence on your networking.
 
So, I followed your advice:
1. Removed the docker0 bridge:
Code:
ip link set docker0 down
brctl delbr docker0
2. ifup -a
3. ip a: now both the network card interface (enp1s0f0) and the Proxmox bridge (vmbr0) are active YAYYYY!!!
4. A green tick is displayed again in the cluster interface
5. But after approx. two minutes something weird happens when I do an ip a again:
  • the network card interface disappears
  • a NEW "veth" appears instead, alongside the existing one.
So, in conclusion:
  • We are definitely on the right track, because the node has been fully functional for a short amount of time
  • How can I get rid of those pesky vethXX, and what are they anyway???
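
For reference, the veth devices and the bridge they are attached to can be listed like this (a sketch):
Code:
# list only the veth devices, in brief form
ip -br link show type veth
# show which bridge (master) each port is enslaved to
bridge link show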
 
UPDATE
  • Obviously, the vethXX stuff also comes from Docker: they are virtual Ethernet interfaces
  • I tried to get rid of them both (there are now two) with ip link delete <veth_ID>, but it responds: Cannot find device <veth_ID>
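
One possible explanation for the Cannot find device error: ip a prints veth names with an @ifN suffix (the index of the peer interface), and ip link delete only accepts the part before the '@'. A sketch with a made-up name:
Code:
# ip a prints names like veth1a2b3c@if7; the device name is only the part before '@'
ip -o link show type veth
# made-up name -- substitute yours, without the '@ifN' part
ip link delete veth1a2b3c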
 
UPDATE #2
In fact, I feel totally dumb, because at one point I must have installed Docker bare-metal on the node.
I was trying apt remove docker and got an error, but Docker is NOT installed under the package name "docker".
 
UPDATE #3
I love Docker, but I did not know it would behave like the Alien in the movie -- how difficult it is to get rid of, even when only partially installed (in my case, only docker-ce was installed)!!
So I eventually removed all Docker presence at the first level of the node (bare metal), and I also disabled any calls to it via systemctl.
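Roughly this (the exact package and unit names depend on how Docker was installed; in my case only docker-ce was there):
Code:
# stop and disable the units so they cannot come back
systemctl disable --now docker.service docker.socket containerd.service
# then purge the packages that were actually installed
apt purge docker-ce docker-ce-cli containerd.io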
Still:
  • my network card interface still does not come up
  • one veth still appears
PLEASE HELP!
 
Hm, my experience with Docker networking is a bit rusty, but my guess would be to remove /var/lib/docker completely, if you have already uninstalled every Docker package. Plus probably a reboot afterwards.
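Something along these lines (assuming every Docker package is already gone; /var/lib/docker is Docker's data root, and containerd keeps its own state next to it):
Code:
# Docker's data root, including its leftover network state
rm -rf /var/lib/docker
# containerd state, if it was installed as well
rm -rf /var/lib/containerd
reboot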