[SOLVED] The network stops responding about 30-40 minutes after upgrading from Proxmox 6 to 7.0-11

kamzata · Nov 9, 2021

Yesterday I upgraded from Proxmox 6 to 7. The upgrade completed without any error messages and Proxmox 7 itself works normally. However, about 30 minutes after the upgrade the server went offline. I restarted it and everything worked correctly again. About an hour later it crashed again and the server went offline once more. I have now restarted it and it seems to be working fine again. What is happening?

Package versions:

Bash:

proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.4: 6.4-7
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-14
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.6-pve1~bpo10+1

These are the packages I installed in addition:

Code:

ncdu
htop
curlftpfs
npd6
fail2ban

This is my sysctl.conf:

Bash:

vm.max_map_count=262144
fs.protected_hardlinks=1
fs.protected_symlinks=1


### IPv4
net.ipv4.conf.all.rp_filter=1
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.conf.default.forwarding=1
net.ipv4.conf.default.proxy_arp=0
net.ipv4.ip_forward=1
kernel.sysrq=1
net.ipv4.conf.default.send_redirects=1
net.ipv4.conf.all.send_redirects=0

### IPv6
net.ipv6.conf.eno1.autoconf=0
net.ipv6.conf.eno1.accept_ra=0
net.ipv6.conf.all.accept_redirects=0
net.ipv6.conf.all.router_solicitations=1
net.ipv6.conf.all.forwarding=1
net.ipv6.conf.default.forwarding=1
net.ipv6.conf.all.proxy_ndp=1
net.ipv6.conf.default.proxy_ndp=1

mira · Nov 9, 2021

Please provide the syslogs (/var/log/syslog and /var/log/syslog.1).

kamzata · Nov 9, 2021

It just happened again.

Syslog: https://drive.google.com/file/d/1d_PFNysMKCb_zXJ0klbnD3x-tLPfjbxn/view?usp=sharing
Syslog.1: https://drive.google.com/file/d/1sYXUBhVpMctrI6OPrjPEoG8dZUNdibhw/view?usp=sharing

The first crash happened at 04:17 last night.

kamzata · Nov 9, 2021

Using the remote KVM, I can see this before the server stops responding:

mira · Nov 9, 2021

Is there by any chance a BIOS update available? Also check the firmware for the other hardware, for example the NIC.

Code:

Nov  9 04:17:01 srv001 systemd[1]: pvesr.service: Succeeded.
Nov  9 04:17:01 srv001 systemd[1]: Finished Proxmox VE replication runner.
Nov  9 04:17:01 srv001 CRON[38934]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Nov  9 10:54:26 srv001 systemd-modules-load[1171]: Inserted module 'iscsi_tcp'
Nov  9 10:54:26 srv001 systemd-modules-load[1171]: Inserted module 'ib_iser'
Nov  9 10:54:26 srv001 systemd-modules-load[1171]: Inserted module 'vhost_net'

That's a rather large jump. Did your host hang the whole time?
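A jump like this can also be located mechanically. The following is only a minimal sketch, assuming the classic "Mon DD HH:MM:SS" syslog prefix shown above and GNU `date`; the function name `find_log_gaps` and the 600-second default threshold are illustrative choices, not part of any Proxmox tooling.

```bash
#!/usr/bin/env bash
# Print consecutive syslog lines whose timestamps are more than
# THRESHOLD seconds apart (default: 600). Reads the log from stdin.
# Assumption: every relevant line starts with "Mon DD HH:MM:SS".
find_log_gaps() {
    local threshold="${1:-600}" prev_ts="" prev_line="" line ts
    while IFS= read -r line; do
        # Skip lines too short to carry a 15-character timestamp.
        [[ ${#line} -ge 15 ]] || continue
        # Convert the timestamp to epoch seconds; skip unparsable lines.
        ts=$(date -d "${line:0:15}" +%s 2>/dev/null) || continue
        if [[ -n "$prev_ts" ]] && (( ts - prev_ts > threshold )); then
            printf 'gap of %d s:\n  %s\n  %s\n' \
                "$(( ts - prev_ts ))" "$prev_line" "$line"
        fi
        prev_ts=$ts
        prev_line=$line
    done
}
```

Usage would be something like `find_log_gaps 600 < /var/log/syslog`. Because syslog timestamps carry no year, `date` assumes the current one, so a gap that spans New Year would be misreported.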

kamzata · Nov 9, 2021

mira said:
Is there by any chance a BIOS update available? Also check the firmware for the other hardware, for example the NIC.

Code:

Nov  9 04:17:01 srv001 systemd[1]: pvesr.service: Succeeded.
Nov  9 04:17:01 srv001 systemd[1]: Finished Proxmox VE replication runner.
Nov  9 04:17:01 srv001 CRON[38934]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Nov  9 10:54:26 srv001 systemd-modules-load[1171]: Inserted module 'iscsi_tcp'
Nov  9 10:54:26 srv001 systemd-modules-load[1171]: Inserted module 'ib_iser'
Nov  9 10:54:26 srv001 systemd-modules-load[1171]: Inserted module 'vhost_net'

That's a rather large jump. Did your host hang the whole time?

It's a dedicated server hosted at OVH. I don't even know whether I'm allowed to update the BIOS or firmware. Yes, I went to sleep, and when I got up the server was no longer responding.

kamzata · Nov 9, 2021

This is my /etc/network/interfaces:

Bash:

auto lo
iface lo inet loopback
iface lo inet6 loopback

iface eno1 inet manual


auto vmbr0
iface vmbr0 inet static
        address 51.77.XX.65
        netmask  255.255.255.0
        gateway 51.77.XX.254
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0


        up ip addr add 51.89.XX.206/32 dev vmbr0
        down ip addr del 51.89.XX.206/32 dev vmbr0

        up ip addr add 51.89.XX.215/32 dev vmbr0
        down ip addr del 51.89.XX.215/32 dev vmbr0


        up ip addr add 192.168.1.1/24 dev vmbr0
        down ip addr del 192.168.1.1/24 dev vmbr0

        up ip addr add 192.168.2.1/24 dev vmbr0
        down ip addr del 192.168.2.1/24 dev vmbr0


iface vmbr0 inet6 static
        address  2001:41d0:XXX:2441::ffff
        netmask  128

        post-up sleep 5; /sbin/ip -6 route add 2001:41d0:XXX:24FF:FF:FF:FF:FF dev vmbr0
        post-up sleep 5; /sbin/ip -6 route add default via 2001:41d0:XXX:24FF:FF:FF:FF:FF
        pre-down /sbin/ip -6 route del default via 2001:41d0:XXX:24FF:FF:FF:FF:FF
        pre-down /sbin/ip -6 route del 2001:41d0:XXX:24FF:FF:FF:FF:FF dev vmbr0

        post-up /sbin/ip -f inet6 neigh add proxy 2001:41d0:XXX:24FF:FF:FF:FF:FF dev vmbr0


        post-up echo 0 > /proc/sys/net/ipv6/conf/vmbr0/autoconf
        post-up echo 0 > /proc/sys/net/ipv6/conf/vmbr0/accept_ra
        post-up echo 1 > /proc/sys/net/ipv6/conf/all/proxy_ndp
        post-up echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
        post-up echo 1 > /proc/sys/net/ipv6/conf/default/forwarding


iface enp0s20f0u8u3c2 inet manual

iface eno2 inet manual

I use npd6 to make IPv6 work.

What about those messages?

Bash:

....
vmbr0: port 18(veth210i0) entered blocking state
vmbr0: port 18(veth210i0) entered disabled state
...
vmbr0: port 19(veth210i0) entered blocking state
vmbr0: port 19(veth210i0) entered forwarding state
...

It's a nightmare... it stops every 30-40 minutes.

kamzata · Nov 9, 2021

Anyway, using the remote KVM, once the server stops responding, I see the output I mentioned before and I'm able to press CTRL+D. Doing that drops me to the Proxmox login prompt. Then I log in and reboot the server.

I also tried to ping 8.8.8.8 before rebooting, but the network was unreachable.
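To tell from the KVM console whether only one address family is affected, both can be probed side by side. This is a rough sketch, assuming the iputils `ping` (which accepts `-4` and `-6`) and the bridge name `vmbr0` from the configuration above; the helper names `check_family` and `show_v4_state` are made up for illustration, and the IPv6 target is just an example public resolver.

```bash
#!/usr/bin/env bash
# Ping a target over one address family and report the result.
# $1 = 4 or 6, $2 = target address.
check_family() {
    if ping "-$1" -c 3 -W 2 "$2" >/dev/null 2>&1; then
        echo "IPv$1 ok ($2)"
    else
        echo "IPv$1 FAILED ($2)"
    fi
}

# Show the IPv4 addresses and routes the kernel currently has.
# $1 = interface name (defaults to vmbr0, as in this thread).
show_v4_state() {
    ip -4 addr show dev "${1:-vmbr0}"
    ip -4 route show
}
```

Running `check_family 4 8.8.8.8`, `check_family 6 2001:4860:4860::8888` and `show_v4_state` before rebooting would show whether the IPv4 address or the default route has disappeared from the bridge while IPv6 is still working.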

kamzata · Nov 9, 2021

Could it be related to this? Could I simply install ifupdown2 even though it wasn't already installed? Or should I choose solution B?

kamzata · Nov 9, 2021

I just looked into it a bit more closely: the networking service is up and IPv6 works. The problem seems to affect IPv4 only. Any hints?

Edit: just installed ifupdown2 and now it works!
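For anyone landing here with the same symptom, the fix above amounts to roughly the following. It is a hedged sketch rather than an official procedure: the function names are illustrative, it assumes a Debian-based Proxmox VE 7 host with working `apt` repositories, and it should be run as root, ideally from the remote KVM console in case the network drops during the switch.

```bash
#!/usr/bin/env bash
# Return success if the named Debian package is fully installed.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null \
        | grep -q 'install ok installed'
}

# Install ifupdown2 unless it is already present.
# Note: on Debian, ifupdown2 is expected to replace the classic
# ifupdown package rather than sit alongside it.
switch_to_ifupdown2() {
    if pkg_installed ifupdown2; then
        echo "ifupdown2 is already installed"
        return 0
    fi
    apt update && apt install -y ifupdown2
}
```

After calling `switch_to_ifupdown2`, either reboot or reload the configuration with `ifreload -a`, then check that the addresses from /etc/network/interfaces are present on vmbr0.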
