Hanging `ip` commands result in VM startup timeout

marinbernard

Hi,

I am using Proxmox VE 7.1 (up to date) on two brand-new Atom C3758 motherboards with integrated Intel X553 GbE adapters. The servers were migrated from C2758 motherboards, which had been working fine until they were replaced.

Trying to start a KVM guest on these nodes results in a timeout error. The timeout is thrown 20 to 30 seconds after the start command is issued. During this interval, the whole PVE host becomes unresponsive, until the kernel watchdog reports a soft lockup because a CPU core is stuck in a loop:

Code:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [ip:52000]
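
The line above is only the header; the full call trace that goes with it ends up in the kernel log. A minimal sketch of how to pull it out (the grep pattern simply matches the header text):

Code:
# Show the full soft-lockup report, including its call trace, for the current boot
journalctl -k -b | grep -A 30 "soft lockup"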

And indeed, running top during the startup process shows that the following ip command consumes 100% of one CPU:

Code:
/sbin/ip link set fwpr149p0 master vmbr0
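
To see where that process is spinning, one option (just a sketch; the PID comes from the watchdog message above and changes on every attempt) is to dump its kernel stack while it hangs:

Code:
# Kernel stack of the hung ip process; 52000 is the PID reported by the watchdog
cat /proc/52000/stack
# Alternatively, ask the kernel for a backtrace of all active CPUs (output lands in dmesg)
echo l > /proc/sysrq-trigger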

After the timeout is reached, the above ip process is presumably killed. Another one is then started to tear down the guest's network interfaces, but it hangs the same way:

Code:
/sbin/ip link delete dev fwln149i0

Trying to remove the remaining interfaces by hand makes the node hang again.
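
For reference, this is how I list the leftover firewall interfaces of VMID 149 before attempting the cleanup again (a sketch; the fwbr/fwpr/fwln names follow Proxmox's usual naming scheme):

Code:
# List any leftover Proxmox firewall interfaces for VMID 149
ip -o link show | grep -E 'fw(br|pr|ln)149'
# Show which of them are still enslaved to a bridge
bridge link show | grep 149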

What could be the cause of these hanging ip commands? Is it a network driver issue?

The network configuration of the node is quite simple and shown below.

Code:
$ ls /etc/network/interfaces.d/
$ cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto eno3
iface eno3 inet manual

auto eno4
iface eno4 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2 eno3 eno4
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

auto vmbr0.164
iface vmbr0.164 inet static
        address 10.0.2.228/24
        gateway 10.0.2.254

$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp2s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:8a:2f:c4 brd ff:ff:ff:ff:ff:ff
3: enp2s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:8a:2f:c5 brd ff:ff:ff:ff:ff:ff
4: enp2s0f2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:8a:2f:c6 brd ff:ff:ff:ff:ff:ff
5: enp2s0f3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:8a:2f:c7 brd ff:ff:ff:ff:ff:ff
6: eno1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 5a:33:6b:b4:8a:10 brd ff:ff:ff:ff:ff:ff permaddr 3c:ec:ef:04:75:06
    altname enp6s0f0
7: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 5a:33:6b:b4:8a:10 brd ff:ff:ff:ff:ff:ff permaddr 3c:ec:ef:04:75:07
    altname enp6s0f1
8: eno3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 5a:33:6b:b4:8a:10 brd ff:ff:ff:ff:ff:ff permaddr 3c:ec:ef:04:75:08
    altname enp7s0f0
9: eno4: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 5a:33:6b:b4:8a:10 brd ff:ff:ff:ff:ff:ff permaddr 3c:ec:ef:04:75:09
    altname enp7s0f1
10: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether 5a:33:6b:b4:8a:10 brd ff:ff:ff:ff:ff:ff
11: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5a:33:6b:b4:8a:10 brd ff:ff:ff:ff:ff:ff
12: vmbr0.164@vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5a:33:6b:b4:8a:10 brd ff:ff:ff:ff:ff:ff
46: fwbr149i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 22:b3:40:98:33:98 brd ff:ff:ff:ff:ff:ff
47: fwpr149p0@fwln149i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether 12:ae:87:d1:d1:87 brd ff:ff:ff:ff:ff:ff
48: fwln149i0@fwpr149p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr149i0 state UP mode DEFAULT group default qlen 1000
    link/ether fe:3a:48:94:cb:31 brd ff:ff:ff:ff:ff:ff
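
In case it helps, here is a quick sketch of the commands I would use to double-check the bond and the VLAN-aware bridge state on this node:

Code:
# 802.3ad/LACP status of the bond and its slaves
cat /proc/net/bonding/bond0
# VLANs configured on the VLAN-aware bridge ports
bridge -d vlan show
# Ports currently attached to the bridges
bridge link show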

The output of pveversion follows:

Code:
# pveversion  -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 
This seems to be exactly the same issue as the one described in this thread. Enabling SR-IOV in the BIOS seems to help: I was able to start all the VMs and only got the soft-lockup alert when starting the first one.

IMHO, this issue is probably caused by a bug in the network driver or in the network stack.
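
For anyone trying the SR-IOV workaround above: a rough sketch of the checks I would run afterwards to confirm the NIC driver actually picked it up (eno1 and the ixgbe driver name are assumptions on my side):

Code:
# Driver and firmware bound to one of the X553 ports (eno1 as an example)
ethtool -i eno1
# Number of virtual functions the PF can expose after the BIOS change
cat /sys/class/net/eno1/device/sriov_totalvfs
# Driver messages related to SR-IOV since boot (assuming the ixgbe driver here)
dmesg | grep -iE 'sr-?iov|ixgbe'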
 
