I upgraded from PVE8 to PVE9 and everything went smoothly. However, afterwards all the containers (PCT) were in the stopped state.
Manually starting a container shows this error:
Code:
# pct start 100
run_buffer: 571 Script exited with status 2
lxc_create_network_priv: 3466 Success - Failed to create network device
lxc_spawn: 1852 Failed to create the network
__lxc_start: 2119 Failed to spawn container "100"
startup for container '100' failed
The debug log shows a bit more detail:
Code:
INFO utils - ../src/lxc/utils.c:run_script_argv:587 - Executing script "/usr/share/lxc/lxcnetaddbr" for container "100", config section "net"
DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/lxcnetaddbr 100 net up veth veth100i0 produced output: RTNETLINK answers: Unknown error 524
DEBUG utils - ../src/lxc/utils.c:run_buffer:560 - Script exec /usr/share/lxc/lxcnetaddbr 100 net up veth veth100i0 produced output: can't enslave 'fwpr100p0' to 'vmbr0'
ERROR utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 2
ERROR network - ../src/lxc/network.c:lxc_create_network_priv:3466 - Success - Failed to create network device
ERROR start - ../src/lxc/start.c:lxc_spawn:1852 - Failed to create the network
DEBUG network - ../src/lxc/network.c:lxc_delete_network:4221 - Deleted network devices
ERROR start - ../src/lxc/start.c:__lxc_start:2119 - Failed to spawn container "100"
WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 16 for process 4227
startup for container '100' failed
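The interesting part of that log is what the hook script itself printed. A minimal sketch for pulling out just those messages, assuming the debug output has been saved to a file (the function name `hook_output` and the path `/tmp/lxc-100.log` are placeholders for illustration, not something PVE provides):

```shell
# Print only the text that LXC hook scripts wrote, one message per line.
# Hypothetical helper: reads the log file(s) given as arguments, or stdin if none.
hook_output() {
    # Keep only lines containing "produced output: " and drop everything up to it.
    sed -n 's/.*produced output: //p' "$@"
}

# Example usage (the log path is a placeholder):
#   hook_output /tmp/lxc-100.log
```

For the log above this would leave the RTNETLINK line and the "can't enslave" line. As far as I know, 524 is the kernel-internal ENOTSUPP ("operation is not supported"), which has no userspace error string, hence "Unknown error 524".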
This is the network status. It's normal for eno2 to be DOWN (it's unplugged).
Code:
# ip l show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether 64:51:06:d8:12:34 brd ff:ff:ff:ff:ff:ff
    altname enp3s0f0
    altname enx645106d81234
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master vmbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 64:51:06:d8:12:35 brd ff:ff:ff:ff:ff:ff
    altname enp3s0f1
    altname enx645106d81235
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 64:51:06:d8:12:34 brd ff:ff:ff:ff:ff:ff
And the network configuration in /etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.55/24
        gateway 192.168.1.1
        bridge-ports eno1 eno2
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

source /etc/network/interfaces.d/*
I tried various things and eventually stumbled on this workaround (not a fix):
Code:
# ifdown vmbr0 ; ifup vmbr0
# pct start 100
#
Now the container is running, but after a host reboot the fault returns. I'm holding off on upgrading more hosts to PVE9, because a reboot effectively bricks all containers until I manually apply the workaround.
I have no explanation for why the workaround works. I compared the ifconfig output before and after, and nothing changed except the interface index of vmbr0 (it goes from 4 to 22). I'll continue troubleshooting tomorrow.
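Since ifconfig shows very little about a VLAN-aware bridge, a more detailed before/after comparison might reveal what the ifdown/ifup cycle changes. This is only a sketch under assumptions: the helper names `snapshot_bridge` and `compare_snapshots` and the /tmp paths are made up for illustration, and I can't say whether the difference will show up in this output.

```shell
# Capture bridge-related state into the file named by $1.
# Hypothetical helper; vmbr0 is the bridge from this post.
snapshot_bridge() {
    out=$1
    {
        ip -d link show dev vmbr0   # bridge details (vlan_filtering etc.)
        bridge -d link show         # per-port bridge settings
        bridge vlan show            # VLAN membership per port
    } > "$out" 2>&1
}

# Show a unified diff of two snapshots; exit status is non-zero if they differ.
compare_snapshots() {
    diff -u "$1" "$2"
}

# Intended usage, as root on the affected host (paths are placeholders):
#   snapshot_bridge /tmp/bridge-before.txt
#   ifdown vmbr0 ; ifup vmbr0
#   snapshot_bridge /tmp/bridge-after.txt
#   compare_snapshots /tmp/bridge-before.txt /tmp/bridge-after.txt
```

Taking the first snapshot right after a reboot, while the fault is still present, gives the most useful baseline.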
PS: I searched the forums and found one other thread with a similar error message, but the fix there to reinstall proxmox-kernel did not work for me.