lxc doesn't start properly after upgrade to pve7

lifeboy

Renowned Member
I upgraded my nodes from PVE 6.4 to 7, having checked in advance with pve6to7 for any issues, and everything seemed to go well, except that I have one container that starts, but not properly.

If I do pct start 138, no error is returned, but the container doesn't run, although it's reported as running.

Code:
~# lxc-start -F -f /etc/pve/lxc/138.conf --name pre --logfile /tmp/lxc.pre..log --logpriority TRACE
lxc-start: pre: conf.c: lxc_transient_proc: 3804 No such file or directory - Failed to mount temporary procfs
lxc-start: pre: mount_utils.c: mount_at: 660 No such file or directory - Failed to mount "/proc/self/fd/59" to "/proc/self/fd/58"
lxc-start: pre: conf.c: lxc_setup_dev_console: 1996 No such file or directory - Failed to mount "8(/dev/pts/1)" on "58"
lxc-start: pre: conf.c: lxc_setup_console: 2152 No such file or directory - Failed to setup console
lxc-start: pre: conf.c: lxc_setup: 4424 Failed to setup console
lxc-start: pre: start.c: do_start: 1274 Failed to setup container "pre"
lxc-start: pre: sync.c: sync_wait: 34 An error occurred in another process (expected sequence number 4)
lxc-start: pre: start.c: __lxc_start: 2068 Failed to spawn container "pre"
lxc-start: pre: tools/lxc_start.c: main: 306 The container failed to start
lxc-start: pre: tools/lxc_start.c: main: 311 Additional information can be obtained by setting the --logfile and --logpriority options

The conf file:

Code:
~# cat /etc/pve/lxc/138.conf
#Ruby 2.4 and Rails 5 test server
#mp0%3A /mnt/imb-win01,mp=/mnt/server
arch: amd64
cores: 4
features: fuse=1
hostname: pre
memory: 8192
nameserver: 192.168.131.254 net0: name=eth0,bridge=vmbr0,gw=192.168.131.254,hwaddr=5E:8B:07:01:72:D1,ip=192.168.131.186/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: standard:vm-138-disk-0,size=50G
searchdomain: xxx.yy
swap: 4096
unprivileged: 1

I'm stuck. What is breaking this container?

Trace log here https://pastebin.com/NMS0s94W
 
hi,

Code:
nameserver: 192.168.131.254 net0: name=eth0,bridge=vmbr0,gw=192.168.131.254,hwaddr=5E:8B:07:01:72:D1,ip=192.168.131.186/24,type=veth

is this how it looks in your config file, or did that happen while pasting?
it should rather be:
Code:
nameserver: 192.168.131.254 
net0: name=eth0,bridge=vmbr0,gw=192.168.131.254,hwaddr=5E:8B:07:01:72:D1,ip=192.168.131.186/24,type=veth
 
Yes, it happened in the pasting...
 
If I do pct start 138, no error is returned, but the container doesn't run, although it's reported as running.
can you do a pct enter CTID afterwards?
is the lxc-start process alive for your container? (ps aux | grep CTID)

also could you post the pveversion -v output just in case
 
Yes, pct enter 138 works. I'm in the container now, but there's no network, which is probably the main problem. I'll dig around to see what I can find.
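
For reference, this is roughly what I'm checking from inside the container (eth0 and the addresses come from the config above):

Code:
# inside the container, after pct enter 138
ip addr show eth0              # is the interface up with 192.168.131.186/24?
ip route                       # is the default route via 192.168.131.254 there?
cat /etc/network/interfaces    # PVE writes the container's network config here for Ubuntu CTs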

Code:
~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.4.157-1-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.4: 6.4-12
pve-kernel-5.3: 6.1-6
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 15.2.15-pve1
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-network-perl: 0.6.2
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.3-1
proxmox-backup-file-restore: 2.1.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
 
btw (running kernel: 5.4.157-1-pve) is that on purpose? that's a pretty old version of our kernel (and you have the newer ones also installed)

If I start the networking (/etc/init.d/networking start), the network comes up. I can also start ssh then.
alright, then maybe something is wrong with that container only? does it happen on other containers?
 
No, that's the only container. But then it's also the only container that was running Ubuntu 14.04 when the upgrade to pve7 was done.

The LXC was running perfectly before, though. Now, when I enter the container and start all the services manually, they run. But of course that shouldn't be necessary; they should start automatically, as configured.
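
From what I've read, PVE 7 defaults to a pure cgroupv2 layout, and an init as old as the one in Ubuntu 14.04 can't manage its services under that, which would match these symptoms. A quick way to confirm what the container is actually running (CT 138 from above):

Code:
# on the host
pct exec 138 -- cat /proc/1/comm      # "init" means upstart, "systemd" means systemd
pct exec 138 -- cat /etc/os-release   # confirms the release inside the container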
 
So should the systemd.unified_cgroup_hierarchy=0 parameter be set in the Proxmox node's kernel config?
yes, the cgroups come from the host kernel, so the parameter is set on the Proxmox node.
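On a GRUB-booted host it's typically added roughly like this (if the node boots via systemd-boot, e.g. root on ZFS with UEFI, edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):

Code:
# on the Proxmox node, append the parameter to the kernel command line
nano /etc/default/grub
# e.g.: GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
update-grub
reboot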
 
So, is it one or the other for all LXCs? In other words, if I implement this kernel setting, will all containers revert to using cgroups (v1) instead of cgroupv2?
yes it's for all the containers on that host.
the newer containers should also work fine with the older cgroups for now
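after the reboot you can verify which cgroup layout the host ended up with:

Code:
# "cgroup2fs" = pure cgroupv2, "tmpfs" = legacy/hybrid cgroup layout
stat -fc %T /sys/fs/cgroup/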
 
I think the only way forward to future-proof these older guests is to move them to KVM machines.
That would be a good idea. Upgrading them to newer releases could also be an option, depending on what you're doing with them :)
 
