[SOLVED] PVE-Container segfault

sigo

The latest update to PVE broke my containers.
I had installed simplified Debian containers from linuxcontainers.org (for example: https://us.images.linuxcontainers.o...er/amd64/default/20191119_05:24/rootfs.tar.xz) and have a bind mount to a ZFS directory in the LXC configuration.

When I start a container with pct start 104, I get a segfault:
Nov 26 07:20:48 pve01 kernel: [206352.372800] lxc-start[14951]: segfault at 50 ip 00007f955159ef8b sp 00007ffd91d23de0 error 4 in liblxc.so.1.6.0[7f9551545000+8a000]
Nov 26 07:20:48 pve01 kernel: [206352.372803] Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
root@pve01:/etc/pve/lxc# cat 104.conf
arch: amd64
cores: 1
hostname: collector
memory: 512
mp0: /rpool/ml/mariadb,mp=/var/lib/mysql
net0: name=eth0,bridge=vmbr30,firewall=1,gw=192.168.30.1,hwaddr=66:65:CB:B3:4C:E0,ip=192.168.30.104/24,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: local-zfs:subvol-104-disk-0,size=2G
swap: 512
unprivileged: 1
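For reference, a bind mount like the mp0 line above can also be set from the CLI; a minimal sketch using the exact paths from this config:

Code:
pct set 104 -mp0 /rpool/ml/mariadb,mp=/var/lib/mysql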

Logs from running lxc-start -F --logpriority .... and strace lxc-start .... are attached.
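(For reference, a sketch of how such logs are typically captured; the /tmp file names are placeholders, and DEBUG matches the log priority suggested later in this thread:)

Code:
lxc-start -n 104 -F -l DEBUG -o /tmp/ct-104.log
strace -f -o /tmp/ct-104.strace lxc-start -n 104 -F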

Buggy package versions (from apt list --upgradable):
root@pve01:~# apt list --upgradable
Listing... Done
libpve-access-control/stable 6.0-4 amd64 [upgradable from: 6.0-3]
libpve-common-perl/stable 6.0-8 all [upgradable from: 6.0-7]
libpve-guest-common-perl/stable 3.0-3 all [upgradable from: 3.0-2]
libpve-storage-perl/stable 6.0-11 all [upgradable from: 6.0-9]
pve-cluster/stable 6.0-9 amd64 [upgradable from: 6.0-7]
pve-container/stable 3.0-13 all [upgradable from: 3.0-10]
pve-firewall/stable 4.0-8 amd64 [upgradable from: 4.0-7]
pve-ha-manager/stable 3.0-5 amd64 [upgradable from: 3.0-3]
pve-manager/stable 6.0-15 amd64 [upgradable from: 6.0-12]
qemu-server/stable 6.0-17 amd64 [upgradable from: 6.0-13]

My current pveversion (downgraded from the buggy versions):
root@pve01:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.0-12 (running version: 6.0-12/0a603350)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-3
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-7
libpve-guest-common-perl: 3.0-2
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-9
pve-cluster: 6.0-7
pve-container: 3.0-10
pve-docs: 6.0-9
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-4
pve-ha-manager: 3.0-3
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-13
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 

I'm still getting this segfault on LXC after the upgrade.

Dec 3 15:57:40 sword systemd[1]: Starting PVE LXC Container: 107...
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 329 The container failed to start
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Dec 3 15:57:40 sword systemd[1]: pve-container@107.service: Control process exited, code=exited, status=1/FAILURE
Dec 3 15:57:40 sword systemd[1]: pve-container@107.service: Failed with result 'exit-code'.
Dec 3 15:57:40 sword systemd[1]: Failed to start PVE LXC Container: 107.
Dec 3 15:57:40 sword pvedaemon[15550]: command 'systemctl start pve-container@107' failed: exit code 1
Dec 3 15:57:40 sword kernel: [ 3234.588046] lxc-start[15562]: segfault at 50 ip 00007f7e8a656f8b sp 00007ffe43ae6f00 error 4 in liblxc.so.1.6.0[7f7e8a5fd000+8a000]
Dec 3 15:57:40 sword kernel: [ 3234.588073] Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b

Dec 3 15:57:40 sword pvedaemon[1585]: <kronvold@pam> end task UPID:sword:00003CBE:0004EF8D:5DE6DA54:vzstart:107:kronvold@pam: command 'systemctl start pve-container@107' failed: exit code 1

I'm already running pve-container 3.0-14 with the patch mentioned earlier.

proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-4.15: 5.4-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-4.15.18-23-pve: 4.15.18-51
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-12
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.0-9
pve-container: 3.0-14
pve-docs: 6.0-9
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-1
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

edit:
Fixed it... I'll leave this here for future wanderers.

One of my ZFS datasets didn't mount. I had some other ZFS strangeness too, and it all cleared up after I unmounted the datasets one at a time, cleared out the mount-point directories, and remounted them with zfs mount -a (which recreated the mount points).
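A rough sketch of that recovery sequence (the dataset name is an example taken from the config earlier in the thread; double-check what is actually mounted before removing anything):

Code:
# check which datasets are mounted and where
zfs list -o name,mounted,mountpoint

# unmount the affected dataset (example name)
zfs umount rpool/ml/mariadb

# the mount-point directory should be empty; rmdir fails safely if it is not
rmdir /rpool/ml/mariadb

# remount everything; zfs recreates missing mount-point directories
zfs mount -a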

Is it odd that this caused the same segfault error?
 
Same problem here: I stopped the CT and it never started again. CentOS 8.
Code:
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 329 The container failed to start
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Jän 29 20:52:48 virtu01 systemd[1]: pve-container@103.service: Control process exited, code=exited, status=1/FAILURE
Jän 29 20:52:48 virtu01 systemd[1]: pve-container@103.service: Failed with result 'exit-code'.
Jän 29 20:52:48 virtu01 pvedaemon[24263]: unable to get PID for CT 103 (not running?)
Jän 29 20:52:48 virtu01 systemd[1]: Failed to start PVE LXC Container: 103.
Jän 29 20:52:48 virtu01 kernel: lxc-start[20122]: segfault at 50 ip 00007f0bbf289f8b sp 00007ffc986076d0 error 4 in liblxc.so.1.6.0[7f0bbf230000+8a000]
Jän 29 20:52:48 virtu01 kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
Jän 29 20:52:48 virtu01 pvedaemon[20109]: command 'systemctl start pve-container@103' failed: exit code 1
 
Can you please try to start this one in the foreground with a debug log file and post that here?
Code:
lxc-start -n 103 -F -l DEBUG -o /tmp/ct-103.log

The CT config itself would also be nice to have.

@wbumiller any idea?
 
I have the same problem:

Code:
-- The unit pve-container@119.service has entered the 'failed' state with result 'exit-code'.
Jan 31 22:09:57 proxmox.s-dev.work systemd[1]: Failed to start PVE LXC Container: 119.
-- Subject: A start job for unit pve-container@119.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-container@119.service has finished with a failure.
--
-- The job identifier is 3808 and the job result is failed.
Jan 31 22:09:57 proxmox.s-dev.work kernel: lxc-start[15696]: segfault at 50 ip 00007f467a08ef8b sp 00007fffaca46f80 error 4 in liblxc.so.1.6.0[7f467a035000+8a000]
Jan 31 22:09:57 proxmox.s-dev.work kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 d
Jan 31 22:10:00 proxmox.s-dev.work systemd[1]: Starting Proxmox VE replication runner...
 
Hi,

I'm attaching logs too. Does anybody know how to fix this error?

unable to detect OS distribution can mean the following things:

- the container doesn't have the file(s) we need to detect the distribution (deleted/missing file)
- the container's rootfs may be corrupted
- the volume for the container is not mounted (see the quick check sketched below)
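One way to check the first two points without starting the container (a sketch; CT ID 103 is from the logs above, and pct mount places the rootfs under /var/lib/lxc/<ctid>/rootfs):

Code:
# mount the container's rootfs on the host without starting it
pct mount 103

# distribution detection relies on release files like this one
ls -l /var/lib/lxc/103/rootfs/etc/os-release

# unmount again when done
pct unmount 103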
 
Try this:

in /lib/systemd/system/zfs-mount.service

change ExecStart=/sbin/zfs mount -a
to ExecStart=/sbin/zfs mount -O -a

then reboot the Proxmox host.
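If you go this route, note that /lib/systemd/system/zfs-mount.service is owned by the package and edits there can be overwritten on updates; a systemd drop-in override is the usual alternative (a sketch, using the same -O overlay flag as above):

Code:
mkdir -p /etc/systemd/system/zfs-mount.service.d
cat > /etc/systemd/system/zfs-mount.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/sbin/zfs mount -O -a
EOF
systemctl daemon-reload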
Thanks, but the CT was newly created; the old one was deleted.
 
I have a similar problem: Supermicro H11SSL + AMD EPYC 7551, ZFS raidz, using KVM and LXC. When I power the server off and on again, LXC containers can't boot because their ZFS filesystems are not mounted. But when I reboot instead, the filesystems are mounted correctly and all is fine. I'm not sure exactly when this problem started (it's a production server), but it's quite strange and annoying.

=>
change ExecStart=/sbin/zfs mount -a
to ExecStart=/sbin/zfs mount -O -a


I didn't try this; I think it could be dangerous.
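For anyone hitting the unmounted-after-cold-boot variant, a non-destructive way to narrow it down before touching zfs-mount.service (a sketch; these unit names exist on PVE 6 with ZFS 0.8):

Code:
# after a cold boot, check whether pool import and dataset mounting ran cleanly
systemctl status zfs-import-cache.service zfs-mount.service

# list which datasets actually got mounted
zfs list -o name,mounted,mountpoint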