[SOLVED] PVE-Container segfault

sigo

The latest update to PVE broke my containers.
I had installed simplified Debian containers from linuxcontainers.org (for example: https://us.images.linuxcontainers.o...er/amd64/default/20191119_05:24/rootfs.tar.xz) and have a bind mount to a ZFS directory in the LXC configuration.

When I start a container with pct start 104, I get a segfault:
Nov 26 07:20:48 pve01 kernel: [206352.372800] lxc-start[14951]: segfault at 50 ip 00007f955159ef8b sp 00007ffd91d23de0 error 4 in liblxc.so.1.6.0[7f9551545000+8a000]
Nov 26 07:20:48 pve01 kernel: [206352.372803] Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
root@pve01:/etc/pve/lxc# cat 104.conf
arch: amd64
cores: 1
hostname: collector
memory: 512
mp0: /rpool/ml/mariadb,mp=/var/lib/mysql
net0: name=eth0,bridge=vmbr30,firewall=1,gw=192.168.30.1,hwaddr=66:65:CB:B3:4C:E0,ip=192.168.30.104/24,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: local-zfs:subvol-104-disk-0,size=2G
swap: 512
unprivileged: 1
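For reference, a bind mount like the mp0 line above can also be set from the CLI; a minimal sketch using the exact paths from this config:

Code:
pct set 104 -mp0 /rpool/ml/mariadb,mp=/var/lib/mysql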

Logs from running lxc-start -F --logpriority .... and strace lxc-start .... are attached.
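(For reference, a sketch of how such logs are typically captured; the /tmp file names are placeholders, and DEBUG matches the log priority suggested later in this thread:)

Code:
lxc-start -n 104 -F -l DEBUG -o /tmp/ct-104.log
strace -f -o /tmp/ct-104.strace lxc-start -n 104 -F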

Buggy package versions (from apt list --upgradable):
root@pve01:~# apt list --upgradable
Listing... Done
libpve-access-control/stable 6.0-4 amd64 [upgradable from: 6.0-3]
libpve-common-perl/stable 6.0-8 all [upgradable from: 6.0-7]
libpve-guest-common-perl/stable 3.0-3 all [upgradable from: 3.0-2]
libpve-storage-perl/stable 6.0-11 all [upgradable from: 6.0-9]
pve-cluster/stable 6.0-9 amd64 [upgradable from: 6.0-7]
pve-container/stable 3.0-13 all [upgradable from: 3.0-10]
pve-firewall/stable 4.0-8 amd64 [upgradable from: 4.0-7]
pve-ha-manager/stable 3.0-5 amd64 [upgradable from: 3.0-3]
pve-manager/stable 6.0-15 amd64 [upgradable from: 6.0-12]
qemu-server/stable 6.0-17 amd64 [upgradable from: 6.0-13]

My current pveversion (downgraded from the buggy versions):
root@pve01:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.0-12 (running version: 6.0-12/0a603350)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-3
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-7
libpve-guest-common-perl: 3.0-2
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-9
pve-cluster: 6.0-7
pve-container: 3.0-10
pve-docs: 6.0-9
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-4
pve-ha-manager: 3.0-3
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-13
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 

I'm still getting this segfault on LXC after the upgrade.

Dec 3 15:57:40 sword systemd[1]: Starting PVE LXC Container: 107...
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 329 The container failed to start
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Dec 3 15:57:40 sword systemd[1]: pve-container@107.service: Control process exited, code=exited, status=1/FAILURE
Dec 3 15:57:40 sword systemd[1]: pve-container@107.service: Failed with result 'exit-code'.
Dec 3 15:57:40 sword systemd[1]: Failed to start PVE LXC Container: 107.
Dec 3 15:57:40 sword pvedaemon[15550]: command 'systemctl start pve-container@107' failed: exit code 1
Dec 3 15:57:40 sword kernel: [ 3234.588046] lxc-start[15562]: segfault at 50 ip 00007f7e8a656f8b sp 00007ffe43ae6f00 error 4 in liblxc.so.1.6.0[7f7e8a5fd000+8a000]
Dec 3 15:57:40 sword kernel: [ 3234.588073] Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b

Dec 3 15:57:40 sword pvedaemon[1585]: <kronvold@pam> end task UPID:sword:00003CBE:0004EF8D:5DE6DA54:vzstart:107:kronvold@pam: command 'systemctl start pve-container@107' failed: exit code 1

I'm already running pve-container 3.0-14 with the patch mentioned earlier.

proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-4.15: 5.4-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-4.15.18-23-pve: 4.15.18-51
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-12
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.0-9
pve-container: 3.0-14
pve-docs: 6.0-9
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-1
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

edit:
Fixed it... I'll leave this here for future wanderers.

One of my ZFS datasets didn't mount. I had some other ZFS strangeness too, and it all cleared up after I unmounted the datasets one at a time, cleared out the mount-point directories, and remounted them with zfs mount -a (which recreated the mount points).
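A rough sketch of that recovery sequence (the dataset name is an example taken from the config earlier in the thread; double-check what is actually mounted before removing anything):

Code:
# check which datasets are mounted and where
zfs list -o name,mounted,mountpoint

# unmount the affected dataset (example name)
zfs umount rpool/ml/mariadb

# the mount-point directory should be empty; rmdir fails safely if it is not
rmdir /rpool/ml/mariadb

# remount everything; zfs recreates missing mount-point directories
zfs mount -a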

Is it odd that this caused the same segfault error?
 
Same problem here: I stopped the CT and it never started again. CentOS 8.
Code:
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 329 The container failed to start
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Jän 29 20:52:48 virtu01 systemd[1]: pve-container@103.service: Control process exited, code=exited, status=1/FAILURE
Jän 29 20:52:48 virtu01 systemd[1]: pve-container@103.service: Failed with result 'exit-code'.
Jän 29 20:52:48 virtu01 pvedaemon[24263]: unable to get PID for CT 103 (not running?)
Jän 29 20:52:48 virtu01 systemd[1]: Failed to start PVE LXC Container: 103.
Jän 29 20:52:48 virtu01 kernel: lxc-start[20122]: segfault at 50 ip 00007f0bbf289f8b sp 00007ffc986076d0 error 4 in liblxc.so.1.6.0[7f0bbf230000+8a000]
Jän 29 20:52:48 virtu01 kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
Jän 29 20:52:48 virtu01 pvedaemon[20109]: command 'systemctl start pve-container@103' failed: exit code 1
 
Can you please try to start this one in the foreground with a debug log file and post that here?
Code:
lxc-start -n 103 -F -l DEBUG -o /tmp/ct-103.log

The CT config itself would also be nice to have.

@wbumiller any idea?
 
I have the same problem:

Code:
-- The unit pve-container@119.service has entered the 'failed' state with result 'exit-code'.
Jan 31 22:09:57 proxmox.s-dev.work systemd[1]: Failed to start PVE LXC Container: 119.
-- Subject: A start job for unit pve-container@119.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-container@119.service has finished with a failure.
--
-- The job identifier is 3808 and the job result is failed.
Jan 31 22:09:57 proxmox.s-dev.work kernel: lxc-start[15696]: segfault at 50 ip 00007f467a08ef8b sp 00007fffaca46f80 error 4 in liblxc.so.1.6.0[7f467a035000+8a000]
Jan 31 22:09:57 proxmox.s-dev.work kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 d
Jan 31 22:10:00 proxmox.s-dev.work systemd[1]: Starting Proxmox VE replication runner...
 
Hi,

I'm attaching logs too. Does anybody know how to fix this error?

unable to detect OS distribution can mean the following things:

- the container doesn't have the file(s) we need to detect the distribution (deleted/missing file)
- the container's rootfs may be corrupted
- the volume for the container is not mounted (see the quick check sketched below)
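One way to check the first two points without starting the container (a sketch; CT ID 103 is from the logs above, and pct mount places the rootfs under /var/lib/lxc/<ctid>/rootfs):

Code:
# mount the container's rootfs on the host without starting it
pct mount 103

# distribution detection relies on release files like this one
ls -l /var/lib/lxc/103/rootfs/etc/os-release

# unmount again when done
pct unmount 103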
 
Try this:

in /lib/systemd/system/zfs-mount.service

change ExecStart=/sbin/zfs mount -a
to ExecStart=/sbin/zfs mount -O -a

then reboot the Proxmox host.
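If you go this route, note that /lib/systemd/system/zfs-mount.service is owned by the package and edits there can be overwritten on updates; a systemd drop-in override is the usual alternative (a sketch, using the same -O overlay flag as above):

Code:
mkdir -p /etc/systemd/system/zfs-mount.service.d
cat > /etc/systemd/system/zfs-mount.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/sbin/zfs mount -O -a
EOF
systemctl daemon-reload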
Thanks, but the CT was newly created; the old one was deleted.
 
I have a similar problem: Supermicro H11SSL + AMD EPYC 7551, ZFS raidz, using KVM and LXC. When I power the server off and on again, LXC containers can't boot because their ZFS filesystems are not mounted. But when I reboot instead, the filesystems are mounted correctly and all is fine. I'm not sure exactly when this problem started (it's a production server), but it's quite strange and annoying.

=>
change ExecStart=/sbin/zfs mount -a
to ExecStart=/sbin/zfs mount -O -a


I didn't try this; I think it could be dangerous.
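For anyone hitting the unmounted-after-cold-boot variant, a non-destructive way to narrow it down before touching zfs-mount.service (a sketch; these unit names exist on PVE 6 with ZFS 0.8):

Code:
# after a cold boot, check whether pool import and dataset mounting ran cleanly
systemctl status zfs-import-cache.service zfs-mount.service

# list which datasets actually got mounted
zfs list -o name,mounted,mountpoint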