[SOLVED] PVE-Container segfault

sigo

Active Member
Aug 24, 2017
The latest update to PVE broke my containers.
I had installed simplified Debian containers from linuxcontainers.org (for example: https://us.images.linuxcontainers.o...er/amd64/default/20191119_05:24/rootfs.tar.xz) and have a bind mount to a ZFS directory in the LXC configuration.

When I start a container with pct start 104 I get a segfault:
Nov 26 07:20:48 pve01 kernel: [206352.372800] lxc-start[14951]: segfault at 50 ip 00007f955159ef8b sp 00007ffd91d23de0 error 4 in liblxc.so.1.6.0[7f9551545000+8a000]
Nov 26 07:20:48 pve01 kernel: [206352.372803] Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
root@pve01:/etc/pve/lxc# cat 104.conf
arch: amd64
cores: 1
hostname: collector
memory: 512
mp0: /rpool/ml/mariadb,mp=/var/lib/mysql
net0: name=eth0,bridge=vmbr30,firewall=1,gw=192.168.30.1,hwaddr=66:65:CB:B3:4C:E0,ip=192.168.30.104/24,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: local-zfs:subvol-104-disk-0,size=2G
swap: 512
unprivileged: 1

Logs from running lxc-start -F --logpriority .... and strace lxc-start .... are attached.
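For reference, the commands were roughly along these lines (log paths and the exact options here are illustrative):

Code:
lxc-start -n 104 -F --logpriority=DEBUG --logfile=/tmp/lxc-104.log
strace -f -o /tmp/strace-104.txt lxc-start -n 104 -F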

The buggy package versions:
root@pve01:~# apt list --upgradable
Listing... Done
libpve-access-control/stable 6.0-4 amd64 [upgradable from: 6.0-3]
libpve-common-perl/stable 6.0-8 all [upgradable from: 6.0-7]
libpve-guest-common-perl/stable 3.0-3 all [upgradable from: 3.0-2]
libpve-storage-perl/stable 6.0-11 all [upgradable from: 6.0-9]
pve-cluster/stable 6.0-9 amd64 [upgradable from: 6.0-7]
pve-container/stable 3.0-13 all [upgradable from: 3.0-10]
pve-firewall/stable 4.0-8 amd64 [upgradable from: 4.0-7]
pve-ha-manager/stable 3.0-5 amd64 [upgradable from: 3.0-3]
pve-manager/stable 6.0-15 amd64 [upgradable from: 6.0-12]
qemu-server/stable 6.0-17 amd64 [upgradable from: 6.0-13]

My current pveversion (after downgrading from the buggy versions):
root@pve01:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.0-12 (running version: 6.0-12/0a603350)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-3
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-7
libpve-guest-common-perl: 3.0-2
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-9
pve-cluster: 6.0-7
pve-container: 3.0-10
pve-docs: 6.0-9
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-4
pve-ha-manager: 3.0-3
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-13
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
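In case it helps anyone: a downgrade to specific versions can be done with apt by pinning explicit package versions, roughly like this (illustrative only, versions taken from the listings above; not necessarily the exact command I ran):

Code:
apt install --allow-downgrades pve-container=3.0-10 pve-manager=6.0-12 libpve-common-perl=6.0-7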
 

Attachments

  • log.txt (7 KB)
  • strace.txt (23.7 KB)
I'm still getting this segfault with LXC after upgrading.

Dec 3 15:57:40 sword systemd[1]: Starting PVE LXC Container: 107...
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 329 The container failed to start
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Dec 3 15:57:40 sword lxc-start[15552]: lxc-start: 107: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Dec 3 15:57:40 sword systemd[1]: pve-container@107.service: Control process exited, code=exited, status=1/FAILURE
Dec 3 15:57:40 sword systemd[1]: pve-container@107.service: Failed with result 'exit-code'.
Dec 3 15:57:40 sword systemd[1]: Failed to start PVE LXC Container: 107.
Dec 3 15:57:40 sword pvedaemon[15550]: command 'systemctl start pve-container@107' failed: exit code 1
Dec 3 15:57:40 sword kernel: [ 3234.588046] lxc-start[15562]: segfault at 50 ip 00007f7e8a656f8b sp 00007ffe43ae6f00 error 4 in liblxc.so.1.6.0[7f7e8a5fd000+8a000]
Dec 3 15:57:40 sword kernel: [ 3234.588073] Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b

Dec 3 15:57:40 sword pvedaemon[1585]: <kronvold@pam> end task UPID:sword:00003CBE:0004EF8D:5DE6DA54:vzstart:107:kronvold@pam: command 'systemctl start pve-container@107' failed: exit code 1

I'm already running pve-container 3.0-14 with the patch mentioned earlier.

proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-4.15: 5.4-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-4.15.18-23-pve: 4.15.18-51
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-12
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.0-9
pve-container: 3.0-14
pve-docs: 6.0-9
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-1
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

edit:
Fixed it... I'll leave this for future wanderers.

One of my ZFS datasets didn't mount... I had some other ZFS strangeness, and it all cleared up after I unmounted the datasets one at a time, cleared out the mount point directories, and remounted them with zfs mount -a (which recreated the mount points).
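Roughly what I did, with a placeholder dataset name (yours will differ):

Code:
zfs unmount pool/dataset    # unmount each affected dataset, one at a time
rmdir /pool/dataset         # clear out the leftover (empty) mountpoint directory
zfs mount -a                # remount everything; the mountpoints get recreated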

Is it odd that this caused the same segfault error?
 
Same problem here: I stopped the CT and it never started again (CentOS 8).
Code:
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 329 The container failed to start
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Jän 29 20:52:48 virtu01 lxc-start[20111]: lxc-start: 103: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Jän 29 20:52:48 virtu01 systemd[1]: pve-container@103.service: Control process exited, code=exited, status=1/FAILURE
Jän 29 20:52:48 virtu01 systemd[1]: pve-container@103.service: Failed with result 'exit-code'.
Jän 29 20:52:48 virtu01 pvedaemon[24263]: unable to get PID for CT 103 (not running?)
Jän 29 20:52:48 virtu01 systemd[1]: Failed to start PVE LXC Container: 103.
Jän 29 20:52:48 virtu01 kernel: lxc-start[20122]: segfault at 50 ip 00007f0bbf289f8b sp 00007ffc986076d0 error 4 in liblxc.so.1.6.0[7f0bbf230000+8a000]
Jän 29 20:52:48 virtu01 kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
Jän 29 20:52:48 virtu01 pvedaemon[20109]: command 'systemctl start pve-container@103' failed: exit code 1
 
Can you please try to start this one in the foreground with a debug log file and post that here?
Code:
lxc-start -n 103 -F -l DEBUG -o /tmp/ct-103.log

The CT config itself would also be nice to have.
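For example (assuming CT 103):

Code:
pct config 103
# or
cat /etc/pve/lxc/103.conf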

@wbumiller any idea?
 
I have the same problem:

Code:
-- The unit pve-container@119.service has entered the 'failed' state with result 'exit-code'.
Jan 31 22:09:57 proxmox.s-dev.work systemd[1]: Failed to start PVE LXC Container: 119.
-- Subject: A start job for unit pve-container@119.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-container@119.service has finished with a failure.
--
-- The job identifier is 3808 and the job result is failed.
Jan 31 22:09:57 proxmox.s-dev.work kernel: lxc-start[15696]: segfault at 50 ip 00007f467a08ef8b sp 00007fffaca46f80 error 4 in liblxc.so.1.6.0[7f467a035000+8a000]
Jan 31 22:09:57 proxmox.s-dev.work kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 d
Jan 31 22:10:00 proxmox.s-dev.work systemd[1]: Starting Proxmox VE replication runner...
 
I'm attaching logs too. Maybe anybody knows how to fix this error?
 

Attachments

  • lxc-129.log (6 KB)
  • lxc-116.log (6.2 KB)
hi,

=>
I'm attaching logs too. Maybe anybody knows how to fix this error?

"unable to detect OS distribution" in those logs can mean the following things:

- the container doesn't have the file(s) for us to detect the distribution (deleted/missing file)
- the container rootfs may be corrupted
- your volume for the container is not mounted (a quick check is sketched below)
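For example (using CT 119 as a placeholder ID), something like this shows whether the rootfs volume is actually mounted and still contains a release file:

Code:
pct mount 119                               # mounts the CT's rootfs on the host
ls /var/lib/lxc/119/rootfs/etc/os-release   # one of the files used for distribution detection
pct unmount 119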
 
Try this:

in /lib/systemd/system/zfs-mount.service

change ExecStart=/sbin/zfs mount -a
to ExecStart=/sbin/zfs mount -O -a

Then restart Proxmox.
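Note that /lib/systemd/system/zfs-mount.service is the packaged unit and may be overwritten on updates; the same change can also be applied as a systemd drop-in, for example:

Code:
# /etc/systemd/system/zfs-mount.service.d/override.conf
[Service]
ExecStart=
ExecStart=/sbin/zfs mount -O -a

followed by systemctl daemon-reload.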
Thanks, but the CT was newly created; the old one was deleted.
 
I have a similar problem. Supermicro H11SSL + AMD EPYC 7551, ZFS RAID-Z, using KVM and LXC. When I power the server off and on, LXC containers can't boot because the ZFS filesystem is not mounted. But when I reboot, the filesystems are mounted correctly and all is fine. I'm not sure when exactly this problem started (it's a production server), but it's quite strange and annoying.

=>
change ExecStart=/sbin/zfs mount -a
to ExecStart=/sbin/zfs mount -O -a


I didn't try this; I think it could be dangerous.
 
