ZFS mount on start problem - segfault at 0 error 4 in libc-2.28.so, subvolumes not mounted

Dacesilian

There are multiple segfaults in the syslog during boot and the subvolumes aren't mounted properly. After boot, my zpool mountpoints aren't there at first; they appear after a while, but they are not mounted - only the empty parent folder exists.

Workaround
I've tried many configuration changes from other threads, but nothing helped so far.

The only workaround is to run zfs mount <mountpoint_path> for each dataset one by one and then start the containers.
Code:
for i in `zfs list|sed 1d|grep -e '^tank/container/subvol'|awk '{print $1}'`; do zfs mount $i ; done
for i in `pct list|sed 1d|grep -e 'stopped'|awk '{print $1}'`; do pct start $i ; done
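
A variant of the same workaround that only touches subvols that aren't mounted yet (same idea, just filtering on the mounted property - the dataset names are from my setup):
Code:
# mount only the container subvols that are not yet mounted
for ds in $(zfs list -H -o name,mounted -r tank/container | awk '$2 == "no" {print $1}'); do
    zfs mount "$ds"
done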


What I've tried, without success
I've tried the tip from the linked thread to disable the ZFS cache mount and use zfs-import-scan.service. I've completely disabled the cache in /etc/default/zfs.
I've even booted a Debian Live CD and deleted all of the mountpoint folders to avoid conflicts with existing folders.
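
For reference, the cache-to-scan switch was roughly this (the unit names are the ones shipped by zfsutils-linux; adjust if yours differ):
Code:
# stop importing from the cachefile and scan devices instead
systemctl disable zfs-import-cache.service
systemctl enable zfs-import-scan.service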


Can you please tell me what I can try now? I want my pools to mount at startup and all VMs to start automatically. It was working fine before, but now it isn't.
Please let me know if you need more information.


Errors (during boot):

[screenshots of the boot-time segfault messages attached]

When starting a container:

[screenshot attached]

My configuration:

Code:
# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-8
pve-kernel-5.3: 6.1-6
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-7
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

# zfs --version
zfs-0.8.3-pve1
zfs-kmod-0.8.3-pve1

/etc/default/zfs:
Code:
ZFS_MOUNT='yes'
ZFS_UNMOUNT='yes'
ZFS_SHARE='yes'
ZFS_UNSHARE='yes'
ZPOOL_IMPORT_ALL_VISIBLE='no'
ZPOOL_IMPORT_PATH="/dev/disk/by-id"
ZFS_POOL_IMPORT="rpool"
VERBOSE_MOUNT='no'
DO_OVERLAY_MOUNTS='no'
ZPOOL_IMPORT_OPTS="-o cachefile=none"
ZPOOL_CACHE="none"
MOUNT_EXTRA_OPTIONS=""
ZFS_DKMS_ENABLE_DEBUG='no'
ZFS_DKMS_ENABLE_DEBUGINFO='no'
ZFS_DKMS_DISABLE_STRIP='no'
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='0'
ZFS_INITRD_POST_MODPROBE_SLEEP='0'

Thank you!
 
hmm - a segfault in zfs sounds odd - could you please:
* run a scrub on your pool
* run debsums to verify that all packages are installed correctly
* run memtest for a good while on the box? (last time I had problems with strange segfaults in shipped binaries it turned out to be a broken RAM stick)

I hope this helps!
 
The scrub is running now on the last pool (17 TB, 0.50% done); the other two are okay:
Code:
pool: nvme
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:04:08 with 0 errors on Thu Apr 16 11:02:37 2020

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:13 with 0 errors on Thu Apr 16 10:58:39 2020


debsums - packages are okay:
Code:
# debsums -s
root@supernas:~# echo $?
0


I will run memtest after the scrub, but I'm skeptical - the segfault happens every time, and I've rebooted many times.
 
The scrub has completed on all three pools with no errors.

Do you have any other tips? Could there be a bug in a recent package update?
 
* run a scrub on your pool
* run debsums to verify that all packages are installed correctly
* run memtest for a good while on the box? (last time I had problems with strange segfaults in shipped binaries it turned out to be a broken RAM stick)

Hello, I've done all the suggested points; memtest passed without errors, but the behaviour is still the same - the ZFS folders are not mounted at startup.

Is there anything else I can do, please?

Current syslog lines:

Code:
Apr 20 07:12:27 supernas kernel: [   41.070493] audit: type=1400 audit(1587359547.386:19): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/tank/samba/" pid=6462 comm="mount.zfs" fstype="zfs" srcname="tank/samba" flags="rw, strictatime"
Apr 20 07:12:27 supernas kernel: [   41.081625]  zd144: p1
Apr 20 07:12:27 supernas kernel: [   41.083828] audit: type=1400 audit(1587359547.402:20): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/tank/container/" pid=6458 comm="mount.zfs" fstype="zfs" srcname="tank/container" flags="rw, strictatime"
Apr 20 07:12:27 supernas kernel: [   41.146484] audit: type=1400 audit(1587359547.462:21): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/tank/kvm/" pid=6457 comm="mount.zfs" fstype="zfs" srcname="tank/kvm" flags="rw, strictatime"
Apr 20 07:12:27 supernas kernel: [   41.169902]  zd160: p1
Apr 20 07:12:27 supernas kernel: [   41.231938]  zd176: p1 p2
Apr 20 07:12:27 supernas kernel: [   41.277892]  zd192: p1 p2 < p5 >
Apr 20 07:12:27 supernas zed: eid=15 class=config_sync pool_guid=0xDEAD7D0007C1F74B
Apr 20 07:12:27 supernas kernel: [   41.331359]  zd208: p1 p2
Apr 20 07:12:27 supernas kernel: [   41.366111]  zd224: p1
Apr 20 07:12:27 supernas kernel: [   41.427688]  zd240: p1 p2 < p5 >
Apr 20 07:12:27 supernas kernel: [   41.480135]  zd256: p1
Apr 20 07:12:27 supernas kernel: [   41.509283]  zd272: p1
Apr 20 07:12:27 supernas kernel: [   41.553050]  zd288: p1
Apr 20 07:12:27 supernas kernel: [   41.629006]  zd304: p1 p2 < p5 >
Apr 20 07:12:27 supernas kernel: [   41.665794]  zd320: p1
Apr 20 07:12:28 supernas kernel: [   41.729614]  zd336: p1
Apr 20 07:12:28 supernas kernel: [   41.784335]  zd352: p1 p2 p3
Apr 20 07:12:28 supernas kernel: [   41.829916]  zd368: p1 p2 p3
Apr 20 07:12:28 supernas kernel: [   41.878268]  zd384: p1 p2 < p5 >
Apr 20 07:12:28 supernas kernel: [   41.930991]  zd400: p1 p2 < p5 >
Apr 20 07:12:28 supernas kernel: [   41.965681]  zd416: p1 p2
Apr 20 07:12:28 supernas kernel: [   42.075020]  zd432: p1 p2 < p5 >
Apr 20 07:12:28 supernas lxc-start[4746]: lxc-start: 109: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Apr 20 07:12:28 supernas lxc-start[4746]: lxc-start: 109: tools/lxc_start.c: main: 329 The container failed to start
Apr 20 07:12:28 supernas lxc-start[4746]: lxc-start: 109: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Apr 20 07:12:28 supernas lxc-start[4746]: lxc-start: 109: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Apr 20 07:12:28 supernas systemd[1]: pve-container@109.service: Control process exited, code=exited, status=1/FAILURE
Apr 20 07:12:28 supernas systemd[1]: pve-container@109.service: Failed with result 'exit-code'.
Apr 20 07:12:28 supernas pvestatd[3367]: unable to get PID for CT 109 (not running?)
Apr 20 07:12:28 supernas systemd[1]: Failed to start PVE LXC Container: 109.
Apr 20 07:12:28 supernas kernel: [   42.206706] lxc-start[4750]: segfault at 50 ip 00007fcf08f01f8b sp 00007ffc4f8934c0 error 4 in liblxc.so.1.6.0[7fcf08ea8000+8a000]
Apr 20 07:12:28 supernas kernel: [   42.207764] Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
Apr 20 07:12:28 supernas pve-guests[4744]: command 'systemctl start pve-container@109' failed: exit code 1
 
I've already mentioned that I've disabled mounting from the cache and am using the zfs-import-scan service.

What is the root delay supposed to do? Where is the problem?
 
I've already mentioned that I've disabled mounting from the cache and am using the zfs-import-scan service.
the quoted post suggests keeping the cache-file-based import - but recreating the cache file (zpool set cachefile=/etc/zfs/zpool.cache tank) and updating the initramfs
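
roughly like this (a sketch - substitute your own pool names for tank/nvme):
Code:
# recreate the pool cache file and make sure the initramfs picks it up
zpool set cachefile=/etc/zfs/zpool.cache tank
zpool set cachefile=/etc/zfs/zpool.cache nvme
update-initramfs -u -k all
# then reboot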

What is the root delay supposed to do?
if your pool needs longer to get imported, it can happen that the containers get started before their subvolumes are mounted

you can check by rebooting and, if the problem remains, inspecting the subvol of an affected container.
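
for example, something along these lines (using container 109's subvol as a placeholder - adjust the dataset name to your setup):
Code:
# is the dataset actually mounted, and what is already sitting in the mountpoint?
zfs get mounted,mountpoint tank/container/subvol-109-disk-0
ls -la /tank/container/subvol-109-disk-0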

also, taking a closer look at the posted logs - do you use lxc containers (pct) or qemu/kvm guests (qm), or both?
asking because the output looks like there is a rather large number of zvols (which don't get mounted but are present as block devices in /dev)

please provide the output of `zpool status` and `zfs list` and the config of an affected guest (e.g. container 109)

additionally, please post the log since booting (anonymize what you need to anonymize!) - `journalctl -b`
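
for example, something like this (container 109 as a placeholder for whichever guest fails to start):
Code:
zpool status
zfs list
pct config 109
# redirect the journal to a file so it can be attached (filename is just an example)
journalctl -b > boot-journal.txt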

EDIT: added the request for the journal
 
This is weird:
Code:
root@supernas:/tank/container# zfs mount tank/container/subvol-109-disk-0
cannot mount 'tank/container/subvol-109-disk-0': filesystem already mounted

root@supernas:/tank/container# zfs unmount tank/container/subvol-109-disk-0
umount: /tank/container/subvol-109-disk-0: not mounted.
cannot unmount '/tank/container/subvol-109-disk-0': umount failed

root@supernas:/tank/container# ls -lh subvol-109-disk-0/
total 512
drwxr-xr-x 2 root root 2 2020-04-20 18:11:52.493377025 +0200 dev

root@supernas:/tank/container# ls -lh subvol-109-disk-0/dev/
total 0

The dev folder is being created automagically, maybe during container start.

This is also not good:
Code:
root@supernas:/tank/container# zfs get mountpoint tank/container/subvol-109-disk-0
NAME                              PROPERTY    VALUE                              SOURCE
tank/container/subvol-109-disk-0  mountpoint  /tank/container/subvol-109-disk-0  inherited from tank/container

root@supernas:/tank/container# zfs mount tank/container/subvol-109-disk-0
cannot mount 'tank/container/subvol-109-disk-0': filesystem already mounted

root@supernas:/tank/container# zfs umount tank/container/subvol-109-disk-0
umount: /tank/container/subvol-109-disk-0: no mount point specified.
cannot unmount '/tank/container/subvol-109-disk-0': umount failed

root@supernas:/tank/container# zfs unmount tank/container/subvol-109-disk-0
umount: /tank/container/subvol-109-disk-0: no mount point specified.
cannot unmount '/tank/container/subvol-109-disk-0': umount failed

root@supernas:/tank/container# ls -lh subvol-109-disk-0
ls: cannot access 'subvol-109-disk-0': No such file or directory

There is some serious problem with ZFS. I suppose it's not possible to downgrade ZFS to 0.8.2?

zpool status shows all 3 pools ONLINE, without errors.

In the containers I have the line lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file. But I think the problem is that ZFS is not mounting.

Are you able to mount the subvolumes manually after boot?
I would say no. It is possible somehow, but it doesn't work reliably (right now I have some of the subvolumes mounted, but you can see the behaviour above - I'm unable to mount others).

there the cause seems to be a non-empty mountpoint
This is not the case. I've deleted all the directories with a Live CD, but the same segfaults occur during boot.
 
* please try setting a cache file and updating initramfs and rebooting afterwards (this helped most users with similar issues)

* did you change any ZFS related systemd service files?

* does the problem persist if you don't start the containers on boot but start them afterwards? (only to pinpoint the reason for the problem)

This is not the case. I've deleted all the directories with a Live CD, but the same segfaults occur during boot.
The /dev/ directories get created by lxc on every container start - removing the directories with a live CD and rebooting into the system on disk will not help
 
I've already tried using a cache file a few days ago - it didn't help. There was some problem with the cache file disappearing, so I also tried a hack to back up and restore the cache file - that didn't work either.

did you change any ZFS related systemd service files?
I'm not aware of any. But I have to say that updating the kernel and ZFS is always a big surprise; packages go crazy and it's a lot of work to get a running system (which now runs with errors).

All this started when I moved the root pool to other drives.

does the problem persist if you don't start the containers on boot but start them afterwards
I think the problem is that ZFS is not mounting correctly.

I've tried to stop the automounting, but I wasn't able to - ZFS keeps trying to import the pool (and mount the subvolumes) all the time.

removing the directories with a live CD and rebooting into the system on disk will not help
Of course I deleted the whole /tank/container directory :) just to be sure that no subvolume directory existed. I've also enabled overlay mounts - nothing helped.

The weird thing is that I hadn't experienced any errors before the root pool move, but maybe I also updated at that time, so I think the kernel & ZFS version is the problem.
I don't know what to do now.
 
I don't know what to do now.
The answer lies in what your ultimate goal is. If it's to regain operation, the easiest solution is to blow everything away, reinstall Proxmox, create a new zpool, and restore your containers from backup. You do have a backup... ?

If you're more interested in figuring out what went wrong, I'd start by booting a live CD / recovery environment that has ZFS support, mounting the pool, and investigating what's what in a clean environment.
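
Roughly, from the live environment (a sketch - 'tank' stands in for whichever pool misbehaves):
Code:
# import under an alternate root without mounting anything yet
zpool import -N -R /mnt tank
zfs list -r -o name,mountpoint,mounted tank
# then mount and look at what is already sitting in the mountpoint directories
zfs mount -a
ls -la /mnt/tank/container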
 
I've compiled ZFS from source (the zfs-0.8.3 release branch).
Code:
sudo apt install build-essential autoconf automake libtool gawk alien fakeroot dkms libblkid-dev uuid-dev libudev-dev libssl-dev zlib1g-dev libaio-dev libattr1-dev libelf-dev pve-headers-$(uname -r) python3 python3-dev python3-setuptools python3-cffi libffi-dev

git clone --single-branch --branch zfs-0.8.3 https://github.com/zfsonlinux/zfs zfs-git
cd zfs-git
git log > log.txt
git diff-tree -p 95fcb04215015950b3388ba0a6edad8e1b463415
git diff-tree --no-commit-id --name-only -r 95fcb04215015950b3388ba0a6edad8e1b463415
git revert a9cd8bfde73a78a0ba02e25b712fe28d11019191

sh autogen.sh
./configure
make -s -j$(nproc)
make -j$(nproc) deb

apt remove kmod-zfs-devel libnvpair1 libuutil1 libzfs2-devel libzfs2 libzpool2 python3-pyzfs zfs-dkms zfs-dracut zfs-initramfs zfs-test zfs libzfs2linux zfsutils-linux zfs-zed

for file in *.deb; do sudo dpkg -i $file; done

dpkg -i --force-overwrite zfs-dkms_0.8.3-1_amd64.deb

zfs-dkms conflicts with kmod-zfs-devel, but it works with --force-overwrite.

Problem
Only the nvme pool is mounted. The other pool (tank) is not mounted on boot, with this error:
Code:
Apr 21 10:47:00 supernas audit[5877]: AVC apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/tank/container/" pid=5877 comm="mount.zfs" fstype="zfs" srcname="tank/container" flags="rw, strictatime"

zfs mount -a works and mounts all subvolumes correctly.

Is it possible to work around this AppArmor denial somehow? Thank you.
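
For reference, this is how I'm checking for the denials after a reboot (just a grep over the kernel audit messages shown above):
Code:
# list AppArmor mount denials from the current boot
journalctl -k -b | grep 'apparmor="DENIED"' | grep mount.zfs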
 
Hi,
I think I have the same problem.

I just started with Proxmox 3-4 days ago. I hadn't enabled start-at-boot until a CT was fully configured.
Everything was working fine, and starting the CT manually was OK.
Now that a CT is ready, I've enabled auto-start, and after rebooting the server I see that it didn't start automatically.
Job for pve-container@101.service failed because the control process exited with error code.
See "systemctl status pve-container@101.service" and "journalctl -xe" for details.
TASK ERROR: command 'systemctl start pve-container@101' failed: exit code 1

I can't start it manually either, same error.

I have this error in dmesg:
[ 22.215472] audit: type=1400 audit(1587481964.814:15): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="/usr/bin/lxc-start" name="/Data_zfs/" pid=1769 comm="mount.zfs" fstype="zfs" srcname="Data_zfs" flags="rw, strictatime"
[ 22.263039] lxc-start[1294]: segfault at 50 ip 00007fd461006f8b sp 00007fff10eb7720 error 4 in liblxc.so.1.6.0[7fd460fad000+8a000]
And this with journalctl -b:

avril 21 17:36:22 SRV-Virtu pvedaemon[5894]: starting CT 101: UPID:SRV-Virtu:00001706:0002329F:5E9F12F6:vzstart:101:root@pam:
avril 21 17:36:22 SRV-Virtu systemd[1]: Starting PVE LXC Container: 101...
avril 21 17:36:23 SRV-Virtu lxc-start[5896]: lxc-start: 101: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
avril 21 17:36:23 SRV-Virtu lxc-start[5896]: lxc-start: 101: tools/lxc_start.c: main: 329 The container failed to start
avril 21 17:36:23 SRV-Virtu lxc-start[5896]: lxc-start: 101: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
avril 21 17:36:23 SRV-Virtu lxc-start[5896]: lxc-start: 101: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
avril 21 17:36:23 SRV-Virtu systemd[1]: pve-container@101.service: Control process exited, code=exited, status=1/FAILURE
avril 21 17:36:23 SRV-Virtu systemd[1]: pve-container@101.service: Failed with result 'exit-code'.
avril 21 17:36:23 SRV-Virtu systemd[1]: Failed to start PVE LXC Container: 101.
avril 21 17:36:23 SRV-Virtu pvedaemon[5894]: command 'systemctl start pve-container@101' failed: exit code 1
avril 21 17:36:23 SRV-Virtu kernel: lxc-start[5907]: segfault at 50 ip 00007f69c50c5f8b sp 00007ffe52907600 error 4 in liblxc.so.1.6.0[7f69c506c000+8a000]
avril 21 17:36:23 SRV-Virtu kernel: Code: 9b c0 ff ff 4d 85 ff 0f 85 82 02 00 00 66 90 48 8b 73 50 48 8b bb f8 00 00 00 e8 80 78 fa ff 4c 8b 74 24 10 48 89 de 4c 89 f7 <41> ff 56 50 4c 89 f7 48 89 de 41 ff 56 58 48 8b 83 f8 00 00 00 8b
avril 21 17:36:23 SRV-Virtu pvedaemon[1267]: <root@pam> end task UPID:SRV-Virtu:00001706:0002329F:5E9F12F6:vzstart:101:root@pam: command 'systemctl start pve-container@101' failed: exit code 1
When I check whether the ZFS datasets are mounted with the command zfs list -r -o name,mountpoint,mounted:

Code:
NAME                        MOUNTPOINT                   MOUNTED
Data_zfs                    /Data_zfs                         no
Data_zfs/subvol-100-disk-0  /Data_zfs/subvol-100-disk-0       no
Data_zfs/subvol-101-disk-1  /Data_zfs/subvol-101-disk-1       no

I need to run zfs mount -a, and after that I can start the container.

If I disable auto-start and restart the server, starting the container manually works.

edit:
* please try setting a cache file and updating initramfs and rebooting afterwards (this helped most users with similar issues)
That fixed CT auto-start for me - no more segfaults.
Code:
zpool set cachefile=/etc/zfs/zpool.cache POOLNAME
 
hm - one issue in the openzfs github tracker seems similar: https://github.com/openzfs/zfs/issues/9560
there the cause seems to be a non-empty mountpoint

I can confirm I just saw this on a server with two pools: one (local-zfs) for boot and a few critical containers, and one (Pool1) for everything else. After an upgrade from the latest 5.x to 6.2-10, containers on local-zfs would start; those on Pool1 didn't. VMs on Pool1 came up fine. It was obvious on inspection that the relevant ZFS datasets for the containers weren't mounting. 'mount -a' segfaulted as above.

I mucked around with cache files to no avail. However, after finding the above I worked around the problem. For some reason there were remnants of directories/files from old datasets in the root of the Pool1 dataset, which I deleted. Some of the containers were set to autostart, and it seems that when this happened Proxmox had also created the root dir and some subdirs/files for those containers on the Pool1 dataset before startup (obviously) failed - e.g. at/below /Pool1/proxmox-vmdisks/subvol-120-disk-1/... These also needed deleting - it would be nice if Proxmox could check that the mount is there as expected before creating them.

Having done that and rebooted, with the mount point empty again, everything came up as normal.
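
In case it helps anyone hitting the same thing, a rough way to spot such datasets before rebooting (a sketch; it assumes mountpoints without spaces):
Code:
# print datasets that are not mounted but whose mountpoint directory is non-empty
zfs list -H -o name,mountpoint,mounted -t filesystem | \
while read -r name mp mounted; do
    if [ "$mounted" = "no" ] && [ -d "$mp" ] && [ -n "$(ls -A "$mp" 2>/dev/null)" ]; then
        echo "non-empty mountpoint: $mp ($name)"
    fi
done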
 
