[SOLVED] PVE 6.1: ZFS Segfault upon system boot

Hi, when booting the system (it has only happened during boot so far), I get a segfault in ZFS:

Code:
[   17.905941] ZFS: Loaded module v0.8.3-pve1, ZFS pool version 5000, ZFS filesystem version 5
[...]
[   20.804544] zfs[4387]: segfault at 0 ip 00007f565ddde694 sp 00007f5656ff5420 error 4
[   20.804546] zfs[4379]: segfault at 0 ip 00007f565deba01c sp 00007f565d509478 error 4
[   20.804547]  in libc-2.28.so[7f565dd84000+148000]
[   20.805298]  in libc-2.28.so[7f565dd84000+148000]
[   20.805846] Code: 29 f2 41 ff 55 70 48 85 c0 7e 3b 48 8b 93 90 00 00 00 48 01 43 10 48 83 fa ff 74 0a 48 01 d0 48 89 83 90 00 00 00 48 8b 43 08 <0f> b6 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f
[   20.806365] Code: 29 c8 c5 f8 77 c3 0f 1f 84 00 00 00 00 00 48 85 d2 0f 84 5a 02 00 00 89 f9 c5 f9 6e c6 c4 e2 7d 78 c0 83 e1 3f 83 f9 20 77 44 <c5> fd 74 0f c5 fd d7 c1 85 c0 0f 85 c4 01 00 00 48 83 ea 20 0f 86

I haven't yet noticed any data corruption, but who knows.
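
In case it helps with debugging, one thing I could do is install systemd-coredump so the next crash leaves a core dump to inspect. I haven't done this yet, so this is just a sketch:

Code:
# capture future crashes with systemd-coredump (not installed by default)
apt install systemd-coredump
# after the next occurrence, list and inspect the dump
coredumpctl list zfs
coredumpctl info zfs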
 
Hi,

please send the output of:

Code:
pveversion -v
 
Sure, here you go:

Code:
# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-22
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
I do not see this here on my servers.
Do you use ZFS as the rootfs?
What hardware do you use with ZFS (SSD/HDD, CPU, ...)?
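
For example, something like this should show whether the root filesystem is on ZFS and how the pools are laid out:

Code:
findmnt /
zpool status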
 
Yes, ZFS is the rootfs. I haven't found out how to reproduce this bug reliably either; it just happens from time to time, which gives me a bit of a bad feeling.

The machine here is an EPYC 7502P on a TYAN platform with 256 GB RAM, using NVMe and SATA SSDs. Boot is from two Kingston DC1000B SSDs in a ZFS mirror (I had meant to use md, but that is not officially supported). There are also two mirrored P4800X drives, six Micron 9300 MAX drives in striped mirrors, and four SATA SSDs (Samsung, Kingston) in a striped mirror.
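
By "striped mirrors" I mean the usual RAID10-style layout of several mirror vdevs in one pool, roughly along these lines (the pool name and device names are just placeholders, not my actual commands):

Code:
# illustrative only: mirrored NVMe pairs striped into one pool
zpool create datapool \
    mirror /dev/disk/by-id/nvme-A /dev/disk/by-id/nvme-B \
    mirror /dev/disk/by-id/nvme-C /dev/disk/by-id/nvme-D \
    mirror /dev/disk/by-id/nvme-E /dev/disk/by-id/nvme-F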
 
I can't reproduce it here, but my setup is quite different.
Please try the pve-kernel-5.4; maybe it solves it.
If not, I can hopefully test it next week with similar HW.
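
If I remember correctly, installing the meta package and rebooting should be enough, something like:

Code:
apt update
apt install pve-kernel-5.4
reboot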
 
@wolfgang: I'm sorry to report that the boot-time segfault is back.

Code:
# dmesg | grep zfs
[   23.442792] traps: zfs[10790] general protection fault ip:7fbe057054a6 sp:7fbdf7ff7310 error:0 in libc-2.28.so[7fbe056a3000+148000]
[   23.462145] systemd[1]: zfs-mount.service: Main process exited, code=killed, status=11/SEGV
[   23.485228] systemd[1]: zfs-mount.service: Failed with result 'signal'.

pveversion -v:
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-2
pve-kernel-helper: 6.2-2
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.27-1-pve: 5.4.27-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-6
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
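
In case it is useful, one way to check after such a boot whether the datasets still got mounted despite the crash would be something like:

Code:
systemctl status zfs-mount.service
zfs list -o name,mounted,mountpoint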
 
This is not the same class of error.

The error now comes from the zfs-mount.service.
Is there more information in journald?
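
For example, the boot journal for that unit plus any errors around that time, roughly:

Code:
journalctl -b -u zfs-mount.service
journalctl -b -p err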
 
