ZFS Random Crash

J.Carlos

Member
Oct 23, 2017
Hi, I'm having some trouble with ZFS. I have three nodes, and two of them are rebooting randomly.

When a node crashes, I can't see anything in syslog.

The system has:
AMD EPYC 7401P 24-Core
128 GB DDR4 ECC RAM
2 x 960 GB NVMe Gen3 x4 Data Center Series RAID 1 ZFS


pveversion -v
proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.13-5-pve: 4.13.13-38
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-20
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-6
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.4-pve2~bpo9

/etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592
options zfs zfs_arc_min=4294967296
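
For reference, these two values are 8 GiB and 4 GiB expressed in bytes. A minimal sketch for double-checking the arithmetic and for seeing what the loaded module is actually using; `gib_to_bytes` is a hypothetical helper, not part of ZFS, and the `/sys/module/zfs/parameters` files only exist while the zfs module is loaded:

```shell
# Hypothetical helper: convert GiB to the byte count expected by zfs_arc_max / zfs_arc_min.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 8   # value used for zfs_arc_max
gib_to_bytes 4   # value used for zfs_arc_min

# Print the values the running module reports, if the module is loaded.
for p in zfs_arc_max zfs_arc_min; do
    f="/sys/module/zfs/parameters/$p"
    if [ -r "$f" ]; then
        echo "$p = $(cat "$f")"
    fi
done
```

If the reported values differ from the file, one possible cause on a root-on-ZFS install is that the initramfs was not refreshed (`update-initramfs -u`) and the node rebooted after the file was changed.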

/etc/sysctl.conf
vm.swappiness = 10

Is there a log for ZFS? What could the problem be?

Thanks in advance.
 
I also have these adjustments:

zfs set primarycache=none rpool/swap
zfs set secondarycache=none rpool/swap
zfs set compression=off rpool/swap
zfs set sync=disabled rpool/swap
zfs set logbias=throughput rpool/swap
zfs set checksum=off rpool/swap

And the swap partition is disabled in fstab.
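
A quick way to confirm these settings took effect might look like the sketch below. It assumes the `rpool/swap` dataset named above and only reads state; the variable names are just for illustration:

```shell
# Dataset and properties taken from the zfs set commands above.
dataset="rpool/swap"
props="primarycache,secondarycache,compression,sync,logbias,checksum"

# Show all six properties in one table (skipped if the zfs tool is not installed).
if command -v zfs >/dev/null 2>&1; then
    zfs get "$props" "$dataset" || echo "could not read $dataset"
fi

# List active swap devices; empty output means no swap is currently in use.
if command -v swapon >/dev/null 2>&1; then
    swapon --show || true
fi

# Current swappiness as seen by the kernel.
if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
fi
```

If `swapon --show` still lists a device after the fstab change, swap is being activated from somewhere else.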
 
Hi,

do you get any hints from the logs?

If the logs do not show anything, you can enable core dumps.

https://pve.proxmox.com/wiki/Enable_Core_Dump_systemd
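
Apart from core dumps, a rough sketch of where else to look after an unexpected reboot is below. The two function names are hypothetical helpers, and the persistence check assumes journald's default `Storage=auto`, where logs survive a reboot only if `/var/log/journal` exists. Note that `zpool events` is kept in kernel memory, so it only covers the time since the last boot, and a hard reset may leave nothing behind in any log.

```shell
# Hypothetical helper: report whether journald keeps logs across reboots,
# assuming the default Storage=auto (persistent only if the directory exists).
journal_mode() {
    if [ -d "${1:-/var/log/journal}" ]; then
        echo persistent
    else
        echo volatile
    fi
}

# Hypothetical helper: kernel messages from the boot before the current one,
# plus what ZFS itself reports. Intended to be run as root after a crash.
show_previous_boot() {
    journalctl --list-boots
    journalctl -k -b -1 --no-pager | tail -n 50
    zpool status -x
    zpool events | tail -n 20
}

echo "journal is $(journal_mode)"
# If this prints "volatile", the journal can be made persistent with:
#   mkdir -p /var/log/journal && systemctl restart systemd-journald
```

After the next reboot, calling `show_previous_boot` shows the last kernel messages of the crashed boot, provided the journal was persistent at the time.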

Hi Wolfgang, the logs are not showing anything.

Apr 11 09:29:00 px1 systemd[1]: Starting Proxmox VE replication runner...
Apr 11 09:29:01 px1 systemd[1]: Started Proxmox VE replication runner.
Apr 11 09:30:00 px1 systemd[1]: Starting Proxmox VE replication runner...
Apr 11 09:30:09 px1 systemd[1]: Started Proxmox VE replication runner.
Apr 11 09:31:00 px1 systemd[1]: Starting Proxmox VE replication runner...
Apr 11 09:31:01 px1 systemd[1]: Started Proxmox VE replication runner.
Apr 11 09:32:00 px1 systemd[1]: Starting Proxmox VE replication runner...
Apr 11 09:32:01 px1 systemd[1]: Started Proxmox VE replication runner.
Apr 11 09:33:00 px1 systemd[1]: Starting Proxmox VE replication runner...
Apr 11 09:33:01 px1 systemd[1]: Started Proxmox VE replication runner.
Apr 11 09:34:00 px1 systemd[1]: Starting Proxmox VE replication runner...
Apr 11 09:34:01 px1 systemd[1]: Started Proxmox VE replication runner.
Apr 11 09:36:28 px1 systemd-modules-load[3145]: Inserted module 'dummy'
Apr 11 09:36:28 px1 kernel: [ 0.000000] random: get_random_bytes called from start_kernel+0x42/0x$
Apr 11 09:36:28 px1 kernel: [ 0.000000] Linux version 4.13.13-5-pve (root@nora) (gcc version 6.3.$
Apr 11 09:36:28 px1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.13.1$
Apr 11 09:36:28 px1 systemd-modules-load[3145]: Inserted module 'iscsi_tcp'
Apr 11 09:36:28 px1 kernel: [ 0.000000] KERNEL supported cpus:
Apr 11 09:36:28 px1 kernel: [ 0.000000] Intel GenuineIntel
Apr 11 09:36:28 px1 kernel: [ 0.000000] AMD AuthenticAMD
Apr 11 09:36:28 px1 kernel: [ 0.000000] Centaur CentaurHauls
Apr 11 09:36:28 px1 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating po$
Apr 11 09:36:28 px1 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Apr 11 09:36:28 px1 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Apr 11 09:36:28 px1 kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Apr 11 09:36:28 px1 kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x7, contex

I'm going to try your suggestion, thanks!