Proxmox 5.4 stops working (ZFS issue?)

zeuxprox

Hi,
I have a single node with Proxmox 5.4-13, and tonight it stopped working. I had to hard reboot the node...
I have 3 ZFS pools (one for Proxmox in RAID 1, one for my HDD disks in RAIDZ2 and one for my SSD disks in RAIDZ2). All the pools are online and the scrubs are ok.

pve version is:

Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-24-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-12
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-41
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2


Tonight, and this is the second time in 20 days, all VMs (a mix of Linux and Windows) became unresponsive and I could not stop them. The only thing I could do was hard reboot the node. In the kernel log, and on the monitor, I have:

Code:
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021318] ------------[ cut here ]------------
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021320] kernel BUG at mm/slub.c:296!
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021350] invalid opcode: 0000 [#1] SMP PTI
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021368] Modules linked in: veth tcp_diag inet_diag ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_tcpudp xt_comment xt_addrtype xt_multiport xt_conntrack xt_s
et xt_mark ip_set_hash_net ip_set iptable_filter softdog openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c nfnetlink_log nfnetlink 8021q garp mrp ipmi_ssif intel_rapl skx_edac x86_pkg_t
emp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate snd_pcm intel_rapl_perf snd_timer snd soundcore ast ttm pcspkr drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021627]  sysfillrect sysimgblt joydev input_leds lpc_ich shpchp mei_me mei ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi sunrpc scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid ixgbe mdio igb(O) dca mpt3sas raid_class scsi_transport_sas i40e ptp pps_core i2c_i801 ahci libahci
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021812] CPU: 17 PID: 49577 Comm: z_wr_int_7 Tainted: P           O     4.15.18-21-pve #1
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021841] Hardware name: Supermicro SSG-6029P-E1CR12L/X11DPH-T, BIOS 2.1 06/15/2018
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021872] RIP: 0010:__slab_free+0x1a2/0x330
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021889] RSP: 0018:ffffb8a524cd7a70 EFLAGS: 00010246
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021908] RAX: ffff9484779a6f60 RBX: ffff9484779a6f60 RCX: 00000001002a0029
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021933] RDX: ffff9484779a6f60 RSI: ffffe57cbede6980 RDI: ffff9464bf407600
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021957] RBP: ffffb8a524cd7b10 R08: 0000000000000001 R09: ffffffffc01b9c2c
Jan 22 00:32:42 dt-prox1 kernel: [1408979.021981] R10: ffffb8a524cd7b30 R11: 0000000000000000 R12: ffff9484779a6f60
Jan 22 00:32:42 dt-prox1 kernel: [1408979.022006] R13: ffff9464bf407600 R14: ffffe57cbede6980 R15: ffff9484779a6f60
Jan 22 00:32:42 dt-prox1 kernel: [1408979.022917] FS:  0000000000000000(0000) GS:ffff9484beec0000(0000) knlGS:0000000000000000
Jan 22 00:32:42 dt-prox1 kernel: [1408979.023725] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 00:32:42 dt-prox1 kernel: [1408979.024512] CR2: 00005643be998248 CR3: 00000006ea20a005 CR4: 00000000007626e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.025508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 22 00:32:42 dt-prox1 kernel: [1408979.026512] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 22 00:32:42 dt-prox1 kernel: [1408979.027670] PKRU: 55555554
Jan 22 00:32:42 dt-prox1 kernel: [1408979.028637] Call Trace:
Jan 22 00:32:42 dt-prox1 kernel: [1408979.029555]  ? __update_load_avg_blocked_se.isra.36+0xd1/0x150
Jan 22 00:32:42 dt-prox1 kernel: [1408979.030529]  ? __mutex_lock.isra.5+0x474/0x500
Jan 22 00:32:42 dt-prox1 kernel: [1408979.031455]  ? ttwu_do_wakeup+0x1e/0x140
Jan 22 00:32:42 dt-prox1 kernel: [1408979.032393]  kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.033237]  ? kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.034098]  spl_kmem_cache_free+0x13c/0x1c0 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.034875]  arc_hdr_destroy+0xa7/0x1b0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.035688]  arc_freed+0x69/0xc0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.036467]  zio_free_sync+0x41/0x100 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.037251]  zio_free+0x90/0xd0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.037997]  dsl_free+0x11/0x20 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.038727]  dsl_dataset_block_kill+0x257/0x490 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.039432]  ? kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.040143]  ? kmem_cache_free+0x1af/0x1e0
Jan 22 00:32:42 dt-prox1 kernel: [1408979.040870]  dbuf_write_done+0x162/0x1b0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.041562]  arc_write_done+0x86/0x3f0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.042225]  zio_done+0x2d0/0xe60 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.042839]  ? kfree+0x165/0x180
Jan 22 00:32:42 dt-prox1 kernel: [1408979.043509]  ? spl_kmem_free+0x33/0x40 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.044228]  zio_execute+0x95/0xf0 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.044904]  taskq_thread+0x2ae/0x4d0 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.045474]  ? wake_up_q+0x80/0x80
Jan 22 00:32:42 dt-prox1 kernel: [1408979.046051]  ? zio_reexecute+0x390/0x390 [zfs]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.046591]  kthread+0x105/0x140
Jan 22 00:32:42 dt-prox1 kernel: [1408979.047111]  ? task_done+0xb0/0xb0 [spl]
Jan 22 00:32:42 dt-prox1 kernel: [1408979.047613]  ? kthread_create_worker_on_cpu+0x70/0x70
Jan 22 00:32:42 dt-prox1 kernel: [1408979.048178]  ret_from_fork+0x35/0x40
Jan 22 00:32:42 dt-prox1 kernel: [1408979.048702] Code: ff ff ff 75 5d 48 83 bd 68 ff ff ff 00 0f 84 b9 fe ff ff 48 8b b5 60 ff ff ff 48 8b bd 68 ff ff ff e8 63 85 79 00 e9 a1 fe ff ff <0f> 0b 80 4d ab 80 4c 8b 45 88 31 d2 4c 8b 4d a8 4c 89 f6 e8 66
Jan 22 00:32:42 dt-prox1 kernel: [1408979.049735] RIP: __slab_free+0x1a2/0x330 RSP: ffffb8a524cd7a70
Jan 22 00:32:42 dt-prox1 kernel: [1408979.050243] ---[ end trace 1503c118398c1e08 ]---


I think it is a ZFS problem, but what happened?

Help me please...

Thank you very much
 
I think it is a ZFS problem, but what happened?

Cannot tell immediately; it could have been a memory error, which could at least explain an invalid instruction opcode.

Other things to check:
* ensure the latest updates are installed (you run an older kernel than available; at the time of writing the newest for PVE 5.4 would be 4.15.18-52, ABI 4.15.18-24)
* install the Intel microcode package: add Debian's "non-free" repo and do apt install intel-microcode (see the example below)
* check for a newer BIOS/UEFI update for the Supermicro board
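
For the microcode step, a minimal sketch, assuming a stock PVE 5.4 install on Debian 9 "stretch" (adjust the suite name if your sources differ):

Code:
# add Debian's non-free component (in its own sources file) and install the microcode package
echo "deb http://ftp.debian.org/debian stretch non-free" > /etc/apt/sources.list.d/non-free.list
apt update
apt install intel-microcode
# the updated microcode is loaded early during the next boot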
 
Thank you t.lamprecht, I'll do what you suggested tomorrow night.

Now I see this message on the screen:

Code:
[ 1448.513043] kvm [54271]: vcpu1, guest rIP: 0xfffff80250fb6582 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1 nop

while in kern.log:

Code:
kernel: [ 7630.723176] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79750

I am very worried

Thank you
 
kernel: [ 7630.723176] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79750

This one is nothing to worry about; it's just the kernel trace/perf subsystem used for performance and debugging measurements.
If there are load spikes on a CPU this will often show up, but again, nothing to worry about.

[ 1448.513043] kvm [54271]: vcpu1, guest rIP: 0xfffff80250fb6582 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1 nop
Are there Windows VMs running? Since Windows, and some software running on it, often probes Model-Specific Registers (MSRs), this kernel log message can happen - the KVM module just tells you that a guest tried to probe a non-existent register.

You could either try another CPU model for the VM (what are you using now?) or try adding options kvm ignore_msrs=1 report_ignored_msrs=0 to /etc/modprobe.d/kvm.conf as a workaround.
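
A minimal sketch of that workaround (just an illustration; the options only take effect once the kvm module is reloaded, e.g. after a reboot):

Code:
# append the workaround options to the modprobe config
echo "options kvm ignore_msrs=1 report_ignored_msrs=0" >> /etc/modprobe.d/kvm.conf
# reload the kvm modules with all VMs stopped, or simply reboot the node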
 
I forgot to mention that in the GUI, when I had the problem, the IO delay was high, about 18%. Now that the server has no problems, the IO delay is 0.05% - 0.4%.

zpool iostat
Code:
 capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
HDD-Pool    1.69T  52.8T     41    141   246K  3.30M
SSD-Pool     102G  2.51T     60    439   304K  10.7M
rpool       2.81G   114G      0     24  5.38K   350K
----------  -----  -----  -----  -----  -----  -----

while zpool status:

Code:
pool: HDD-Pool
 state: ONLINE
 
config:

    NAME                                               STATE     READ WRITE CKSUM
    HDD-Pool                                           ONLINE       0     0     0
      raidz2-0                                         ONLINE       0     0     0
        scsi-35000cca26ac15b54                         ONLINE       0     0     0
        scsi-35000cca26ad40cbc                         ONLINE       0     0     0
        scsi-35000cca26ac4cfdc                         ONLINE       0     0     0
        scsi-35000cca26abed87c                         ONLINE       0     0     0
        scsi-35000cca26ad81e74                         ONLINE       0     0     0
        scsi-35000cca26ace9470                         ONLINE       0     0     0
    logs
      nvme-eui.334842304b3049160025384100000004-part1  ONLINE       0     0     0
    cache
      nvme-eui.334842304b3049160025384100000004-part3  ONLINE       0     0     0

errors: No known data errors

  pool: SSD-Pool
 state: ONLINE
  scan: scrub repaired 0B in 0h2m with 0 errors on Wed Jan 22 05:27:51 2020
config:

    NAME                                               STATE     READ WRITE CKSUM
    SSD-Pool                                           ONLINE       0     0     0
      raidz2-0                                         ONLINE       0     0     0
        ata-SAMSUNG_MZ7KM480HMHQ-00005_S3F4NX0K604708  ONLINE       0     0     0
        ata-SAMSUNG_MZ7KM480HMHQ-00005_S3F4NX0K604709  ONLINE       0     0     0
        ata-SAMSUNG_MZ7KM480HMHQ-00005_S3F4NX0K604707  ONLINE       0     0     0
        ata-SAMSUNG_MZ7KM480HMHQ-00005_S3F4NX0K604677  ONLINE       0     0     0
        ata-SAMSUNG_MZ7KM480HMHQ-00005_S3F4NX0K604706  ONLINE       0     0     0
        ata-SAMSUNG_MZ7KM480HMHQ-00005_S3F4NX0K604672  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0h0m with 0 errors on Wed Jan 22 05:28:32 2020
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdm2    ONLINE       0     0     0
        sdn2    ONLINE       0     0     0

errors: No known data errors

Thank you all...
 
Hi,

as suggested by t.lamprecht, I installed the Intel microcode, updated the BIOS and updated Proxmox with apt dist-upgrade, but my running kernel is still 4.15.18-24-pve and not 4.15.18-52, as you can see from pveversion -v:

Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-24-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-12
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-41
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-55
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

Why?

Thank you
 
Hi,

today the problem occurred again and I had to restart the server. As I wrote in my previous post, I have updated the BIOS to the latest version available and Proxmox is also up to date. The only strange thing is that my kernel is 4.15.18-24-pve and not 4.15.18-52-pve as suggested by t.lamprecht. My repository is enterprise, not no-subscription. The problem was exactly the same:

This is what is logged in my last /var/log/kern.log:

Code:
Feb  5 12:32:44 dt-prox1 kernel: [908564.740252] ------------[ cut here ]------------
Feb  5 12:32:44 dt-prox1 kernel: [908564.740254] kernel BUG at mm/slub.c:296!
Feb  5 12:32:44 dt-prox1 kernel: [908564.740285] invalid opcode: 0000 [#1] SMP PTI
Feb  5 12:32:44 dt-prox1 kernel: [908564.740301] Modules linked in: veth tcp_diag inet_diag ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_tcpudp xt_comment xt_addrtype xt_multiport xt_conntrack xt_set xt_mark ip_set_hash_net ip_set iptable_filter softdog openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c nfnetlink_log nfnetlink 8021q garp mrp ipmi_ssif intel_rapl skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf ast ttm pcspkr drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt joydev input_leds
Feb  5 12:32:44 dt-prox1 kernel: [908564.740555]  lpc_ich shpchp mei_me ioatdma mei wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid mpt3sas ixgbe raid_class igb(O) mdio dca scsi_transport_sas i40e ptp pps_core i2c_i801 ahci libahci
Feb  5 12:32:44 dt-prox1 kernel: [908564.740747] CPU: 31 PID: 19199 Comm: z_wr_int_6 Tainted: P           O     4.15.18-24-pve #1
Feb  5 12:32:44 dt-prox1 kernel: [908564.740775] Hardware name: Supermicro SSG-6029P-E1CR12L/X11DPH-T, BIOS 3.2 10/22/2019
Feb  5 12:32:44 dt-prox1 kernel: [908564.740805] RIP: 0010:__slab_free+0x1a2/0x330
Feb  5 12:32:44 dt-prox1 kernel: [908564.741861] RSP: 0018:ffff9acd8f06fa70 EFLAGS: 00010246
Feb  5 12:32:44 dt-prox1 kernel: [908564.742862] RAX: ffff8b3f03e83420 RBX: ffff8b3f03e83420 RCX: 00000001002a001f
Feb  5 12:32:44 dt-prox1 kernel: [908564.743911] RDX: ffff8b3f03e83420 RSI: ffffd0dbb20fa0c0 RDI: ffff8b427f407600
Feb  5 12:32:44 dt-prox1 kernel: [908564.744872] RBP: ffff9acd8f06fb10 R08: 0000000000000001 R09: ffffffffc04b0c2c
Feb  5 12:32:44 dt-prox1 kernel: [908564.745911] R10: ffff9acd8f06fb30 R11: 0000000000000000 R12: ffff8b3f03e83420
Feb  5 12:32:44 dt-prox1 kernel: [908564.746896] R13: ffff8b427f407600 R14: ffffd0dbb20fa0c0 R15: ffff8b3f03e83420
Feb  5 12:32:44 dt-prox1 kernel: [908564.747999] FS:  0000000000000000(0000) GS:ffff8b427fe40000(0000) knlGS:0000000000000000
Feb  5 12:32:44 dt-prox1 kernel: [908564.749008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb  5 12:32:44 dt-prox1 kernel: [908564.749904] CR2: 00007f49dc5aa003 CR3: 0000003ef580a006 CR4: 00000000007626e0
Feb  5 12:32:44 dt-prox1 kernel: [908564.750631] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb  5 12:32:44 dt-prox1 kernel: [908564.751351] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb  5 12:32:44 dt-prox1 kernel: [908564.752059] PKRU: 55555554
Feb  5 12:32:44 dt-prox1 kernel: [908564.752761] Call Trace:
Feb  5 12:32:44 dt-prox1 kernel: [908564.753475]  ? avl_nearest+0x2b/0x30 [zavl]
Feb  5 12:32:44 dt-prox1 kernel: [908564.754571]  ? __mutex_lock.isra.5+0x474/0x500
Feb  5 12:32:44 dt-prox1 kernel: [908564.755533]  ? sched_clock+0x9/0x10
Feb  5 12:32:44 dt-prox1 kernel: [908564.756420]  kmem_cache_free+0x1af/0x1e0
Feb  5 12:32:44 dt-prox1 kernel: [908564.757267]  ? kmem_cache_free+0x1af/0x1e0
Feb  5 12:32:44 dt-prox1 kernel: [908564.758008]  spl_kmem_cache_free+0x13c/0x1c0 [spl]
Feb  5 12:32:44 dt-prox1 kernel: [908564.758788]  arc_hdr_destroy+0xa7/0x1b0 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.759548]  arc_freed+0x69/0xc0 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.760329]  zio_free_sync+0x41/0x100 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.761092]  zio_free+0x90/0xd0 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.761841]  dsl_free+0x11/0x20 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.762690]  dsl_dataset_block_kill+0x257/0x490 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.763587]  ? kmem_cache_free+0x1af/0x1e0
Feb  5 12:32:44 dt-prox1 kernel: [908564.764491]  ? kmem_cache_free+0x1af/0x1e0
Feb  5 12:32:44 dt-prox1 kernel: [908564.765402]  dbuf_write_done+0x162/0x1b0 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.766298]  arc_write_done+0x86/0x3f0 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.767227]  zio_done+0x2d0/0xe60 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.768096]  ? kfree+0x165/0x180
Feb  5 12:32:44 dt-prox1 kernel: [908564.768910]  ? spl_kmem_free+0x33/0x40 [spl]
Feb  5 12:32:44 dt-prox1 kernel: [908564.769745]  zio_execute+0x95/0xf0 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.770429]  taskq_thread+0x2ae/0x4d0 [spl]
Feb  5 12:32:44 dt-prox1 kernel: [908564.771139]  ? wake_up_q+0x80/0x80
Feb  5 12:32:44 dt-prox1 kernel: [908564.772060]  ? zio_reexecute+0x390/0x390 [zfs]
Feb  5 12:32:44 dt-prox1 kernel: [908564.772860]  kthread+0x105/0x140
Feb  5 12:32:44 dt-prox1 kernel: [908564.773560]  ? task_done+0xb0/0xb0 [spl]
Feb  5 12:32:44 dt-prox1 kernel: [908564.774118]  ? kthread_create_worker_on_cpu+0x70/0x70
Feb  5 12:32:44 dt-prox1 kernel: [908564.774680]  ret_from_fork+0x35/0x40
Feb  5 12:32:44 dt-prox1 kernel: [908564.775202] Code: ff ff ff 75 5d 48 83 bd 68 ff ff ff 00 0f 84 b9 fe ff ff 48 8b b5 60 ff ff ff 48 8b bd 68 ff ff ff e8 73 a3 79 00 e9 a1 fe ff ff <0f> 0b 80 4d ab 80 4c 8b 45 88 31 d2 4c 8b 4d a8 4c 89 f6 e8 66
Feb  5 12:32:44 dt-prox1 kernel: [908564.776307] RIP: __slab_free+0x1a2/0x330 RSP: ffff9acd8f06fa70
Feb  5 12:32:44 dt-prox1 kernel: [908564.776849] ---[ end trace 2045e6b14990ff17 ]---

For my HDD-Pool (RAIDZ2) I use an NVMe drive for cache and log.

Help me please....

Thank you very much
 
If it can help you, this is arc_summary (part 1):

Code:
------------------------------------------------------------------------
ZFS Subsystem Report                Thu Feb 06 07:44:18 2020
ARC Summary: (HEALTHY)
    Memory Throttle Count:            0

ARC Misc:
    Deleted:                13.13M
    Mutex Misses:                328
    Evict Skips:                433

ARC Size:                100.02%    16.00    GiB
    Target Size: (Adaptive)        100.00%    16.00    GiB
    Min Size (Hard Limit):        12.50%    2.00    GiB
    Max Size (High Water):        8:1    16.00    GiB

ARC Size Breakdown:
    Recently Used Cache Size:    77.37%    10.98    GiB
    Frequently Used Cache Size:    22.63%    3.21    GiB

ARC Hash Breakdown:
    Elements Max:                9.13M
    Elements Current:        99.23%    9.06M
    Collisions:                9.51M
    Chain Max:                6
    Chains:                    1.02M

ARC Total accesses:                    131.93M
    Cache Hit Ratio:        88.62%    116.92M
    Cache Miss Ratio:        11.38%    15.01M
    Actual Hit Ratio:        86.76%    114.46M

    Data Demand Efficiency:        75.33%    45.67M
    Data Prefetch Efficiency:    50.48%    6.71M

    CACHE HITS BY CACHE LIST:
      Anonymously Used:        0.74%    867.46k
      Most Recently Used:        17.55%    20.52M
      Most Frequently Used:        80.35%    93.94M
      Most Recently Used Ghost:    0.31%    358.14k
      Most Frequently Used Ghost:    1.06%    1.23M

    CACHE HITS BY DATA TYPE:
      Demand Data:            29.43%    34.41M
      Prefetch Data:        2.90%    3.39M
      Demand Metadata:        67.54%    78.97M
      Prefetch Metadata:        0.13%    156.18k

    CACHE MISSES BY DATA TYPE:
      Demand Data:            75.05%    11.27M
      Prefetch Data:        22.14%    3.32M
      Demand Metadata:        2.65%    397.74k
      Prefetch Metadata:        0.16%    23.57k

L2 ARC Summary: (HEALTHY)
    Low Memory Aborts:            0
    Free on Write:                295
    R/W Clashes:                0
    Bad Checksums:                0
    IO Errors:                0

L2 ARC Size: (Adaptive)                55.26    GiB
    Compressed:            58.10%    32.10    GiB
    Header Size:            0.74%    416.67    MiB

L2 ARC Evicts:
    Lock Retries:                20
    Upon Reading:                0

L2 ARC Breakdown:                15.01M
    Hit Ratio:            6.42%    964.01k
    Miss Ratio:            93.58%    14.05M
    Feeds:                    68.25k

L2 ARC Writes:
    Writes Sent:            100.00%    67.66k

DMU Prefetch Efficiency:                    28.75M
    Hit Ratio:            13.52%    3.89M
    Miss Ratio:            86.48%    24.86M
 
If it can help you, this is arc_summary (part 2):


Code:
ZFS Tunables:
    dbuf_cache_hiwater_pct                            10
    dbuf_cache_lowater_pct                            10
    dbuf_cache_max_bytes                              104857600
    dbuf_cache_max_shift                              5
    dmu_object_alloc_chunk_shift                      7
    ignore_hole_birth                                 1
    l2arc_feed_again                                  1
    l2arc_feed_min_ms                                 200
    l2arc_feed_secs                                   1
    l2arc_headroom                                    2
    l2arc_headroom_boost                              200
    l2arc_noprefetch                                  0
    l2arc_norw                                        0
    l2arc_write_boost                                 8388608
    l2arc_write_max                                   8388608
    metaslab_aliquot                                  524288
    metaslab_bias_enabled                             1
    metaslab_debug_load                               0
    metaslab_debug_unload                             0
    metaslab_fragmentation_factor_enabled             1
    metaslab_lba_weighting_enabled                    1
    metaslab_preload_enabled                          1
    metaslabs_per_vdev                                200
    send_holes_without_birth_time                     1
    spa_asize_inflation                               24
    spa_config_path                                   /etc/zfs/zpool.cache
    spa_load_verify_data                              1
    spa_load_verify_maxinflight                       10000
    spa_load_verify_metadata                          1
    spa_slop_shift                                    5
    zfetch_array_rd_sz                                1048576
    zfetch_max_distance                               8388608
    zfetch_max_streams                                8
    zfetch_min_sec_reap                               2
    zfs_abd_scatter_enabled                           1
    zfs_abd_scatter_max_order                         10
    zfs_admin_snapshot                                1
    zfs_arc_average_blocksize                         8192
    zfs_arc_dnode_limit                               0
    zfs_arc_dnode_limit_percent                       10
    zfs_arc_dnode_reduce_percent                      10
    zfs_arc_grow_retry                                0
    zfs_arc_lotsfree_percent                          10
    zfs_arc_max                                       17179869184
    zfs_arc_meta_adjust_restarts                      4096
    zfs_arc_meta_limit                                0
    zfs_arc_meta_limit_percent                        75
    zfs_arc_meta_min                                  0
    zfs_arc_meta_prune                                10000
    zfs_arc_meta_strategy                             1
    zfs_arc_min                                       2147483648
    zfs_arc_min_prefetch_lifespan                     0
    zfs_arc_p_dampener_disable                        1
    zfs_arc_p_min_shift                               0
    zfs_arc_pc_percent                                0
    zfs_arc_shrink_shift                              0
    zfs_arc_sys_free                                  0
    zfs_autoimport_disable                            1
    zfs_checksums_per_second                          20
    zfs_compressed_arc_enabled                        1
    zfs_dbgmsg_enable                                 0
    zfs_dbgmsg_maxsize                                4194304
    zfs_dbuf_state_index                              0
    zfs_deadman_checktime_ms                          5000
    zfs_deadman_enabled                               1
    zfs_deadman_synctime_ms                           1000000
    zfs_dedup_prefetch                                0
    zfs_delay_min_dirty_percent                       60
    zfs_delay_scale                                   500000
    zfs_delays_per_second                             20
    zfs_delete_blocks                                 20480
    zfs_dirty_data_max                                4294967296
    zfs_dirty_data_max_max                            4294967296
    zfs_dirty_data_max_max_percent                    25
    zfs_dirty_data_max_percent                        10
    zfs_dirty_data_sync                               67108864
    zfs_dmu_offset_next_sync                          0
    zfs_expire_snapshot                               300
    zfs_flags                                         0
    zfs_free_bpobj_enabled                            1
    zfs_free_leak_on_eio                              0
    zfs_free_max_blocks                               100000
    zfs_free_min_time_ms                              1000
    zfs_immediate_write_sz                            32768
    zfs_max_recordsize                                1048576
    zfs_mdcomp_disable                                0
    zfs_metaslab_fragmentation_threshold              70
    zfs_metaslab_segment_weight_enabled               1
    zfs_metaslab_switch_threshold                     2
    zfs_mg_fragmentation_threshold                    85
    zfs_mg_noalloc_threshold                          0
    zfs_multihost_fail_intervals                      5
    zfs_multihost_history                             0
    zfs_multihost_import_intervals                    10
    zfs_multihost_interval                            1000
    zfs_multilist_num_sublists                        0
    zfs_no_scrub_io                                   0
    zfs_no_scrub_prefetch                             0
    zfs_nocacheflush                                  0
    zfs_nopwrite_enabled                              1
    zfs_object_mutex_size                             64
    zfs_pd_bytes_max                                  52428800
    zfs_per_txg_dirty_frees_percent                   30
    zfs_prefetch_disable                              0
    zfs_read_chunk_size                               1048576
    zfs_read_history                                  0
    zfs_read_history_hits                             0
    zfs_recover                                       0
    zfs_recv_queue_length                             16777216
    zfs_resilver_delay                                2
    zfs_resilver_min_time_ms                          3000
    zfs_scan_idle                                     50
    zfs_scan_ignore_errors                            0
    zfs_scan_min_time_ms                              1000
    zfs_scrub_delay                                   4
    zfs_send_corrupt_data                             0
    zfs_send_queue_length                             16777216
    zfs_sync_pass_deferred_free                       2
    zfs_sync_pass_dont_compress                       5
    zfs_sync_pass_rewrite                             2
    zfs_sync_taskq_batch_pct                          75
    zfs_top_maxinflight                               32
    zfs_txg_history                                   0
    zfs_txg_timeout                                   5
    zfs_vdev_aggregation_limit                        131072
    zfs_vdev_async_read_max_active                    3
    zfs_vdev_async_read_min_active                    1
    zfs_vdev_async_write_active_max_dirty_percent     60
    zfs_vdev_async_write_active_min_dirty_percent     30
    zfs_vdev_async_write_max_active                   10
    zfs_vdev_async_write_min_active                   2
    zfs_vdev_cache_bshift                             16
    zfs_vdev_cache_max                                16384
    zfs_vdev_cache_size                               0
    zfs_vdev_max_active                               1000
    zfs_vdev_mirror_non_rotating_inc                  0
    zfs_vdev_mirror_non_rotating_seek_inc             1
    zfs_vdev_mirror_rotating_inc                      0
    zfs_vdev_mirror_rotating_seek_inc                 5
    zfs_vdev_mirror_rotating_seek_offset              1048576
    zfs_vdev_queue_depth_pct                          1000
    zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3 avx2 avx512f avx512bw
    zfs_vdev_read_gap_limit                           32768
    zfs_vdev_scheduler                                noop
    zfs_vdev_scrub_max_active                         2
    zfs_vdev_scrub_min_active                         1
    zfs_vdev_sync_read_max_active                     10
    zfs_vdev_sync_read_min_active                     10
    zfs_vdev_sync_write_max_active                    10
    zfs_vdev_sync_write_min_active                    10
    zfs_vdev_write_gap_limit                          4096
    zfs_zevent_cols                                   80
    zfs_zevent_console                                0
    zfs_zevent_len_max                                896
    zil_replay_disable                                0
    zil_slog_bulk                                     786432
    zio_delay_max                                     30000
    zio_dva_throttle_enabled                          1
    zio_requeue_io_start_cut_in_line                  1
    zio_taskq_batch_pct                               75
    zvol_inhibit_dev                                  0
    zvol_major                                        230
    zvol_max_discard_blocks                           16384
    zvol_prefetch_bytes                               131072
    zvol_request_sync                                 0
    zvol_threads                                      32
    zvol_volmode                                      1
 
Maybe it is related to the fact that my server uses a swap partition that is on a ZFS RAID 1 volume created when I installed Proxmox 5.4?
Could be, can you try removing that swap from /etc/fstab (temporarily) or with "swapoff -a" and see if that helps?
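
A minimal sketch of that, assuming the swap device the PVE installer set up (the exact entry in your /etc/fstab may differ):

Code:
# disable all active swap right away (does not survive a reboot)
swapoff -a
# check that nothing is swapped anymore
swapon -s
# to keep it disabled across reboots, comment out the swap line in /etc/fstab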
 
Hi,

I have removed swap with "swapoff -a" and now... let's hope for the best...

I also noticed in kern.log the following message:

Code:
Feb  7 14:11:50 dt-prox1 kernel: [57824.528963] perf: interrupt took too long (4912 > 4902), lowering kernel.perf_event_max_sample_rate to 40500

What is it?
 
The perf message is pretty common. It happens if there's a bit of CPU load going on (it doesn't even have to be that much load), so normally it's nothing to worry about - more of an informational message.
 
Go to the affected node, then Disks, and see if you can find any S.M.A.R.T. error like the one below.

[screenshot: a disk reporting S.M.A.R.T. errors in the node's Disks panel]
I had the issue shown in my screenshot, which was resolved after replacing the affected hard drive. Maybe the root cause of your issue is also a degraded hard drive.
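
You can also check this from the shell with smartmontools (already installed on PVE); a minimal sketch, the device names will differ on your system:

Code:
# list the disks smartctl can see
smartctl --scan
# overall health verdict plus the detailed attribute table for one disk
smartctl -H -A /dev/sda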
 
I'd try removing the NVMe device (the log and cache vdevs) from the pool to eliminate it as a possible source.
Failing NVMe devices could crash entire hosts in the past. Perhaps this is part of your problem (it's only a guess, I don't know if these issues still persist).
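
A sketch of how that could look for the HDD-Pool shown above (partition names taken from your zpool status output; double-check them before running anything):

Code:
# detach the SLOG (log) and L2ARC (cache) partitions from the pool
zpool remove HDD-Pool nvme-eui.334842304b3049160025384100000004-part1
zpool remove HDD-Pool nvme-eui.334842304b3049160025384100000004-part3
# verify the pool no longer lists the NVMe device
zpool status HDD-Pool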
 
