kernel-4.15.18-8 ZFS freeze

Discussion in 'Proxmox VE: Installation and configuration' started by aa007, Nov 17, 2018.

  1. aa007

    aa007 New Member

    Joined:
    Feb 6, 2014
    Messages:
    7
    Likes Received:
    0
    Hi,

    I upgraded the kernel to this version just yesterday, and today our hypervisor showed a kernel panic and journald was complaining that it can't write anything. The VMs were running, but when I tried to write anything to the disk it froze. After 10 more minutes the VMs stopped responding.
    After resetting the machine, everything went back to normal.

    I don't have much more information, but I saw there were some ZFS-related patches in this version, so for now I am downgrading to 4.15.18-7.
     
    #1 aa007, Nov 17, 2018
    Last edited: Nov 17, 2018
  2. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    13,159
    Likes Received:
    352
    Please post details about your hardware (e.g. the storage controller); this may help with debugging.
     
  3. aa007

    aa007 New Member

    It's a Fujitsu PRIMERGY RX2530 M1 with a PRAID EP400i controller and 8 Seagate ST900MM0018 drives put in JBOD mode so that Proxmox sees all the drives; one of them is a hot spare.
    We have attached two M.2 SSDs (Samsung 970 PRO 512GB and Samsung 860 EVO M.2 250GB) for ZIL / L2ARC using an I-TEC PCI-E 2x M.2 card. We were out of disk slots for attaching SSDs, so one is attached via PCIe and the other via SATA.
    There are 2 partitions on each device (32GB/96GB). We have a mirror of the two 32GB partitions for the ZIL, and we use the 96GB partition on the Samsung 970 for L2ARC.
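    As an illustration only (not the poster's actual commands), a layout like the one described, with a mirrored log device plus a single cache device, could be expressed with `zpool add` roughly as below. The pool name `rpool` and the partition paths are hypothetical placeholders; the script only builds and prints the commands and does not run them.

```python
# Sketch: build the zpool commands for a mirrored log (ZIL/SLOG) device
# and a single L2ARC cache device. Pool and device names are assumptions.

def slog_and_cache_commands(pool, slog_parts, cache_part):
    """Return zpool command lines for a mirrored log vdev and one cache vdev."""
    if len(slog_parts) != 2:
        raise ValueError("expected exactly two partitions for the log mirror")
    return [
        # Mirrored separate intent log across the two 32GB partitions.
        ["zpool", "add", pool, "log", "mirror", *slog_parts],
        # L2ARC cache on the 96GB partition; cache devices are not mirrored.
        ["zpool", "add", pool, "cache", cache_part],
    ]


if __name__ == "__main__":
    commands = slog_and_cache_commands(
        "rpool",                                    # hypothetical pool name
        ["/dev/disk/by-id/ssd-a-part1",             # hypothetical 32GB partition
         "/dev/disk/by-id/ssd-b-part1"],            # hypothetical 32GB partition
        "/dev/disk/by-id/ssd-a-part2",              # hypothetical 96GB partition
    )
    for cmd in commands:
        print(" ".join(cmd))
```

    Printing the commands rather than executing them keeps the sketch safe to run on any machine; the resulting layout can be checked afterwards with `zpool status`.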
     
  4. marsian

    marsian Member
    Proxmox VE Subscriber

    Joined:
    Sep 27, 2016
    Messages:
    36
    Likes Received:
    2
    Are you using the latest BIOS and firmware on it? We've seen sporadic hangs with older firmware on FTS devices, but were able to fix all of them with recent upgrades.
     
  5. aa007

    aa007 New Member

    It happened again this morning, even with the older version of the kernel.
    I found that the BIOS was outdated; the other components are up to date. I have upgraded the BIOS to the latest version, and since a new kernel version, 4.15.18-9, was available, I have installed that as well.
    I will report back if it happens again.
     
  6. aa007

    aa007 New Member

    Unfortunately, it has happened again. This time we managed to get the stack trace:
    Code:
    [223849.690311] kernel BUG at mm/slub.c:296!
    [223849.690345] invalid opcode: 0000 [#1] SMP PTI
    [223849.690368] Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment xt_tcpudp xt_set xt_addrtype xt_conntrack xt_mark ip_set_hash_net ip_set xt_multiport iptable_filter openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c softdog nfnetlink_log nfnetlink intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif mgag200 ttm irqbypass crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel pcbc snd_pcm drm aesni_intel snd_timer aes_x86_64 crypto_simd snd i2c_algo_bit glue_helper cryptd fb_sys_fops syscopyarea sysfillrect soundcore mei_me intel_cstate joydev input_leds sysimgblt
    [223849.690618]  intel_rapl_perf ipmi_si pcspkr mei lpc_ich ipmi_devintf ipmi_msghandler shpchp wmi acpi_power_meter mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm sunrpc ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq uas usb_storage hid_generic usbkbd usbmouse usbhid hid ahci libahci i2c_i801 ixgbe be2net igb(O) dca ptp pps_core mdio megaraid_sas
    [223849.690758] CPU: 28 PID: 40604 Comm: z_wr_int_4 Tainted: P           O     4.15.18-9-pve #1
    [223849.690781] Hardware name: FUJITSU PRIMERGY RX2530 M1/D3279-A1, BIOS V5.0.0.9 R1.36.0 for D3279-A1x                     06/06/2018
    [223849.690816] RIP: 0010:__slab_free+0x1a2/0x330
    [223849.690830] RSP: 0018:ffffb84c5c8bfa70 EFLAGS: 00010246
    [223849.690847] RAX: ffff943781796f60 RBX: ffff943781796f60 RCX: 00000001002a0020
    [223849.691793] RDX: ffff943781796f60 RSI: ffffda0c5705e580 RDI: ffff9441ff407600
    [223849.692728] RBP: ffffb8
    The rest is captured on a photo:
    [​IMG]
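    For anyone comparing traces, here is a minimal sketch of pulling the key fields out of an oops like the one above: the BUG location, the task name, the kernel version, and the faulting function. The regular expressions are assumptions fitted to the three lines quoted from this trace, not a general kernel log parser. The task name `z_wr_int_4` suggests the crash occurred in a ZFS write-pipeline thread.

```python
import re

# Three lines copied from the trace in this post.
OOPS = """\
[223849.690311] kernel BUG at mm/slub.c:296!
[223849.690758] CPU: 28 PID: 40604 Comm: z_wr_int_4 Tainted: P           O     4.15.18-9-pve #1
[223849.690816] RIP: 0010:__slab_free+0x1a2/0x330
"""

# Patterns are illustrative and only tuned to the line formats shown above.
PATTERNS = {
    "bug_at": r"kernel BUG at (\S+?)!",
    "comm": r"Comm: (\S+)",
    "kernel": r"Tainted: .*?(\d+\.\d+\.\d+\S*) #",
    "rip": r"RIP: \S+?:(\w+)\+",
}


def summarize_oops(text):
    """Return a dict of the fields found in the text; missing fields are None."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        summary[name] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(f"{key}: {value}")
```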
     
  7. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,068
    Likes Received:
    248
    JBOD mode is still RAID and is not supported with ZFS.
    ZFS has problems with transparent caches.
     