Proxmox VE 5.1 - ZFS kernel tainted, pvestatd frozen

Discussion in 'Proxmox VE: Installation and configuration' started by davidindra, Nov 30, 2017.

  1. davidindra

    davidindra New Member

    Joined:
    Oct 14, 2017
    Messages:
    29
    Likes Received:
    0
    Hello,
    I want to ask for help with a bug I discovered.

    I have a Proxmox cluster built from two nodes. Sometimes the pvestatd service hangs and that node is marked red (unavailable) in the web GUI. When I look into its log, I see that a process it launched, zpool status -o name -H rpool, hangs, consumes 100% of one CPU core and cannot be killed (not even with SIGKILL).
    Then, every minute, this appears in my dmesg:

    Code:
    [519015.269863] spl_kmem_alloc_impl: 114631497 callbacks suppressed
    [519015.269864] Possible memory allocation deadlock: size=32776 lflags=0x1404200
    [519015.269866] CPU: 1 PID: 16295 Comm: zpool Tainted: P           O    4.13.4-1-pve #1
    [519015.269867] Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 3.0a 12/21/2015
    [519015.269868] Call Trace:
    [519015.269873]  dump_stack+0x63/0x8b
    [519015.269880]  spl_kmem_alloc_impl+0x173/0x180 [spl]
    [519015.269882]  spl_vmem_alloc+0x19/0x20 [spl]
    [519015.269887]  nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
    [519015.269889]  nv_mem_zalloc.isra.0+0x15/0x40 [znvpair]
    [519015.269891]  nvlist_xpack+0xb4/0x110 [znvpair]
    [519015.269894]  ? nvlist_common.part.89+0x118/0x200 [znvpair]
    [519015.269896]  nvlist_pack+0x34/0x40 [znvpair]
    [519015.269899]  fnvlist_pack+0x3e/0xa0 [znvpair]
    [519015.269931]  put_nvlist+0x95/0x100 [zfs]
    [519015.269953]  zfs_ioc_pool_stats+0x50/0x90 [zfs]
    [519015.269974]  zfsdev_ioctl+0x5d4/0x660 [zfs]
    [519015.269976]  do_vfs_ioctl+0xa3/0x610
    [519015.269979]  ? handle_mm_fault+0xce/0x1c0
    [519015.269980]  ? __do_page_fault+0x266/0x4e0
    [519015.269981]  SyS_ioctl+0x79/0x90
    [519015.269982]  entry_SYSCALL_64_fastpath+0x1e/0xa9
    [519015.269983] RIP: 0033:0x7f6cb065ae07
    [519015.269984] RSP: 002b:00007fff87ef6a68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [519015.269985] RAX: ffffffffffffffda RBX: 00007f6cb0913b00 RCX: 00007f6cb065ae07
    [519015.269986] RDX: 00007fff87ef6a90 RSI: 0000000000005a05 RDI: 0000000000000003
    [519015.269986] RBP: 0000558c42291fb0 R08: 0000000000000003 R09: 0000000000010010
    [519015.269987] R10: 00007f6cb069bb20 R11: 0000000000000246 R12: 0000000000010000
    [519015.269987] R13: 0000000000020060 R14: 0000558c42291fa0 R15: 00007fff87efa0e0
    [... the same "Possible memory allocation deadlock" trace repeats several more times ...]
    When I restart pvestatd and wait for a few timeouts, the node turns green again in the GUI, but the zpool process still hangs and can only be cleared by rebooting the system.
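    For reference, this is roughly how I inspect the stuck process and get the node back to green (a sketch of the commands I use; <PID> is a placeholder):
    Code:
    # show the hung zpool process, its state and CPU usage
    ps -o pid,stat,%cpu,etime,cmd -C zpool
    # SIGKILL has no effect on it in my case
    kill -9 <PID>
    # restarting pvestatd brings the node back to green in the GUI
    systemctl restart pvestatd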

    Could you please give me some tips on how to avoid this problem? If needed, I can of course supply additional logs.

    Thanks, have a nice day!
    David
     
  2. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,763
    Likes Received:
    315
    Hi,

    please send the output of

    Code:
    pveversion -v
    
     
  3. davidindra

    davidindra New Member

    Joined:
    Oct 14, 2017
    Messages:
    29
    Likes Received:
    0
    Here it is:
    Code:
    proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
    pve-manager: 5.1-36 (running version: 5.1-36/131401db)
    pve-kernel-4.13.4-1-pve: 4.13.4-26
    pve-kernel-4.10.17-2-pve: 4.10.17-20
    libpve-http-server-perl: 2.0-6
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-15
    qemu-server: 5.0-17
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-20
    libpve-guest-common-perl: 2.0-13
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-16
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-2
    pve-docs: 5.1-12
    pve-qemu-kvm: 2.9.1-2
    pve-container: 2.0-17
    pve-firewall: 3.0-3
    pve-ha-manager: 2.0-3
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.0-2
    lxcfs: 2.0.7-pve4
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.3-pve1~bpo9
     
  4. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,763
    Likes Received:
    315
  5. mbaldini

    mbaldini Member

    Joined:
    Nov 7, 2015
    Messages:
    167
    Likes Received:
    20
    I had a similar problem on a server; for me the solution was to reduce the RAM used by the ZFS ARC cache and to raise vm.min_free_kbytes.
    For the ARC size, put the following in /etc/modprobe.d/zfs.conf:
    Code:
    options zfs zfs_arc_max=X
    where X is the maximum amount of RAM, in bytes, that you want the ARC cache to use. Choose the value based on how much RAM the server has and how much is used by the VMs and other processes on it.
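    For example (just an illustration, adjust it to your own server), capping the ARC at 8 GiB means 8 * 1024 * 1024 * 1024 = 8589934592 bytes:
    Code:
    # 8 GiB expressed in bytes
    options zfs zfs_arc_max=8589934592
    You can check the limit that is actually in effect by looking at the c_max value in /proc/spl/kstat/zfs/arcstats.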
    After editing that file:
    Code:
    update-initramfs -u
    After I did that, the problem happened less often, but it still occurred sometimes, especially under high load. So the next change I made was to keep more memory free for the system by raising vm.min_free_kbytes:
    Code:
    echo 524288 >  /proc/sys/vm/min_free_kbytes
    which keeps 512 MiB of RAM free.
    To keep the settings across reboots, edit /etc/sysctl.d/pve-local.conf and add:
    Code:
    vm.swappiness = 10
    vm.min_free_kbytes = 524288
    
    vm.swappiness = 10 reduces swap usage.
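    To apply and verify the new values without a reboot (a quick sketch, assuming the file above), you can run:
    Code:
    # reload all files under /etc/sysctl.d/
    sysctl --system
    # print the values currently in effect
    sysctl vm.min_free_kbytes vm.swappiness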
     
  6. davidindra

    davidindra New Member

    Joined:
    Oct 14, 2017
    Messages:
    29
    Likes Received:
    0
    Hi,
    I've upgraded the kernel from the pve-test repository, set min_free_kbytes and swappiness, and it looks like the problem is gone (I will keep monitoring it for a longer time). IO delay has also dropped significantly (that was my long-term problem), maybe because of the kernel upgrade...?
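    In case it helps anyone else, this is roughly what I did to get the newer kernel (assuming the pvetest repository line for PVE 5 on Debian Stretch; double-check the line for your version):
    Code:
    echo "deb http://download.proxmox.com/debian/pve stretch pvetest" > /etc/apt/sources.list.d/pvetest.list
    apt update
    apt dist-upgrade
    reboot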

    Anyway, thank you all a lot! :)
    David
     