[SOLVED] [z_null_int] with 99.99 % IO load after 5.1 upgrade

JohnD

Hello everybody,

I just upgraded one of my Proxmox nodes from 4.4 to 5.1.

Code:
root@prox12 ~ # zpool status
  pool: hddtank
 state: ONLINE
  scan: scrub repaired 0B in 23h16m with 0 errors on Sun Nov 12 23:40:48 2017
config:

    NAME           STATE     READ WRITE CKSUM
    hddtank        ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        sda        ONLINE       0     0     0
        sdb        ONLINE       0     0     0
    logs
      mirror-1     ONLINE       0     0     0
        nvme0n1p3  ONLINE       0     0     0
        nvme1n1p3  ONLINE       0     0     0
    cache
      nvme0n1p5    ONLINE       0     0     0
      nvme1n1p5    ONLINE       0     0     0

errors: No known data errors

  pool: ssdtank
 state: ONLINE
  scan: scrub repaired 0B in 0h17m with 0 errors on Sun Nov 12 00:42:01 2017
config:

    NAME           STATE     READ WRITE CKSUM
    ssdtank        ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        nvme0n1p6  ONLINE       0     0     0
        nvme1n1p6  ONLINE       0     0     0

errors: No known data errors

  pool: ssdtank2
 state: ONLINE
  scan: scrub repaired 0B in 0h15m with 0 errors on Sun Nov 12 00:39:49 2017
config:

    NAME                                            STATE     READ WRITE CKSUM
    ssdtank2                                        ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        ata-Micron_5100_MTFDDAK480TBY_1721174288EC  ONLINE       0     0     0
        ata-Micron_5100_MTFDDAK480TBY_172117420E1C  ONLINE       0     0     0

Everything runs fine, but the load on the node (with 10 VMs running) is a bit high because of constant I/O load.
iotop shows 1-3 [z_null_int] processes at 99.99% I/O. At least one [z_null_int] process is always running.

Any ideas what [z_null_int] is really doing there?

Thanks in advance,

John
 
is dedup enabled (or has it ever been enabled on any of the datasets)?
 
Yes for ssdtank. But it has only ~400 GB total capacity so the dedup table should be small.

edit:
Code:
  pool: ssdtank
 state: ONLINE
  scan: scrub repaired 0B in 0h17m with 0 errors on Sun Nov 12 00:42:01 2017
config:

    NAME           STATE     READ WRITE CKSUM
    ssdtank        ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        nvme0n1p6  ONLINE       0     0     0
        nvme1n1p6  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 13114450, size 349B on disk, 201B in core

bucket              allocated                       referenced        
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    10.9M    117G   61.5G   61.5G    10.9M    117G   61.5G   61.5G
     2    1.10M   9.55G   5.95G   5.95G    2.62M   22.6G   14.3G   14.3G
     4     315K   2.91G   1.70G   1.70G    1.52M   14.6G   8.55G   8.55G
     8     145K   1.37G    827M    827M    1.47M   14.2G   8.28G   8.28G
    16    56.2K    490M    331M    331M    1.12M   9.79G   6.71G   6.71G
    32    24.3K    215M    162M    162M    1.13M   10.2G   7.55G   7.55G
    64    16.0K    196M    101M    101M    1.26M   14.7G   7.60G   7.60G
   128    4.02K   36.7M   25.0M   25.0M     574K   5.40G   3.64G   3.64G
   256    3.23K   25.8M   17.0M   17.0M    1.19M   9.51G   6.26G   6.26G
   512      114    912K    532K    532K    64.0K    512M    288M    288M
    1K        8     64K      4K      4K    9.25K   74.0M   4.62M   4.62M
    2K        4     32K      2K      2K    11.8K   94.5M   5.91M   5.91M
    4K        2     16K      1K      1K    10.6K   84.6M   5.29M   5.29M
    8K        1      8K    512B    512B    9.48K   75.8M   4.74M   4.74M
   16K        1      8K    512B    512B    17.5K    140M   8.75M   8.75M
  512K        1      8K    512B    512B     568K   4.44G    284M    284M
 Total    12.5M    132G   70.6G   70.6G    22.4M    224G    125G    125G
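
As a rough cross-check of the table size, the "dedup: DDT entries" line above can be multiplied out. This is only a sketch in plain shell arithmetic; `ddt_bytes` is a made-up helper name, and the result is an estimate of the table footprint, not a measurement of what is actually held in RAM.

```shell
#!/bin/sh
# Rough DDT footprint: number of entries times bytes per entry.
# ddt_bytes is a hypothetical helper name, not part of ZFS.
ddt_bytes() {
    entries=$1
    bytes_per_entry=$2
    echo $(( entries * bytes_per_entry ))
}

# Values copied from the "dedup: DDT entries ..." line above.
in_core=$(ddt_bytes 13114450 201)
on_disk=$(ddt_bytes 13114450 349)

# Print both in MiB (integer division, so rounded down).
echo "in core: $(( in_core / 1024 / 1024 )) MiB"
echo "on disk: $(( on_disk / 1024 / 1024 )) MiB"
```

With these numbers the estimate comes to about 2.5 GiB in core and a bit over 4 GiB on disk, so the table may be larger than expected for a ~400 GB pool.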
 
dedup still causes overhead, and you might just be hitting some kind of corner case there. that thread type is (among other things) used for:
  • L2ARC writes
  • DDT repair writes
  • sync frees
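
To narrow down which of these is active, one option is to watch whether the L2ARC write counter is moving. The sketch below assumes the ZFS on Linux kstat file /proc/spl/kstat/zfs/arcstats and its `l2_write_bytes` counter; `arcstat_value` is a made-up helper name, and the 5 second interval is arbitrary.

```shell
#!/bin/sh
# Print the value of one counter from an arcstats-style file.
# Each data line in that file has three columns: name, type, data.
# arcstat_value is a hypothetical helper name.
arcstat_value() {
    file=$1
    counter=$2
    awk -v name="$counter" '$1 == name { print $3 }' "$file"
}

# Assumed location of the ARC statistics on ZFS on Linux.
ARCSTATS=${ARCSTATS:-/proc/spl/kstat/zfs/arcstats}

# Only sample when the file is actually there (i.e. on a ZFS host).
if [ -r "$ARCSTATS" ]; then
    before=$(arcstat_value "$ARCSTATS" l2_write_bytes)
    sleep 5
    after=$(arcstat_value "$ARCSTATS" l2_write_bytes)
    echo "L2ARC bytes written in 5s: $(( ${after:-0} - ${before:-0} ))"
fi
```

A value that stays at 0 would suggest the L2ARC feed is not what keeps the thread busy, though that alone does not prove which of the other uses is responsible.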
 
I have also been getting constant 99.99% IO from z_null_int, although I am not using ZFS deduplication, since upgrading to Proxmox 5.1 with ZFS 0.7.2 or ZFS 0.7.3. Memory consumption also increases daily, causing random reboots of the Proxmox host even though we set aside around 10 GB of RAM for Proxmox and ZFS. zfs_arc_max is set to 1 GB.
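
For readers who want to reproduce the ARC limit mentioned here, this is a small sketch of how a 1 GB zfs_arc_max is usually expressed. The value is given in bytes; `gib_to_bytes` is a made-up helper name, and the file path in the comments is the conventional modprobe location rather than something stated in this thread.

```shell
#!/bin/sh
# Convert a size in GiB to bytes, since zfs_arc_max takes a byte count.
# gib_to_bytes is a hypothetical helper name.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

arc_max=$(gib_to_bytes 1)

# Line for a modprobe options file, e.g. /etc/modprobe.d/zfs.conf.
echo "options zfs zfs_arc_max=$arc_max"

# Not executed here, shown for reference only (both need root):
#   persistent: add the line above to the options file, then run
#               update-initramfs -u and reboot
#   runtime:    echo "$arc_max" > /sys/module/zfs/parameters/zfs_arc_max
```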
 
could you get an "arc_summary" output when the memory consumption has reached a high value? and the pool and dataset properties / pool layout would also be helpful.
 
Hi Fabian,

Thanks. I attached the data and screenshot.

Reconfiguring all VMs to use QEMU discard and no cache seems to have helped with the memory leak. I suspect the QEMU writeback disk cache mode was taking up precious RAM on the host whenever more than one VM was constantly writing.
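
For reference, a sketch of what that per-disk change could look like on the command line. The VM ID, bus and volume name below are placeholders, not values from this thread, and `drive_opts` is a made-up helper; the script only prints the qm command instead of running it.

```shell
#!/bin/sh
# Build a Proxmox drive option string with cache disabled and discard on.
# drive_opts is a hypothetical helper name.
drive_opts() {
    volume=$1
    echo "${volume},cache=none,discard=on"
}

# Placeholder values, adjust to the actual VM and storage.
vmid=100
bus=scsi0
volume=local-zfs:vm-100-disk-1

# Print the command for review rather than executing it.
echo "qm set $vmid --$bus $(drive_opts "$volume")"
```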

I forgot to report: using "qm migrate VMID NODEname --online --with-local-disks" causes very high CPU load (more than 50) and very high IO wait (75-90%) on the target node. I didn't have this issue on Proxmox 4.4 with ZFS 0.6.x.

I'll monitor for the next couple days and update you later.
 

Hi, I have the same problem with Proxmox 5.1 and ZFS 0.7.2. I also restarted the node, but the problem immediately reappeared.
It seems to be related to the ARC: when the cache miss ratio is high, [z_null_int] is overloaded.

Has anyone found a workaround?
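
To put a number on the ARC observation above, the miss ratio can be derived from the cumulative `hits` and `misses` counters. This is a minimal sketch assuming the ZFS on Linux kstat file /proc/spl/kstat/zfs/arcstats; `arc_miss_percent` is a made-up helper name, and the figure covers the time since the module was loaded, not the current moment.

```shell
#!/bin/sh
# Compute the ARC miss percentage from an arcstats-style file.
# Data lines have three columns: name, type, data.
# arc_miss_percent is a hypothetical helper name.
arc_miss_percent() {
    awk '$1 == "hits"   { h = $3 }
         $1 == "misses" { m = $3 }
         END {
             if (h + m > 0) printf "%.1f\n", 100 * m / (h + m)
             else print "n/a"
         }' "$1"
}

# Assumed location of the ARC statistics on ZFS on Linux.
ARCSTATS=${ARCSTATS:-/proc/spl/kstat/zfs/arcstats}

# Only report when the file exists (i.e. on a ZFS host).
if [ -r "$ARCSTATS" ]; then
    echo "ARC miss ratio since load: $(arc_miss_percent "$ARCSTATS")%"
fi
```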
 
Is there any possible workaround? The bug is fixed in the upstream repo. I tried to backport the fix, but it needs at least one other (bigger) patch (zfsonlinux/zfs commit d4a72f2386), and I did not test whether that can even be backported to the current Debian version.

P.S.: Sorry, but new users are not allowed to post links…
 
@fabian I've read in another thread that you are working on a new ZFS version (sorry, I cannot post links yet :/). Could you consider backporting the fixes for #6171 too?

Also, I'd like to test the new patches if you need testers :)
 
the patch set for 0.7.6 (which includes a backport of #6171) is currently running through the upstream build bots. once those have gone through successfully, I'll build ZFS and kernel packages for PVE for testing.
 
went with the cherry-pick since 0.7.6 took too long ;)

updated kernel and ZFS packages are available on pvetest
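
For anyone who has not used the test repository before, a sketch of the APT source line involved. The base URL and the `stretch` suite match the apt-cache output elsewhere in this thread; the `pvetest` component name comes from the post above, while `pve_repo_line` and the file name in the comment are assumptions for illustration.

```shell
#!/bin/sh
# Build an APT source line for a Proxmox VE repository component.
# pve_repo_line is a hypothetical helper name.
pve_repo_line() {
    suite=$1
    component=$2
    echo "deb http://download.proxmox.com/debian/pve $suite $component"
}

# Print the line for the test repository on a stretch-based install.
pve_repo_line stretch pvetest

# Not executed here, shown for reference only (needs root):
#   put the printed line into a file such as
#   /etc/apt/sources.list.d/pvetest.list (file name is an assumption), then
#   apt-get update && apt-get dist-upgrade
```

Packages from a test repository are meant for testing, so this is probably best tried on a non-critical node first.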
 
@fabian I have applied the patch to one server in the pool and modinfo seems to confirm:

Code:
filename:       /lib/modules/4.13.13-5-pve/zfs/zfs.ko
version:        0.7.4-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     E8EDB9B5FFA260178BA7DC9
depends:        spl,znvpair,zcommon,zunicode,zavl,icp
name:           zfs
vermagic:       4.13.13-5-pve SMP mod_unload modversions

I/O according to iotop is still at 99.99% -- any hints?

EDIT:// apt policies:
Code:
apt-cache policy pve-kernel-4.13.13-5-pve zfsutils-linux zfs-initramfs
pve-kernel-4.13.13-5-pve:
  Installed: 4.13.13-37
  Candidate: 4.13.13-37
  Version table:
 *** 4.13.13-37 100
        100 /var/lib/dpkg/status
     4.13.13-36 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
zfsutils-linux:
  Installed: 0.7.4-pve2~bpo9
  Candidate: 0.7.4-pve2~bpo9
  Version table:
 *** 0.7.4-pve2~bpo9 100
        100 /var/lib/dpkg/status
     0.7.3-pve1~bpo9 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.7.2-pve1~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.11-pve18~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.11-pve17~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.9-pve16~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.9-5 500
        500 http://ftp.at.debian.org/debian stretch/contrib amd64 Packages
zfs-initramfs:
  Installed: 0.7.4-pve2~bpo9
  Candidate: 0.7.4-pve2~bpo9
  Version table:
 *** 0.7.4-pve2~bpo9 100
        100 /var/lib/dpkg/status
     0.7.3-pve1~bpo9 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.7.2-pve1~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.11-pve18~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.11-pve17~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.9-pve16~bpo90 500
        500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
     0.6.5.9-5 500
        500 http://ftp.at.debian.org/debian stretch/contrib amd64 Packages
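
One thing that may be worth ruling out: modinfo reads the module file on disk, so it can report the new version while the kernel is still running the previously loaded module until a reboot. Below is a minimal sketch comparing the two, assuming the loaded version is exposed at /sys/module/zfs/version; `same_version` is a made-up helper name.

```shell
#!/bin/sh
# Succeed when two version strings are identical.
# same_version is a hypothetical helper name.
same_version() {
    [ "$1" = "$2" ]
}

# Only run the comparison on a host that has modinfo and a loaded zfs module.
if command -v modinfo >/dev/null 2>&1 && [ -r /sys/module/zfs/version ]; then
    on_disk=$(modinfo -F version zfs)        # version of the .ko file on disk
    loaded=$(cat /sys/module/zfs/version)    # version of the running module
    if same_version "$on_disk" "$loaded"; then
        echo "running module matches the installed one: $loaded"
    else
        echo "installed $on_disk, but running $loaded (reboot pending?)"
    fi
fi
```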
 
Hmm, is there any documentation on how to install perf for the Proxmox kernel? I get:
Code:
/usr/bin/perf: line 13: exec: perf_4.13: not found
E: linux-perf-4.13 is not installed.

Or is there any extra repo that I can use for that?
 
Also having this issue on host via iotop:
99.99 % [z_null_int]

Looking forward to the patch in enterprise repo
 
perf for the 4.13 kernel can be installed with:
Code:
apt-get install linux-tools-4.13
 
