[SOLVED] [z_null_int] with 99.99 % IO load after 5.1 upgrade

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
3,472
541
113
@fabian I have applied the patch to one server in the pool and modinfo seems to confirm:

Code:
filename:       /lib/modules/4.13.13-5-pve/zfs/zfs.ko
version:        0.7.4-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     E8EDB9B5FFA260178BA7DC9
depends:        spl,znvpair,zcommon,zunicode,zavl,icp
name:           zfs
vermagic:       4.13.13-5-pve SMP mod_unload modversions
I/O according to iotop is still at 99.99% -- any hints?
did you reboot after installing the kernel? (alternatively, you could unload and reload the zfs modules, but that requires exporting all ZFS pools ;))
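For completeness, the reload-without-reboot path might look like the sketch below. This is only an illustrative sequence ("tank" is a placeholder pool name), and it cannot work for a root pool such as rpool, which can never be exported while the system is running:

```shell
# export every imported pool first -- the zfs module refuses to
# unload while a pool is in use ("tank" is a placeholder pool name)
zpool export tank

# unload and reload the zfs module stack
modprobe -r zfs
modprobe zfs

# re-import the pool and confirm the new module version
zpool import tank
modinfo -F version zfs
```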
 

apollo13

New Member
Jan 16, 2018
18
1
3
49
Yes, I did reboot.


perf top shows:
Code:
Samples: 15K of event 'cycles:ppp', Event count (approx.): 3405002590
Overhead  Shared Object             Symbol
   3.32%  perl                      [.] Perl_yyparse
   2.24%  [kernel]                  [k] copy_page
   2.16%  [kernel]                  [k] native_queued_spin_lock_slowpath
   1.73%  perl                      [.] Perl_yylex
   1.42%  [kernel]                  [k] multilist_sublist_next
   1.40%  [kernel]                  [k] lz4_compress_zfs
   1.39%  [kernel]                  [k] vmx_vcpu_run
   1.03%  perl                      [.] Perl_hv_common
   0.98%  [kernel]                  [k] l2arc_feed_thread
   0.97%  [kernel]                  [k] _raw_spin_lock_irqsave
   0.94%  [kernel]                  [k] clear_page_erms
   0.92%  [kernel]                  [k] _raw_spin_lock
   0.85%  [kernel]                  [k] __schedule
   0.77%  perl                      [.] Perl_Slab_Alloc
   0.74%  perl                      [.] Perl_rpeep
   0.71%  [kernel]                  [k] menu_select
   0.64%  [kernel]                  [k] update_load_avg
   0.62%  [kernel]                  [k] osq_lock
   0.59%  libqb.so.0.18.1           [.] qb_array_index
   0.54%  perl                      [.] Perl_leave_scope
   0.52%  perl                      [.] Perl_gv_fetchpvn_flags
   0.51%  [kernel]                  [k] syscall_return_via_sysret
   0.51%  [kernel]                  [k] kvm_arch_vcpu_ioctl_run
   0.49%  [kernel]                  [k] unmap_page_range
   0.48%  perl                      [.] Perl_sv_clear
   0.47%  [kernel]                  [k] cpuidle_enter_state
   0.45%  [kernel]                  [k] switch_mm_irqs_off
   0.45%  perl                      [.] Perl_sv_gets
   0.45%  libc-2.24.so              [.] malloc
   0.42%  [kernel]                  [k] native_irq_return_iret
   0.41%  [kernel]                  [k] mutex_lock
   0.41%  [kernel]                  [k] kallsyms_expand_symbol.constprop.0
   0.41%  libc-2.24.so              [.] __libc_calloc
   0.39%  perl                      [.] Perl_runops_standard
   0.38%  perl                      [.] Perl_sv_upgrade
   0.37%  [kernel]                  [k] ktime_get
   0.37%  [kernel]                  [k] memcpy_erms
   0.37%  [kernel]                  [k] __fget
   0.37%  [kernel]                  [k] native_write_msr
   0.35%  [unknown]                 [k] 0xfffffe000055a01b
   0.35%  [kernel]                  [k] format_decode
   0.35%  [kernel]                  [k] _find_next_bit
   0.34%  libqb.so.0.18.1           [.] qb_poll_fds_usage_check_
   0.33%  [kernel]                  [k] number
   0.32%  [kernel]                  [k] module_get_kallsym
 

LnxBil

Well-Known Member
Feb 21, 2015
4,231
411
83
Germany
perf top shows:
Thanks, but you have to run it while your IO load is high. The output shows exactly where the CPU cycles are spent, and from that one can reason about what is happening in the kernel.

Here is an example showing that most of the time is spent in deflate_slow and longest_match compressing the data stream:

[attached screenshot: perf.png]
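For reference, a minimal system-wide invocation (run as root while the load is high; -g enables call-graph sampling):

```shell
# interactive, system-wide CPU profile with call graphs; quit with 'q'
perf top -g
```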
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
3,472
541
113
you could also try adding a kprobe and collecting perf data including call stacks:

set up the kprobe:
Code:
perf probe -m zfs -a zio_null
record 1 minute of perf data while you are experiencing the issue:
Code:
perf record -ag -e probe:zio_null -F 99 -- sleep 60
convert the perf traces to readable text ;)
Code:
perf script > out.perf
and attach that file here.
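As an aside, the resulting out.perf is plain text with one sample per stanza, so it can be summarized with a few lines of Python before attaching it. This is a hypothetical helper (`count_samples` is not part of perf), assuming the default `perf script` layout in which each sample's header line starts in column 0 with the command name and stack frames are indented:

```python
from collections import Counter

def count_samples(perf_script_text):
    """Count samples per command in `perf script` output.

    Assumes the default format: each sample's header line starts in
    column 0 with the command name; stack frames are indented.
    """
    counts = Counter()
    for line in perf_script_text.splitlines():
        # header lines start flush left; indented lines are stack frames
        if line and not line[0].isspace():
            counts[line.split()[0]] += 1
    return counts
```

Running it over the recorded file shows at a glance which processes hit the zio_null probe most often.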
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
3,472
541
113
The file is too large for the forum; uploaded to my server: http://apolloner.eu/~apollo13/out.perf
so that does make it look like it is actually just waiting for completion of (zfs recv) I/O, and it's probably just counted/attributed to zio_null by iotop. you might want to attach it to upstream's github issue nevertheless.
 

apollo13

New Member
Jan 16, 2018
18
1
3
49
I've attached the perf output to the issue. Any idea why it is waiting that much? The server is doing practically nothing (that is, the running VMs aren't doing much I/O -- certainly not 99%).
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
3,472
541
113
@fabian Ok, I've shut down all VMs and gathered a new perf: http://apolloner.eu/~apollo13/out2.perf -- I still see z_null_int at 99% every 3-5 seconds, but it is much reduced. Do you see anything obvious there? I'll try starting the VMs one by one to see if one of them causes it.
the situation is still the same - iotop reports the z_null_int because there is stuff waiting for I/O to complete. in this case, we have:
  • pmxcfs (periodically writes because of LRM watchdog)
  • ip6tables (firewall checks periodically whether changes need to be applied)
  • ls
  • perf
  • pickup (postfix)
  • pvedaemon and pveproxy workers (open GUI?)
  • sshd
  • txg_sync (ZFS syncing regular writes)
  • zfs / zpool (possibly pvestatd scanning configured pools, or other requests over the API triggering such scans?)
this seems par for the course for an otherwise idle system, and is nothing to worry about in general.
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
3,472
541
113
@fabian: Got a new perf output while downloading a 7GB file in a VM over the internet at maybe 2MB/s -- so not that much. The VM completely hangs: http://apolloner.eu/~apollo13/proxmox_zfs/out_wtf.perf
that mainly shows waiting for sync writes (directly by the VM, or indirectly because it needs to swap stuff out) to finish. how is your pool set up? I assume you have swap on a ZVOL? how is that configured? could you re-run the test but collect global perf data (e.g., "perf record -ag -F 99 -- sleep 60" for a 1-minute sample)?
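Spelled out as a block, the suggested global capture is the same pipeline as before, just without the probe filter (the output filename is an arbitrary choice):

```shell
# 1-minute system-wide sample with call stacks at 99 Hz
perf record -ag -F 99 -- sleep 60

# convert the recorded traces to readable text for attaching
perf script > out_global.perf
```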
 

apollo13

New Member
Jan 16, 2018
18
1
3
49
Yes, the system was set up with the default Proxmox installer, ZFS RAID 1 with three disks. Swap according to fstab (no special config, whatever the installer did):
Code:
/dev/zvol/rpool/swap none swap sw 0 0
the VM is also in the same pool.

I'll try to capture the data this evening
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
3,472
541
113
Yes, the system was set up with the default Proxmox installer, ZFS RAID 1 with three disks. Swap according to fstab (no special config, whatever the installer did):
Code:
/dev/zvol/rpool/swap none swap sw 0 0
the VM is also in the same pool.

I'll try to capture the data this evening
could you also get "zfs get all rpool/swap"? it might also make sense to repeat the test a third time and get some I/O data with "zpool iostat" (it has switches for request sizes and latency histograms, I'd use those to get a more in-depth view) and "iostat -dmx /dev/sd?" for data on the disks themselves.
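Collected in one place, the requested diagnostics might look like this; the -r and -w switches are zpool iostat's request-size and latency histograms mentioned above, and the 5-second interval is an arbitrary choice:

```shell
# all properties of the swap zvol
zfs get all rpool/swap

# request-size and latency histograms for the pool, every 5 seconds
zpool iostat -r rpool 5
zpool iostat -w rpool 5

# extended per-disk statistics in MB, every 5 seconds
iostat -dmx /dev/sd? 5
```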
 

Jeff Billimek

New Member
Feb 16, 2018
8
4
3
I'm experiencing this issue as well. It appears that zfs 0.7.6 corrects this for at least one person. Using the pvetest repo and associated backported zfs patches didn't seem to do the trick.

@fabian do you know when/if we can expect to see zfs 0.7.6 included in pvetest?
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
3,472
541
113
I'm experiencing this issue as well. It appears that zfs 0.7.6 corrects this for at least one person. Using the pvetest repo and associated backported zfs patches didn't seem to do the trick.

@fabian do you know when/if we can expect to see zfs 0.7.6 included in pvetest?
likely today or tomorrow (already in internal testing).
 
