[SOLVED] [z_null_int] with 99.99% IO load after 5.1 upgrade

@fabian I have applied the patch to one server in the pool, and modinfo seems to confirm it:

Code:
filename:       /lib/modules/4.13.13-5-pve/zfs/zfs.ko
version:        0.7.4-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     E8EDB9B5FFA260178BA7DC9
depends:        spl,znvpair,zcommon,zunicode,zavl,icp
name:           zfs
vermagic:       4.13.13-5-pve SMP mod_unload modversions

According to iotop, I/O is still at 99.99% -- any hints?

Did you reboot after installing the kernel? (Alternatively, you could unload and reload the ZFS modules, but that requires exporting all ZFS pools first ;))
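For reference, the module route would look roughly like this (a sketch assuming a single pool named rpool whose datasets are not in use):
Code:
# all pools must be exported before the module can be unloaded
zpool export rpool
# unload zfs (modprobe -r also drops its now-unused dependencies, e.g. spl)
modprobe -r zfs
# load it again and re-import the pool
modprobe zfs
zpool import rpool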
 
Yes, I did reboot.

perf top shows:
Code:
Samples: 15K of event 'cycles:ppp', Event count (approx.): 3405002590
Overhead  Shared Object             Symbol
   3.32%  perl                      [.] Perl_yyparse
   2.24%  [kernel]                  [k] copy_page
   2.16%  [kernel]                  [k] native_queued_spin_lock_slowpath
   1.73%  perl                      [.] Perl_yylex
   1.42%  [kernel]                  [k] multilist_sublist_next
   1.40%  [kernel]                  [k] lz4_compress_zfs
   1.39%  [kernel]                  [k] vmx_vcpu_run
   1.03%  perl                      [.] Perl_hv_common
   0.98%  [kernel]                  [k] l2arc_feed_thread
   0.97%  [kernel]                  [k] _raw_spin_lock_irqsave
   0.94%  [kernel]                  [k] clear_page_erms
   0.92%  [kernel]                  [k] _raw_spin_lock
   0.85%  [kernel]                  [k] __schedule
   0.77%  perl                      [.] Perl_Slab_Alloc
   0.74%  perl                      [.] Perl_rpeep
   0.71%  [kernel]                  [k] menu_select
   0.64%  [kernel]                  [k] update_load_avg
   0.62%  [kernel]                  [k] osq_lock
   0.59%  libqb.so.0.18.1           [.] qb_array_index
   0.54%  perl                      [.] Perl_leave_scope
   0.52%  perl                      [.] Perl_gv_fetchpvn_flags
   0.51%  [kernel]                  [k] syscall_return_via_sysret
   0.51%  [kernel]                  [k] kvm_arch_vcpu_ioctl_run
   0.49%  [kernel]                  [k] unmap_page_range
   0.48%  perl                      [.] Perl_sv_clear
   0.47%  [kernel]                  [k] cpuidle_enter_state
   0.45%  [kernel]                  [k] switch_mm_irqs_off
   0.45%  perl                      [.] Perl_sv_gets
   0.45%  libc-2.24.so              [.] malloc
   0.42%  [kernel]                  [k] native_irq_return_iret
   0.41%  [kernel]                  [k] mutex_lock
   0.41%  [kernel]                  [k] kallsyms_expand_symbol.constprop.0
   0.41%  libc-2.24.so              [.] __libc_calloc
   0.39%  perl                      [.] Perl_runops_standard
   0.38%  perl                      [.] Perl_sv_upgrade
   0.37%  [kernel]                  [k] ktime_get
   0.37%  [kernel]                  [k] memcpy_erms
   0.37%  [kernel]                  [k] __fget
   0.37%  [kernel]                  [k] native_write_msr
   0.35%  [unknown]                 [k] 0xfffffe000055a01b
   0.35%  [kernel]                  [k] format_decode
   0.35%  [kernel]                  [k] _find_next_bit
   0.34%  libqb.so.0.18.1           [.] qb_poll_fds_usage_check_
   0.33%  [kernel]                  [k] number
   0.32%  [kernel]                  [k] module_get_kallsym
 
Thanks, but you have to run perf top while your I/O load is actually high. The output shows exactly where the CPU cycles are spent, and from there one can reason about what is happening in the kernel.
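perf top samples system-wide as root by default; if you also want call stacks in the live view, something like this should work (the flag choice is just a suggestion):
Code:
# sample with call-graph collection; run this while the load is high
perf top -g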

Here is an example showing that most of the time is spent in deflate_slow and longest_match, compressing the data stream:

(attached: perf.png -- example perf top output dominated by deflate_slow and longest_match)
 
You could also try adding a kprobe and collecting perf data including call stacks.

Set up the kprobe:
Code:
perf probe -m zfs -a zio_null

record one minute of perf data while you are experiencing the issue:
Code:
perf record -ag -e probe:zio_null -F 99 -- sleep 60

convert the perf traces to readable text ;)
Code:
perf script > out.perf

and attach that file here.
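When you are done, the kprobe can be removed again, e.g.:
Code:
# delete the dynamic probe added above
perf probe -d zio_null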
 
The file is too large for the forum; I uploaded it to my server: http://apolloner.eu/~apollo13/out.perf

So that does make it look like it is actually just waiting for completion of (zfs recv) I/O, which is probably just attributed to zio_null by iotop. You might want to attach it to the upstream GitHub issue nevertheless.
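You can watch that attribution happen with iotop in batch mode -- the options below are just one reasonable choice:
Code:
# batch mode, one-second interval, ten samples; the IO> column is the share of
# time spent waiting for I/O, which is what shows up against z_null_int
iotop -b -d 1 -n 10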
 
I've attached the perf output to the issue. Any idea why it is waiting that much? The server is literally doing nothing (that is, the running VMs aren't doing much I/O -- certainly not 99%).
 
@fabian Ok, I've shut down all VMs and gathered a new perf: http://apolloner.eu/~apollo13/out2.perf -- I still see z_null_int at 99% every 3-5 seconds, but it is much reduced. Do you see anything obvious there? I'll try to start the VMs one by one to see if one of them causes it.

The situation is still the same: iotop reports z_null_int because there is stuff waiting for I/O to complete. In this case, we have:
  • pmxcfs (periodically writes because of LRM watchdog)
  • ip6tables (firewall checks periodically whether changes need to be applied)
  • ls
  • perf
  • pickup (postfix)
  • pvedaemon and pveproxy workers (open GUI?)
  • sshd
  • txg_sync (ZFS syncing regular writes)
  • zfs / zpool (possibly pvestatd scanning the configured pools, or other requests over the API triggering such scans?)
This seems par for the course for an otherwise idle system and is nothing to worry about in general.
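If you want to see the periodic syncing behind txg_sync for yourself, the kstats expose a transaction group history (a sketch assuming rpool as the pool name; the history depth may need to be enabled first):
Code:
# keep stats for the last 100 transaction groups (0 disables the history)
echo 100 > /sys/module/zfs/parameters/zfs_txg_history
# one line per txg: open/quiesce/sync times and the amount of data written
cat /proc/spl/kstat/zfs/rpool/txgs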
 
@fabian: Got a new perf output while downloading a file (7 GB) in a VM over the internet at maybe 2 MB/s -- so not that much. The VM hangs completely: http://apolloner.eu/~apollo13/proxmox_zfs/out_wtf.perf

That mainly shows waiting for sync writes to finish (issued directly by the VM, or indirectly because it needs to swap stuff out). How is your pool set up? I assume you have swap on a ZVOL -- how is that configured? Could you re-run the test, but collect global perf data (e.g., "perf record -ag -F 99 -- sleep 60" for a one-minute sample)?
 
Yes, the system was set up with the default Proxmox installer: ZFS RAID 1 with three disks. Swap according to fstab (no special config, whatever the installer did):
Code:
/dev/zvol/rpool/swap none swap sw 0 0
The VM is also in the same pool.

I'll try to capture the data this evening
 
Could you also get "zfs get all rpool/swap"? It might also make sense to repeat the test a third time and collect some I/O data with "zpool iostat" (it has switches for request-size and latency histograms; I'd use those for a more in-depth view) and "iostat -dmx /dev/sd?" for data on the disks themselves.
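Something along these lines, assuming the pool is called rpool (the 5-second interval is just a suggestion):
Code:
zfs get all rpool/swap
zpool iostat -r rpool 5    # request size histograms every 5 seconds
zpool iostat -w rpool 5    # latency histograms every 5 seconds
iostat -dmx /dev/sd? 5     # per-disk throughput, utilization and await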
 
I'm experiencing this issue as well. It appears that ZFS 0.7.6 corrects it for at least one person. Using the pvetest repo and the associated backported ZFS patches didn't seem to do the trick.

@fabian do you know when/if we can expect to see zfs 0.7.6 included in pvetest?
 
Likely today or tomorrow (it's already in internal testing).
 
