[SOLVED] [z_null_int] with 99.99 % IO load after 5.1 upgrade

Discussion in 'Proxmox VE: Installation and configuration' started by JohnD, Nov 16, 2017.

  1. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    did you reboot after installing the kernel? (alternatively, you could unload and reload the zfs modules, but that requires exporting all ZFS pools ;))
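
     A minimal sketch of that module-reload route, assuming no pool is in use (note this is not really an option when the root filesystem lives on rpool, which is why a reboot is usually the simpler choice):
     Code:
     zpool export -a     # export every pool first (fails for an in-use root pool)
     modprobe -r zfs     # unload the zfs module and its dependencies
     modprobe zfs        # load the freshly installed module
     zpool import -a     # re-import the pools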
     
  2. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
    Yes I did reboot.

    perf top shows:
    Code:
    Samples: 15K of event 'cycles:ppp', Event count (approx.): 3405002590
    Overhead  Shared Object             Symbol
       3.32%  perl                      [.] Perl_yyparse
       2.24%  [kernel]                  [k] copy_page
       2.16%  [kernel]                  [k] native_queued_spin_lock_slowpath
       1.73%  perl                      [.] Perl_yylex
       1.42%  [kernel]                  [k] multilist_sublist_next
       1.40%  [kernel]                  [k] lz4_compress_zfs
       1.39%  [kernel]                  [k] vmx_vcpu_run
       1.03%  perl                      [.] Perl_hv_common
       0.98%  [kernel]                  [k] l2arc_feed_thread
       0.97%  [kernel]                  [k] _raw_spin_lock_irqsave
       0.94%  [kernel]                  [k] clear_page_erms
       0.92%  [kernel]                  [k] _raw_spin_lock
       0.85%  [kernel]                  [k] __schedule
       0.77%  perl                      [.] Perl_Slab_Alloc
       0.74%  perl                      [.] Perl_rpeep
       0.71%  [kernel]                  [k] menu_select
       0.64%  [kernel]                  [k] update_load_avg
       0.62%  [kernel]                  [k] osq_lock
       0.59%  libqb.so.0.18.1           [.] qb_array_index
       0.54%  perl                      [.] Perl_leave_scope
       0.52%  perl                      [.] Perl_gv_fetchpvn_flags
       0.51%  [kernel]                  [k] syscall_return_via_sysret
       0.51%  [kernel]                  [k] kvm_arch_vcpu_ioctl_run
       0.49%  [kernel]                  [k] unmap_page_range
       0.48%  perl                      [.] Perl_sv_clear
       0.47%  [kernel]                  [k] cpuidle_enter_state
       0.45%  [kernel]                  [k] switch_mm_irqs_off
       0.45%  perl                      [.] Perl_sv_gets
       0.45%  libc-2.24.so              [.] malloc
       0.42%  [kernel]                  [k] native_irq_return_iret
       0.41%  [kernel]                  [k] mutex_lock
       0.41%  [kernel]                  [k] kallsyms_expand_symbol.constprop.0
       0.41%  libc-2.24.so              [.] __libc_calloc
       0.39%  perl                      [.] Perl_runops_standard
       0.38%  perl                      [.] Perl_sv_upgrade
       0.37%  [kernel]                  [k] ktime_get
       0.37%  [kernel]                  [k] memcpy_erms
       0.37%  [kernel]                  [k] __fget
       0.37%  [kernel]                  [k] native_write_msr
       0.35%  [unknown]                 [k] 0xfffffe000055a01b
       0.35%  [kernel]                  [k] format_decode
       0.35%  [kernel]                  [k] _find_next_bit
       0.34%  libqb.so.0.18.1           [.] qb_poll_fds_usage_check_
       0.33%  [kernel]                  [k] number
       0.32%  [kernel]                  [k] module_get_kallsym
    
     
  3. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,929
    Likes Received:
    361
    Thanks, but you have to run it while your I/O load is high. The output shows exactly where the CPU cycles are being spent, and from that one can reason about what is happening in the kernel.

    Here is an example where most of the time is spent in deflate_slow and longest_match, compressing the data stream:

    (attached screenshot: perf.png)
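
     For reference, a profile like the ones in this thread comes from an invocation along these lines, assuming a perf binary matching the running kernel is installed (-g additionally samples call graphs):
     Code:
     perf top -g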
     
  4. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
    The posted output is from a time when iotop reported 99.99% I/O by z_null_int.
     
  5. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,929
    Likes Received:
    361
  6. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    you could also try adding a kprobe and collecting perf data including call stacks:

    setup kprobe:
    Code:
    perf probe -m zfs -a zio_null
    
    record 1 minute of perf data when you are experiencing the issue
    Code:
    perf record -ag -e probe:zio_null -F 99 -- sleep 60 
    
    convert the perf traces to readable text ;)
    Code:
    perf script > out.perf
    
    and attach that file here.
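
     once you are done, the temporary probe can be removed again with perf probe's delete option:
     Code:
     perf probe -d zio_null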
     
  7. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
  8. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    so that does make it look like it is actually just waiting for (zfs recv) I/O to complete, and that wait is probably just attributed to zio_null by iotop. you might want to attach it to the upstream GitHub issue nevertheless.
     
  9. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
    I've attached the perf output to the issue. Any idea why it is waiting that much? The server is doing practically nothing (that is, the running VMs aren't doing much I/O -- certainly not 99%).
     
  10. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
    @fabian Ok, I've shut down all VMs and gathered a new perf trace: http://apolloner.eu/~apollo13/out2.perf -- I still see z_null_int at 99% every 3-5 seconds, but it is much reduced. Do you see anything obvious there? I'll try to start the VMs one by one to see if one of them causes it.
     
  11. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    the situation is still the same - iotop reports the z_null_int because there is stuff waiting for I/O to complete. in this case, we have:
    • pmxcfs (periodically writes because of LRM watchdog)
    • ip6tables (firewall checks periodically whether changes need to be applied)
    • ls
    • perf
    • pickup (postfix)
    • pvedaemon and pveproxy workers (open GUI?)
    • sshd
    • txg_sync (ZFS syncing regular writes)
    • zfs / zpool (possibly pvestatd scanning configured pools, or other requests over the API triggering such scans?)
    this seems par for the course for an otherwise idle system, and is nothing to worry about in general.
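
     For reference, one way to cross-check which processes are actually generating the I/O (rather than just waiting on it) is an iotop invocation along these lines; the flags are standard iotop options (-o hides idle tasks, -P aggregates threads per process, -a shows accumulated totals):
     Code:
     iotop -o -P -a -d 5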
     
  12. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
  13. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
  14. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
    Sorry, didn't mean to imply that! Thank you for your efforts, much appreciated!
     
  15. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
  16. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    that mainly shows waiting for sync writes (directly by the VM, or indirectly because it needs to swap stuff out) to finish. how is your pool set up? I assume you have swap on a ZVOL? how is that configured? could you re-run the test but collect the global perf data (e.g., "perf record -ag -F 99 -- sleep 60" for a 1 minute sample)?
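
     for completeness, the global capture plus conversion would look like this (output filename illustrative):
     Code:
     perf record -ag -F 99 -- sleep 60
     perf script > out-global.perf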
     
  17. apollo13

    apollo13 New Member

    Joined:
    Jan 16, 2018
    Messages:
    18
    Likes Received:
    1
    Yes, the system was set up with the default Proxmox installer, ZFS RAID 1 with three disks. Swap is configured in fstab as follows (no special config, just whatever the installer did):
    Code:
    /dev/zvol/rpool/swap none swap sw 0 0
    The VM is also on the same pool.

    I'll try to capture the data this evening.
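
     For comparison, the swap-on-ZVOL settings recommended in the ZFS on Linux documentation look roughly like the sketch below (the 8G size is illustrative, and an existing rpool/swap would be adjusted with "zfs set" rather than recreated):
     Code:
     zfs create -V 8G -b $(getconf PAGESIZE) \
         -o compression=zle \
         -o logbias=throughput \
         -o sync=always \
         -o primarycache=metadata \
         -o secondarycache=none \
         -o com.sun:auto-snapshot=false rpool/swap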
     
  18. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    could you also get "zfs get all rpool/swap"? it might also make sense to repeat the test a third time and get some I/O data with "zpool iostat" (it has switches for request sizes and latency histograms; I'd use those to get a more in-depth view) and with "iostat -dmx /dev/sd?" for data on the disks themselves.
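
     The corresponding switches would be roughly the following, assuming the zpool iostat of ZFS 0.7 (-r prints request-size histograms, -w latency histograms; the trailing 5 is the sampling interval in seconds):
     Code:
     zpool iostat -r rpool 5
     zpool iostat -w rpool 5
     iostat -dmx /dev/sd? 5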
     
  19. Jeff Billimek

    Jeff Billimek New Member

    Joined:
    Feb 16, 2018
    Messages:
    6
    Likes Received:
    4
    I'm experiencing this issue as well. It appears that zfs 0.7.6 corrects this for at least one person. Using the pvetest repo and associated backported zfs patches didn't seem to do the trick.

    @fabian do you know when/if we can expect to see zfs 0.7.6 included in pvetest?
     
  20. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    likely today or tomorrow (already in internal testing).
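
     For reference, a quick way to verify which zfs version is actually loaded versus merely installed (the loaded module only changes after a reboot or a module reload):
     Code:
     cat /sys/module/zfs/version     # version of the currently loaded kernel module
     dpkg -l | grep -E 'zfs|spl'     # versions of the installed packages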
     