arc_prune stuck in D state

ca_maer

Hello,

There seems to be a problem with one of our servers where arc_prune is stuck in the D state. The sync from update-initramfs never completes and waits on something, which makes me believe it's arc_prune.

The server is a Dell R710 with 142 GB of RAM and SSDs.

I also have a bunch of errors:
[screenshot attached: Screen Shot 2018-11-19 at 11.54.17 AM.png]

Here's a bunch of logs that might help:

Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-11
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-9
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve2~bpo1

Code:
root       308  0.0  0.0      0     0 ?        D    Nov16   0:00 [arc_prune]
root      4690  0.0  0.0   4200   672 ?        D    09:58   0:00 sync
root      8224  0.0  0.0      0     0 ?        D    Nov16   0:00 [kworker/u50:4]
root     10828  0.0  0.0   7060   712 ?        D    10:54   0:00 sync
root     10912  0.0  0.0   7060   660 pts/4    D    11:02   0:00 sync
root     13611  0.0  0.0   7060   696 ?        D    10:39   0:00 sync
root     17043  0.0  0.0   7060   676 pts/4    D    10:56   0:00 sync
root     19438  0.0  0.0   7060   652 pts/4    D    10:51   0:00 sync
root     23121  0.0  0.0   7060   700 ?        D    10:47   0:00 sync
root     32588  0.0  0.0   7060   616 ?        D    11:09   0:00 sync
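
For reference, the listing above is ps output; a quick way to find all processes in uninterruptible sleep (D state) and dump their kernel stacks in one go (a sketch, needs root and a readable /proc) is:

Code:
# list D-state processes and print their kernel stacks
for pid in $(ps -eo pid,stat --no-headers | awk '$2 ~ /^D/ {print $1}'); do
    echo "=== PID $pid ==="
    cat /proc/$pid/stack
done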

Code:
------------------------------------------------------------------------
ZFS Subsystem Report                Mon Nov 19 11:29:32 2018
ARC Summary: (HEALTHY)
    Memory Throttle Count:            0

ARC Misc:
    Deleted:                20.63M
    Mutex Misses:                1.25k
    Evict Skips:                8.58k

ARC Size:                96.62%    20.52    GiB
    Target Size: (Adaptive)        100.00%    21.23    GiB
    Min Size (Hard Limit):        40.00%    8.49    GiB
    Max Size (High Water):        2:1    21.23    GiB

ARC Size Breakdown:
    Recently Used Cache Size:    19.62%    3.41    GiB
    Frequently Used Cache Size:    80.38%    13.99    GiB

ARC Hash Breakdown:
    Elements Max:                1.79M
    Elements Current:        66.87%    1.20M
    Collisions:                3.52M
    Chain Max:                4
    Chains:                    20.55k

ARC Total accesses:                    4.80G
    Cache Hit Ratio:        90.60%    4.34G
    Cache Miss Ratio:        9.40%    451.02M
    Actual Hit Ratio:        90.43%    4.34G

    Data Demand Efficiency:        98.61%    1.17G
    Data Prefetch Efficiency:    61.12%    15.28M

    CACHE HITS BY CACHE LIST:
      Anonymously Used:        0.14%    6.00M
      Most Recently Used:        10.02%    435.40M
      Most Frequently Used:        89.80%    3.90G
      Most Recently Used Ghost:    0.03%    1.26M
      Most Frequently Used Ghost:    0.02%    727.97k

    CACHE HITS BY DATA TYPE:
      Demand Data:            26.47%    1.15G
      Prefetch Data:        0.21%    9.34M
      Demand Metadata:        73.16%    3.18G
      Prefetch Metadata:        0.15%    6.56M

    CACHE MISSES BY DATA TYPE:
      Demand Data:            3.59%    16.21M
      Prefetch Data:        1.32%    5.94M
      Demand Metadata:        94.86%    427.83M
      Prefetch Metadata:        0.23%    1.03M


DMU Prefetch Efficiency:                    2.50G
    Hit Ratio:            1.19%    29.62M
    Miss Ratio:            98.81%    2.47G



ZFS Tunables:
    dbuf_cache_hiwater_pct                            10
    dbuf_cache_lowater_pct                            10
    dbuf_cache_max_bytes                              104857600
    dbuf_cache_max_shift                              5
    dmu_object_alloc_chunk_shift                      7
    ignore_hole_birth                                 1
    l2arc_feed_again                                  1
    l2arc_feed_min_ms                                 200
    l2arc_feed_secs                                   1
    l2arc_headroom                                    2
    l2arc_headroom_boost                              200
    l2arc_noprefetch                                  1
    l2arc_norw                                        0
    l2arc_write_boost                                 8388608
    l2arc_write_max                                   8388608
    metaslab_aliquot                                  524288
    metaslab_bias_enabled                             1
    metaslab_debug_load                               0
    metaslab_debug_unload                             0
    metaslab_fragmentation_factor_enabled             1
    metaslab_lba_weighting_enabled                    1
    metaslab_preload_enabled                          1
    metaslabs_per_vdev                                200
    send_holes_without_birth_time                     1
    spa_asize_inflation                               24
    spa_config_path                                   /etc/zfs/zpool.cache
    spa_load_verify_data                              1
    spa_load_verify_maxinflight                       10000
    spa_load_verify_metadata                          1
    spa_slop_shift                                    5
    zfetch_array_rd_sz                                1048576
    zfetch_max_distance                               8388608
    zfetch_max_streams                                8
    zfetch_min_sec_reap                               2
    zfs_abd_scatter_enabled                           1
    zfs_abd_scatter_max_order                         10
    zfs_admin_snapshot                                1
    zfs_arc_average_blocksize                         8192
    zfs_arc_dnode_limit                               0
    zfs_arc_dnode_limit_percent                       10
    zfs_arc_dnode_reduce_percent                      10
    zfs_arc_grow_retry                                0
    zfs_arc_lotsfree_percent                          10
    zfs_arc_max                                       22797719962
    zfs_arc_meta_adjust_restarts                      4096
    zfs_arc_meta_limit                                0
    zfs_arc_meta_limit_percent                        75
    zfs_arc_meta_min                                  0
    zfs_arc_meta_prune                                10000
    zfs_arc_meta_strategy                             1
    zfs_arc_min                                       9119087985
    zfs_arc_min_prefetch_lifespan                     0
    zfs_arc_p_dampener_disable                        1
    zfs_arc_p_min_shift                               0
    zfs_arc_pc_percent                                0
    zfs_arc_shrink_shift                              0
    zfs_arc_sys_free                                  0
    zfs_autoimport_disable                            1
    zfs_checksums_per_second                          20
    zfs_compressed_arc_enabled                        1
    zfs_dbgmsg_enable                                 0
    zfs_dbgmsg_maxsize                                4194304
    zfs_dbuf_state_index                              0
    zfs_deadman_checktime_ms                          5000
    zfs_deadman_enabled                               1
    zfs_deadman_synctime_ms                           1000000
    zfs_dedup_prefetch                                0
    zfs_delay_min_dirty_percent                       60
    zfs_delay_scale                                   500000
    zfs_delays_per_second                             20
    zfs_delete_blocks                                 20480
    zfs_dirty_data_max                                4294967296
    zfs_dirty_data_max_max                            4294967296
    zfs_dirty_data_max_max_percent                    25
    zfs_dirty_data_max_percent                        10
    zfs_dirty_data_sync                               67108864
    zfs_dmu_offset_next_sync                          0
    zfs_expire_snapshot                               300
    zfs_flags                                         0
    zfs_free_bpobj_enabled                            1
    zfs_free_leak_on_eio                              0
    zfs_free_max_blocks                               100000
    zfs_free_min_time_ms                              1000
    zfs_immediate_write_sz                            32768
    zfs_max_recordsize                                1048576
    zfs_mdcomp_disable                                0
    zfs_metaslab_fragmentation_threshold              70
    zfs_metaslab_segment_weight_enabled               1
    zfs_metaslab_switch_threshold                     2
    zfs_mg_fragmentation_threshold                    85
    zfs_mg_noalloc_threshold                          0
    zfs_multihost_fail_intervals                      5
    zfs_multihost_history                             0
    zfs_multihost_import_intervals                    10
    zfs_multihost_interval                            1000
    zfs_multilist_num_sublists                        0
    zfs_no_scrub_io                                   0
    zfs_no_scrub_prefetch                             0
    zfs_nocacheflush                                  0
    zfs_nopwrite_enabled                              1
    zfs_object_mutex_size                             64
    zfs_pd_bytes_max                                  52428800
    zfs_per_txg_dirty_frees_percent                   30
    zfs_prefetch_disable                              0
    zfs_read_chunk_size                               1048576
    zfs_read_history                                  0
    zfs_read_history_hits                             0
    zfs_recover                                       0
    zfs_recv_queue_length                             16777216
    zfs_resilver_delay                                2
    zfs_resilver_min_time_ms                          3000
    zfs_scan_idle                                     50
    zfs_scan_ignore_errors                            0
    zfs_scan_min_time_ms                              1000
    zfs_scrub_delay                                   4
    zfs_send_corrupt_data                             0
    zfs_send_queue_length                             16777216
    zfs_sync_pass_deferred_free                       2
    zfs_sync_pass_dont_compress                       5
    zfs_sync_pass_rewrite                             2
    zfs_sync_taskq_batch_pct                          75
    zfs_top_maxinflight                               32
    zfs_txg_history                                   0
    zfs_txg_timeout                                   5
    zfs_vdev_aggregation_limit                        131072
    zfs_vdev_async_read_max_active                    3
    zfs_vdev_async_read_min_active                    1
    zfs_vdev_async_write_active_max_dirty_percent     60
    zfs_vdev_async_write_active_min_dirty_percent     30
    zfs_vdev_async_write_max_active                   10
    zfs_vdev_async_write_min_active                   2
    zfs_vdev_cache_bshift                             16
    zfs_vdev_cache_max                                16384
    zfs_vdev_cache_size                               0
    zfs_vdev_max_active                               1000
    zfs_vdev_mirror_non_rotating_inc                  0
    zfs_vdev_mirror_non_rotating_seek_inc             1
    zfs_vdev_mirror_rotating_inc                      0
    zfs_vdev_mirror_rotating_seek_inc                 5
    zfs_vdev_mirror_rotating_seek_offset              1048576
    zfs_vdev_queue_depth_pct                          1000
    zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3
    zfs_vdev_read_gap_limit                           32768
    zfs_vdev_scheduler                                noop
    zfs_vdev_scrub_max_active                         2
    zfs_vdev_scrub_min_active                         1
    zfs_vdev_sync_read_max_active                     10
    zfs_vdev_sync_read_min_active                     10
    zfs_vdev_sync_write_max_active                    10
    zfs_vdev_sync_write_min_active                    10
    zfs_vdev_write_gap_limit                          4096
    zfs_zevent_cols                                   80
    zfs_zevent_console                                0
    zfs_zevent_len_max                                384
    zil_replay_disable                                0
    zil_slog_bulk                                     786432
    zio_delay_max                                     30000
    zio_dva_throttle_enabled                          1
    zio_requeue_io_start_cut_in_line                  1
    zio_taskq_batch_pct                               75
    zvol_inhibit_dev                                  0
    zvol_major                                        230
    zvol_max_discard_blocks                           16384
    zvol_prefetch_bytes                               131072
    zvol_request_sync                                 0
    zvol_threads                                      32
    zvol_volmode                                      1

Code:
options zfs zfs_arc_min=9119087985
options zfs zfs_arc_max=22797719962
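
For reference, these module options only take effect at boot after the initramfs has been rebuilt - which is exactly the update-initramfs step that hangs here. The same ARC limits can also be applied at runtime through the ZFS module parameters (a sketch, values copied from the options above):

Code:
echo 9119087985  > /sys/module/zfs/parameters/zfs_arc_min
echo 22797719962 > /sys/module/zfs/parameters/zfs_arc_max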

Any idea what might be causing this? The only fix so far is rebooting. :/
 
Hi,

Does this happen only with the latest kernel?
Do you have a PERC (Dell RAID controller) in this server?
If yes, note that ZFS is not compatible with RAID controllers.
Also, JBOD mode is still RAID.
 
Do you know which was the first kernel version where this occurs?
And which was the last one that worked?
 
It still happened today when rolling back a snapshot:
Code:
root     25899  0.0  0.0  35148  3108 ?        D    10:56   0:00 zfs rollback rpool/data/subvol-101-disk-0@pre_deploy

EDIT: Realised this server is behind on kernel versions. I will update and reboot to see if that solves the issue.

Code:
 INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
[2019-01-09 10:03:43]        Tainted: P          IO     4.15.17-2-pve #1
[2019-01-09 10:03:43]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2019-01-09 10:03:43]  kworker/u66:3   D    0 17147      2 0x80000000
[2019-01-09 10:03:43]  Workqueue: writeback wb_workfn (flush-zfs-237)
[2019-01-09 10:03:43]  Call Trace:
[2019-01-09 10:03:43]   __schedule+0x3e3/0x880
[2019-01-09 10:03:43]   schedule+0x36/0x80
[2019-01-09 10:03:43]   io_schedule+0x16/0x40
[2019-01-09 10:03:43]   __lock_page+0xff/0x140
[2019-01-09 10:03:43]   ? page_cache_tree_insert+0xe0/0xe0
[2019-01-09 10:03:43]   write_cache_pages+0x2e9/0x4b0
[2019-01-09 10:03:43]   ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[2019-01-09 10:03:43]   ? _cond_resched+0x1a/0x50
[2019-01-09 10:03:43]   ? mutex_lock+0x12/0x40
[2019-01-09 10:03:43]   ? rrw_exit+0x5a/0x150 [zfs]
[2019-01-09 10:03:43]   zpl_writepages+0x91/0x170 [zfs]
[2019-01-09 10:03:43]   do_writepages+0x1f/0x70
[2019-01-09 10:03:43]   __writeback_single_inode+0x45/0x330
[2019-01-09 10:03:43]   writeback_sb_inodes+0x266/0x590
[2019-01-09 10:03:43]   __writeback_inodes_wb+0x92/0xc0
[2019-01-09 10:03:43]   wb_writeback+0x288/0x320
[2019-01-09 10:03:43]   wb_workfn+0x1a3/0x450
[2019-01-09 10:03:43]   ? wb_workfn+0x1a3/0x450
[2019-01-09 10:03:43]   process_one_work+0x1e0/0x400
[2019-01-09 10:03:43]   worker_thread+0x4b/0x420
[2019-01-09 10:03:43]   kthread+0x105/0x140
[2019-01-09 10:03:43]   ? process_one_work+0x400/0x400
[2019-01-09 10:03:43]   ? kthread_create_worker_on_cpu+0x70/0x70
[2019-01-09 10:03:43]   ? do_syscall_64+0x73/0x130
[2019-01-09 10:03:43]   ? SyS_exit_group+0x14/0x20
[2019-01-09 10:03:43]   ret_from_fork+0x35/0x40
[2019-01-09 10:03:44]  audit: type=1400 audit(1547046543.654:220): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-101_</var/lib/lxc>" name="/" pid=951 comm="(ionclean)" flags="rw, rslave"
[2019-01-09 10:05:43]  INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
[2019-01-09 10:05:43]        Tainted: P          IO     4.15.17-2-pve #1
[2019-01-09 10:05:43]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2019-01-09 10:05:43]  kworker/u66:3   D    0 17147      2 0x80000000
[2019-01-09 10:05:43]  Workqueue: writeback wb_workfn (flush-zfs-237)
[2019-01-09 10:05:43]  Call Trace:
[2019-01-09 10:05:43]   __schedule+0x3e3/0x880
[2019-01-09 10:05:43]   schedule+0x36/0x80
[2019-01-09 10:05:43]   io_schedule+0x16/0x40
[2019-01-09 10:05:43]   __lock_page+0xff/0x140
[2019-01-09 10:05:43]   ? page_cache_tree_insert+0xe0/0xe0
[2019-01-09 10:05:43]   write_cache_pages+0x2e9/0x4b0
[2019-01-09 10:05:43]   ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[2019-01-09 10:05:43]   ? _cond_resched+0x1a/0x50
[2019-01-09 10:05:43]   ? mutex_lock+0x12/0x40
[2019-01-09 10:05:43]   ? rrw_exit+0x5a/0x150 [zfs]
[2019-01-09 10:05:43]   zpl_writepages+0x91/0x170 [zfs]
[2019-01-09 10:05:43]   do_writepages+0x1f/0x70
[2019-01-09 10:05:43]   __writeback_single_inode+0x45/0x330
[2019-01-09 10:05:43]   writeback_sb_inodes+0x266/0x590
[2019-01-09 10:05:43]   __writeback_inodes_wb+0x92/0xc0
[2019-01-09 10:05:43]   wb_writeback+0x288/0x320
[2019-01-09 10:05:43]   wb_workfn+0x1a3/0x450
[2019-01-09 10:05:43]   ? wb_workfn+0x1a3/0x450
[2019-01-09 10:05:43]   process_one_work+0x1e0/0x400
[2019-01-09 10:05:43]   worker_thread+0x4b/0x420
[2019-01-09 10:05:43]   kthread+0x105/0x140
[2019-01-09 10:05:43]   ? process_one_work+0x400/0x400
[2019-01-09 10:05:43]   ? kthread_create_worker_on_cpu+0x70/0x70
[2019-01-09 10:05:43]   ? do_syscall_64+0x73/0x130
[2019-01-09 10:05:43]   ? SyS_exit_group+0x14/0x20
[2019-01-09 10:05:43]   ret_from_fork+0x35/0x40
[2019-01-09 10:07:44]  INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
[2019-01-09 10:07:44]        Tainted: P          IO     4.15.17-2-pve #1
[2019-01-09 10:07:44]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2019-01-09 10:07:44]  kworker/u66:3   D    0 17147      2 0x80000000
[2019-01-09 10:07:44]  Workqueue: writeback wb_workfn (flush-zfs-237)
[2019-01-09 10:07:44]  Call Trace:
[2019-01-09 10:07:44]   __schedule+0x3e3/0x880
[2019-01-09 10:07:44]   schedule+0x36/0x80
[2019-01-09 10:07:44]   io_schedule+0x16/0x40
[2019-01-09 10:07:44]   __lock_page+0xff/0x140
[2019-01-09 10:07:44]   ? page_cache_tree_insert+0xe0/0xe0
[2019-01-09 10:07:44]   write_cache_pages+0x2e9/0x4b0
[2019-01-09 10:07:44]   ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[2019-01-09 10:07:44]   ? _cond_resched+0x1a/0x50
[2019-01-09 10:07:44]   ? mutex_lock+0x12/0x40
[2019-01-09 10:07:44]   ? rrw_exit+0x5a/0x150 [zfs]
[2019-01-09 10:07:44]   zpl_writepages+0x91/0x170 [zfs]
[2019-01-09 10:07:44]   do_writepages+0x1f/0x70
[2019-01-09 10:07:44]   __writeback_single_inode+0x45/0x330
[2019-01-09 10:07:44]   writeback_sb_inodes+0x266/0x590
[2019-01-09 10:07:44]   __writeback_inodes_wb+0x92/0xc0
[2019-01-09 10:07:44]   wb_writeback+0x288/0x320
[2019-01-09 10:07:44]   wb_workfn+0x1a3/0x450
[2019-01-09 10:07:44]   ? wb_workfn+0x1a3/0x450
[2019-01-09 10:07:44]   process_one_work+0x1e0/0x400
[2019-01-09 10:07:44]   worker_thread+0x4b/0x420
[2019-01-09 10:07:44]   kthread+0x105/0x140
[2019-01-09 10:07:44]   ? process_one_work+0x400/0x400
[2019-01-09 10:07:44]   ? kthread_create_worker_on_cpu+0x70/0x70
[2019-01-09 10:07:44]   ? do_syscall_64+0x73/0x130
[2019-01-09 10:07:44]   ? SyS_exit_group+0x14/0x20
[2019-01-09 10:07:44]   ret_from_fork+0x35/0x40
[2019-01-09 10:09:45]  INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
[2019-01-09 10:09:45]        Tainted: P          IO     4.15.17-2-pve #1
[2019-01-09 10:09:45]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2019-01-09 10:09:45]  kworker/u66:3   D    0 17147      2 0x80000000
[2019-01-09 10:09:45]  Workqueue: writeback wb_workfn (flush-zfs-237)
[2019-01-09 10:09:45]  Call Trace:
[2019-01-09 10:09:45]   __schedule+0x3e3/0x880
[2019-01-09 10:09:45]   schedule+0x36/0x80
[2019-01-09 10:09:45]   io_schedule+0x16/0x40
[2019-01-09 10:09:45]   __lock_page+0xff/0x140
[2019-01-09 10:09:45]   ? page_cache_tree_insert+0xe0/0xe0
[2019-01-09 10:09:45]   write_cache_pages+0x2e9/0x4b0
[2019-01-09 10:09:45]   ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[2019-01-09 10:09:45]   ? _cond_resched+0x1a/0x50
[2019-01-09 10:09:45]   ? mutex_lock+0x12/0x40
[2019-01-09 10:09:45]   ? rrw_exit+0x5a/0x150 [zfs]
[2019-01-09 10:09:45]   zpl_writepages+0x91/0x170 [zfs]
[2019-01-09 10:09:45]   do_writepages+0x1f/0x70
[2019-01-09 10:09:45]   __writeback_single_inode+0x45/0x330
[2019-01-09 10:09:45]   writeback_sb_inodes+0x266/0x590
[2019-01-09 10:09:45]   __writeback_inodes_wb+0x92/0xc0
[2019-01-09 10:09:45]   wb_writeback+0x288/0x320
[2019-01-09 10:09:45]   wb_workfn+0x1a3/0x450
[2019-01-09 10:09:45]   ? wb_workfn+0x1a3/0x450
[2019-01-09 10:09:45]   process_one_work+0x1e0/0x400
[2019-01-09 10:09:45]   worker_thread+0x4b/0x420
[2019-01-09 10:09:45]   kthread+0x105/0x140
[2019-01-09 10:09:45]   ? process_one_work+0x400/0x400
[2019-01-09 10:09:45]   ? kthread_create_worker_on_cpu+0x70/0x70
[2019-01-09 10:09:45]   ? do_syscall_64+0x73/0x130
[2019-01-09 10:09:45]   ? SyS_exit_group+0x14/0x20
[2019-01-09 10:09:45]   ret_from_fork+0x35/0x40
[2019-01-09 10:10:16]  php[24179]: segfault at c8 ip 000055bff72bb05e sp 00007ffff87b7df0 error 4 in php7.2[55bff6fd7000+418000]
[2019-01-09 10:10:16]  php[24178]: segfault at c8 ip 0000561bf710205e sp 00007ffef10c9110 error 4 in php7.2[561bf6e1e000+418000]

Code:
[<0>] cv_wait_common+0x11e/0x140 [spl]
[<0>] __cv_wait+0x15/0x20 [spl]
[<0>] rrw_enter_write+0x3c/0xa0 [zfs]
[<0>] rrm_enter+0x2a/0x70 [zfs]
[<0>] zfsvfs_teardown+0x7b/0x300 [zfs]
[<0>] zfs_suspend_fs+0x10/0x20 [zfs]
[<0>] zfs_ioc_rollback+0xe2/0x1a0 [zfs]
[<0>] zfsdev_ioctl+0x216/0x670 [zfs]
[<0>] do_vfs_ioctl+0xa6/0x620
[<0>] SyS_ioctl+0x79/0x90
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.17-2-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
It still happened today. It seems to be related to the call to sync and not to arc_prune. Here are some more logs; let me know if you need any more information:

Code:
root      6959  1.5  0.0 199056 72048 ?        SLsl  2018 1112:56 /usr/sbin/corosync -f
root     19671  0.0  0.0   7060   664 ?        D    Jan14   0:00      \_ sync

Code:
cat /proc/19671/stack
[<0>] wb_wait_for_completion+0x64/0x90
[<0>] sync_inodes_sb+0xa6/0x280
[<0>] sync_inodes_one_sb+0x15/0x20
[<0>] iterate_supers+0xbf/0x110
[<0>] sys_sync+0x42/0xb0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

Code:
/proc/19671# cat syscall
162 0x7ff0dd329078 0x7ffded9d72b8 0x7ff0dd561e01 0x81f 0x0 0x0 0x7ffded9d7188 0x7ff0dd2a75b7
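
(Syscall number 162 in the first column of /proc/<pid>/syscall is sync(2) on x86_64, which matches the kernel stack above. If in doubt, it can be cross-checked against the syscall table in the kernel headers; the path below assumes Debian's linux-libc-dev package:)

Code:
grep -w 162 /usr/include/x86_64-linux-gnu/asm/unistd_64.h
# expected: #define __NR_sync 162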

Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-7-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-2-pve: 4.15.18-21
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
I do. I thought Proxmox set one up by default during installation?

Code:
rpool/swap                    8.50G  1004G   869M  -

Code:
Filename                Type        Size    Used    Priority
/dev/zd32                               partition    8388604    0    -2

When testing, I could crash the whole system by hitting swap (using stress), and the processes changed to the D state. Could the same thing be happening when sync is called? What I find weird is that it's stuck in the D state but did not crash the whole system like the GitHub issue mentions.

There is also no mention of swap in the stack trace.
 
I do. I thought Proxmox set one up by default during installation?
It did - we changed that in the 5.3 installer precisely because of this issue (the GitHub link).
I would suggest disabling your swap zvol if possible - it takes one potential culprit out of the system.
Also, please try reproducing it on the latest released kernel (pve-kernel-4.15.18-9-pve: 4.15.18-30).
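
A minimal sketch of taking the swap zvol out of service, using the device/dataset names shown earlier in the thread (/dev/zd32 backed by rpool/swap); the exact fstab entry may differ on your system:

Code:
swapoff /dev/zd32      # or: swapoff -a   to disable all active swap
# then comment out the swap zvol entry in /etc/fstab so it stays disabled after a reboot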
 
Hey Stoiko,

Even after disabling the swap, the issue is still present.

Code:
root      6073  0.0  0.0   7060   760 pts/10   D+   09:04   0:00  |                           \_ sync

Code:
[<0>] wb_wait_for_completion+0x64/0x90
[<0>] sync_inodes_sb+0xa6/0x280
[<0>] sync_inodes_one_sb+0x15/0x20
[<0>] iterate_supers+0xbf/0x110
[<0>] sys_sync+0x42/0xb0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

Code:
cat /proc/6073/syscall
162 0x7f633f7c1078 0x7fff602753f8 0x7f633f9f9e01 0x81f 0x0 0x0 0x7fff602752c8 0x7f633f73f5b7

Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-8-pve)
pve-manager: 5.3-8 (running version: 5.3-8/2929af8e)
pve-kernel-4.15: 5.3-1
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-19
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-36
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-33
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-45
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

Code:
execve("/usr/sbin/update-initramfs", ["update-initramfs", "-u"], [/* 16 vars */]) = 0
brk(NULL)                               = 0x56112fbcd000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17c42f7000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=65446, ...}) = 0
mmap(NULL, 65446, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f17c42e7000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\4\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1689360, ...}) = 0
mmap(NULL, 3795296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f17c3d38000
mprotect(0x7f17c3ecd000, 2097152, PROT_NONE) = 0
mmap(0x7f17c40cd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f17c40cd000
mmap(0x7f17c40d3000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f17c40d3000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17c42e5000
arch_prctl(ARCH_SET_FS, 0x7f17c42e5700) = 0
mprotect(0x7f17c40cd000, 16384, PROT_READ) = 0
mprotect(0x56112f1dd000, 8192, PROT_READ) = 0
mprotect(0x7f17c42fa000, 4096, PROT_READ) = 0
munmap(0x7f17c42e7000, 65446)           = 0
getpid()                                = 8030
rt_sigaction(SIGCHLD, {sa_handler=0x56112efd3ef0, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
geteuid()                               = 0
brk(NULL)                               = 0x56112fbcd000
brk(0x56112fbee000)                     = 0x56112fbee000
getppid()                               = 8028
stat("/root", {st_mode=S_IFDIR|0700, st_size=19, ...}) = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=19, ...}) = 0
open("/usr/sbin/update-initramfs", O_RDONLY) = 3
fcntl(3, F_DUPFD, 10)                   = 10
close(3)                                = 0
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x56112efd3ef0, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
rt_sigaction(SIGQUIT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
rt_sigaction(SIGTERM, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
read(10, "#!/bin/sh\n\nSTATEDIR=/var/lib/ini"..., 8192) = 8192
faccessat(AT_FDCWD, "/etc/initramfs-tools/update-initramfs.conf", R_OK) = 0
open("/etc/initramfs-tools/update-initramfs.conf", O_RDONLY) = 3
fcntl(3, F_DUPFD, 10)                   = 11
close(3)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
read(11, "#\n# Configuration file for updat"..., 8192) = 378
read(11, "", 8192)                      = 0
close(11)                               = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8031
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 8031
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8031, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8031
read(10, ";\n\tu)\n\t\tupdate\n\t\t;;\nesac\n", 8192) = 25
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8032
close(4)                                = 0
read(3, "4.15.18-10-pve\n4.15.18-9-pve\n4.1"..., 128) = 128
read(3, ".13.16-3-pve\n4.13.16-2-pve\n4.13."..., 128) = 128
read(3, "3.8-3-pve\n4.13.8-2-pve\n4.13.4-1-"..., 128) = 36
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8032, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 36
read(3, "", 128)                        = 0
close(3)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8032
faccessat(AT_FDCWD, "/proc/mounts", R_OK) = 0
stat("/usr/local/sbin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/bin/ischroot", {st_mode=S_IFREG|0755, st_size=10552, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8035
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 8035
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8035, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8035
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8036
close(4)                                = 0
read(3, "", 128)                        = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8036, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 0
close(3)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8036
stat("/boot/initrd.img-4.15.18-10-pve", {st_mode=S_IFREG|0644, st_size=36184842, ...}) = 0
stat("/usr/local/sbin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/bin/sha1sum", {st_mode=S_IFREG|0755, st_size=47976, ...}) = 0
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8037
close(4)                                = 0
stat("/usr/local/sbin/diff", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/diff", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/diff", 0x7ffe69edd340)  = -1 ENOENT (No such file or directory)
stat("/usr/bin/diff", {st_mode=S_IFREG|0755, st_size=146824, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8038
close(3)                                = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8037
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8037, si_uid=0, si_status=0, si_utime=9, si_stime=1} ---
rt_sigreturn({mask=[]})                 = 8037
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8038
faccessat(AT_FDCWD, "/boot/initrd.img-4.15.18-10-pve", R_OK) = 0
faccessat(AT_FDCWD, "/boot/initrd.img-4.15.18-10-pve.dpkg-bak", R_OK) = 0
stat("/usr/local/sbin/rm", 0x7ffe69edd520) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/rm", 0x7ffe69edd520) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/rm", 0x7ffe69edd520)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/rm", 0x7ffe69edd520)     = -1 ENOENT (No such file or directory)
stat("/sbin/rm", 0x7ffe69edd520)        = -1 ENOENT (No such file or directory)
stat("/bin/rm", {st_mode=S_IFREG|0755, st_size=64424, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8039
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8039
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8039, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8039
stat("/usr/local/sbin/ln", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ln", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ln", 0x7ffe69edd540)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/ln", 0x7ffe69edd540)     = -1 ENOENT (No such file or directory)
stat("/sbin/ln", 0x7ffe69edd540)        = -1 ENOENT (No such file or directory)
stat("/bin/ln", {st_mode=S_IFREG|0755, st_size=56240, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8040
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8040
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8040, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8040
write(1, "update-initramfs: Generating /bo"..., 61update-initramfs: Generating /boot/initrd.img-4.15.18-10-pve
) = 61
stat("/usr/local/sbin/mkinitramfs", 0x7ffe69edd580) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/mkinitramfs", 0x7ffe69edd580) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/mkinitramfs", {st_mode=S_IFREG|0755, st_size=10255, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8041
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8041
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8041, si_uid=0, si_status=0, si_utime=1, si_stime=3} ---
rt_sigreturn({mask=[]})                 = 8041
stat("/usr/local/sbin/mv", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/mv", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/mv", 0x7ffe69edd540)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/mv", 0x7ffe69edd540)     = -1 ENOENT (No such file or directory)
stat("/sbin/mv", 0x7ffe69edd540)        = -1 ENOENT (No such file or directory)
stat("/bin/mv", {st_mode=S_IFREG|0755, st_size=126416, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14543
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14543
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14543, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 14543
open("/var/lib/initramfs-tools/4.15.18-10-pve", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14544
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14544
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14544, si_uid=0, si_status=0, si_utime=8, si_stime=1} ---
rt_sigreturn({mask=[]})                 = 14544
dup2(11, 1)                             = 1
close(11)                               = 0
stat("/usr/local/sbin/sync", 0x7ffe69edd560) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/sync", 0x7ffe69edd560) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/sync", 0x7ffe69edd560)  = -1 ENOENT (No such file or directory)
stat("/usr/bin/sync", 0x7ffe69edd560)   = -1 ENOENT (No such file or directory)
stat("/sbin/sync", 0x7ffe69edd560)      = -1 ENOENT (No such file or directory)
stat("/bin/sync", {st_mode=S_IFREG|0755, st_size=31496, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14545
wait4(-1,
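
The trace above ends with update-initramfs forking /bin/sync and blocking in wait4, i.e. the same hang as the standalone sync calls earlier. If it helps, a trace like this can be captured with strace (a sketch):

Code:
strace -f -tt -o /tmp/update-initramfs.strace update-initramfs -u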

Code:
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=74244120k,nr_inodes=18561030,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=14852988k,mode=755)
rpool/ROOT/pve-1 on / type zfs (rw,relatime,xattr,noacl)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=36,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=4157)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
rpool on /rpool type zfs (rw,noatime,xattr,noacl)
rpool/ROOT on /rpool/ROOT type zfs (rw,noatime,xattr,noacl)
rpool/data on /rpool/data type zfs (rw,noatime,xattr,noacl)
rpool/data/subvol-102-disk-0 on /rpool/data/subvol-102-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-110-disk-0 on /rpool/data/subvol-110-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-111-disk-1 on /rpool/data/subvol-111-disk-1 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-120-disk-0 on /rpool/data/subvol-120-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-134-disk-1 on /rpool/data/subvol-134-disk-1 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-137-disk-1 on /rpool/data/subvol-137-disk-1 type zfs (rw,noatime,xattr,posixacl)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
dev-nas-1:/mnt/Pool-NAS/Backups on /mnt/dev-nas-1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.104,mountvers=3,mountport=881,mountproto=udp,local_lock=none,addr=192.168.1.104)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
dev-nas-1:/mnt/Pool-NAS/Proxmox on /mnt/pve/dev-nas-1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.104,mountvers=3,mountport=881,mountproto=udp,local_lock=none,addr=192.168.1.104)
rpool/data/subvol-104-disk-0 on /rpool/data/subvol-104-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-101-disk-0 on /rpool/data/subvol-101-disk-0 type zfs (rw,noatime,xattr,posixacl)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700)
rpool/data/subvol-109-disk-0 on /rpool/data/subvol-109-disk-0 type zfs (rw,noatime,xattr,posixacl)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
tmpfs on /run/user/1006 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700,uid=1006,gid=1006)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700,uid=1001,gid=1001)

Code:
[Mon Feb  4 11:08:25 2019] INFO: task kworker/u49:1:30643 blocked for more than 120 seconds.
[Mon Feb  4 11:08:25 2019]       Tainted: P        W IO     4.15.18-8-pve #1
[Mon Feb  4 11:08:25 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Feb  4 11:08:25 2019] kworker/u49:1   D    0 30643      2 0x80000000
[Mon Feb  4 11:08:25 2019] Workqueue: writeback wb_workfn (flush-zfs-9)
[Mon Feb  4 11:08:25 2019] Call Trace:
[Mon Feb  4 11:08:25 2019]  __schedule+0x3e0/0x870
[Mon Feb  4 11:08:25 2019]  schedule+0x36/0x80
[Mon Feb  4 11:08:25 2019]  io_schedule+0x16/0x40
[Mon Feb  4 11:08:25 2019]  __lock_page+0xff/0x140
[Mon Feb  4 11:08:25 2019]  ? page_cache_tree_insert+0xe0/0xe0
[Mon Feb  4 11:08:25 2019]  write_cache_pages+0x2e9/0x4b0
[Mon Feb  4 11:08:25 2019]  ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[Mon Feb  4 11:08:25 2019]  ? _cond_resched+0x1a/0x50
[Mon Feb  4 11:08:25 2019]  ? mutex_lock+0x12/0x40
[Mon Feb  4 11:08:25 2019]  ? rrw_exit+0x5a/0x150 [zfs]
[Mon Feb  4 11:08:25 2019]  zpl_writepages+0x91/0x170 [zfs]
[Mon Feb  4 11:08:25 2019]  do_writepages+0x1f/0x70
[Mon Feb  4 11:08:25 2019]  __writeback_single_inode+0x45/0x330
[Mon Feb  4 11:08:25 2019]  writeback_sb_inodes+0x266/0x590
[Mon Feb  4 11:08:25 2019]  __writeback_inodes_wb+0x92/0xc0
[Mon Feb  4 11:08:25 2019]  wb_writeback+0x288/0x320
[Mon Feb  4 11:08:25 2019]  wb_workfn+0x1a3/0x440
[Mon Feb  4 11:08:25 2019]  ? wb_workfn+0x1a3/0x440
[Mon Feb  4 11:08:25 2019]  process_one_work+0x1e0/0x400
[Mon Feb  4 11:08:25 2019]  worker_thread+0x4b/0x420
[Mon Feb  4 11:08:25 2019]  kthread+0x105/0x140
[Mon Feb  4 11:08:25 2019]  ? process_one_work+0x400/0x400
[Mon Feb  4 11:08:25 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Mon Feb  4 11:08:25 2019]  ? do_syscall_64+0x73/0x130
[Mon Feb  4 11:08:25 2019]  ? SyS_exit_group+0x14/0x20
[Mon Feb  4 11:08:25 2019]  ret_from_fork+0x35/0x40

Is there anything else I can provide so we can fix this once and for all?

Thanks

EDIT: I've found this issue on the ZoL GitHub, which is exactly what is happening here. Unfortunately it's still open: https://github.com/zfsonlinux/zfs/issues/7484
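
For what it's worth, one more data point that can be captured while the hang is happening is a dump of all blocked task stacks via SysRq (a sketch, assuming SysRq is enabled):

Code:
echo 1 > /proc/sys/kernel/sysrq    # enable SysRq if it is not already
echo w > /proc/sysrq-trigger       # dump blocked (D-state) task stacks to the kernel log
dmesg | tail -n 300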
 
Hey,

The update-initramfs run is hanging after issuing a `sync` - any chance that's related to the NFS mounts you have on your system?
* Can you disable the storages and unmount them temporarily?
* Alternatively, can you write to / read from them? (A rough sketch of both follows.)
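
A rough sketch of both suggestions, assuming the Proxmox storage ID matches the mount point shown in the mount listing (e.g. dev-nas-1 mounted at /mnt/pve/dev-nas-1) and that `pvesm set` accepts the storage's disable flag:

Code:
# simple write/read test against the NFS export
dd if=/dev/zero of=/mnt/pve/dev-nas-1/iotest bs=1M count=16 conv=fsync
dd if=/mnt/pve/dev-nas-1/iotest of=/dev/null bs=1M
rm /mnt/pve/dev-nas-1/iotest

# temporarily disable the storage in Proxmox and unmount it
pvesm set dev-nas-1 --disable 1
umount /mnt/pve/dev-nas-1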