arc_prune stuck in D state

ca_maer

Hello,

There seems to be a problem with one of our servers where arc_prune is stuck in the D state. The sync from update-initramfs never completes and waits on something, which makes me believe it's arc_prune.

The server is a Dell R710 with 142 GB of RAM and SSDs.

I also have a bunch of errors:
[screenshot attachment: Screen Shot 2018-11-19 at 11.54.17 AM.png]

Here's a bunch of logs that might help:

Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-11
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-9
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve2~bpo1

Code:
root       308  0.0  0.0      0     0 ?        D    Nov16   0:00 [arc_prune]
root      4690  0.0  0.0   4200   672 ?        D    09:58   0:00 sync
root      8224  0.0  0.0      0     0 ?        D    Nov16   0:00 [kworker/u50:4]
root     10828  0.0  0.0   7060   712 ?        D    10:54   0:00 sync
root     10912  0.0  0.0   7060   660 pts/4    D    11:02   0:00 sync
root     13611  0.0  0.0   7060   696 ?        D    10:39   0:00 sync
root     17043  0.0  0.0   7060   676 pts/4    D    10:56   0:00 sync
root     19438  0.0  0.0   7060   652 pts/4    D    10:51   0:00 sync
root     23121  0.0  0.0   7060   700 ?        D    10:47   0:00 sync
root     32588  0.0  0.0   7060   616 ?        D    11:09   0:00 sync
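
For reference, the D-state listing above and the kernel stacks quoted further down can be gathered with something like this (a minimal sketch, nothing Proxmox-specific):

Code:
# list all processes currently in uninterruptible sleep (state D) - matches the listing above
ps aux | awk '$8 ~ /^D/'
# dump the kernel stack of one of the stuck processes, e.g. the sync at PID 4690
cat /proc/4690/stack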

Code:
------------------------------------------------------------------------
ZFS Subsystem Report                Mon Nov 19 11:29:32 2018
ARC Summary: (HEALTHY)
    Memory Throttle Count:            0

ARC Misc:
    Deleted:                20.63M
    Mutex Misses:                1.25k
    Evict Skips:                8.58k

ARC Size:                96.62%    20.52    GiB
    Target Size: (Adaptive)        100.00%    21.23    GiB
    Min Size (Hard Limit):        40.00%    8.49    GiB
    Max Size (High Water):        2:1    21.23    GiB

ARC Size Breakdown:
    Recently Used Cache Size:    19.62%    3.41    GiB
    Frequently Used Cache Size:    80.38%    13.99    GiB

ARC Hash Breakdown:
    Elements Max:                1.79M
    Elements Current:        66.87%    1.20M
    Collisions:                3.52M
    Chain Max:                4
    Chains:                    20.55k

ARC Total accesses:                    4.80G
    Cache Hit Ratio:        90.60%    4.34G
    Cache Miss Ratio:        9.40%    451.02M
    Actual Hit Ratio:        90.43%    4.34G

    Data Demand Efficiency:        98.61%    1.17G
    Data Prefetch Efficiency:    61.12%    15.28M

    CACHE HITS BY CACHE LIST:
      Anonymously Used:        0.14%    6.00M
      Most Recently Used:        10.02%    435.40M
      Most Frequently Used:        89.80%    3.90G
      Most Recently Used Ghost:    0.03%    1.26M
      Most Frequently Used Ghost:    0.02%    727.97k

    CACHE HITS BY DATA TYPE:
      Demand Data:            26.47%    1.15G
      Prefetch Data:        0.21%    9.34M
      Demand Metadata:        73.16%    3.18G
      Prefetch Metadata:        0.15%    6.56M

    CACHE MISSES BY DATA TYPE:
      Demand Data:            3.59%    16.21M
      Prefetch Data:        1.32%    5.94M
      Demand Metadata:        94.86%    427.83M
      Prefetch Metadata:        0.23%    1.03M


DMU Prefetch Efficiency:                    2.50G
    Hit Ratio:            1.19%    29.62M
    Miss Ratio:            98.81%    2.47G



ZFS Tunables:
    dbuf_cache_hiwater_pct                            10
    dbuf_cache_lowater_pct                            10
    dbuf_cache_max_bytes                              104857600
    dbuf_cache_max_shift                              5
    dmu_object_alloc_chunk_shift                      7
    ignore_hole_birth                                 1
    l2arc_feed_again                                  1
    l2arc_feed_min_ms                                 200
    l2arc_feed_secs                                   1
    l2arc_headroom                                    2
    l2arc_headroom_boost                              200
    l2arc_noprefetch                                  1
    l2arc_norw                                        0
    l2arc_write_boost                                 8388608
    l2arc_write_max                                   8388608
    metaslab_aliquot                                  524288
    metaslab_bias_enabled                             1
    metaslab_debug_load                               0
    metaslab_debug_unload                             0
    metaslab_fragmentation_factor_enabled             1
    metaslab_lba_weighting_enabled                    1
    metaslab_preload_enabled                          1
    metaslabs_per_vdev                                200
    send_holes_without_birth_time                     1
    spa_asize_inflation                               24
    spa_config_path                                   /etc/zfs/zpool.cache
    spa_load_verify_data                              1
    spa_load_verify_maxinflight                       10000
    spa_load_verify_metadata                          1
    spa_slop_shift                                    5
    zfetch_array_rd_sz                                1048576
    zfetch_max_distance                               8388608
    zfetch_max_streams                                8
    zfetch_min_sec_reap                               2
    zfs_abd_scatter_enabled                           1
    zfs_abd_scatter_max_order                         10
    zfs_admin_snapshot                                1
    zfs_arc_average_blocksize                         8192
    zfs_arc_dnode_limit                               0
    zfs_arc_dnode_limit_percent                       10
    zfs_arc_dnode_reduce_percent                      10
    zfs_arc_grow_retry                                0
    zfs_arc_lotsfree_percent                          10
    zfs_arc_max                                       22797719962
    zfs_arc_meta_adjust_restarts                      4096
    zfs_arc_meta_limit                                0
    zfs_arc_meta_limit_percent                        75
    zfs_arc_meta_min                                  0
    zfs_arc_meta_prune                                10000
    zfs_arc_meta_strategy                             1
    zfs_arc_min                                       9119087985
    zfs_arc_min_prefetch_lifespan                     0
    zfs_arc_p_dampener_disable                        1
    zfs_arc_p_min_shift                               0
    zfs_arc_pc_percent                                0
    zfs_arc_shrink_shift                              0
    zfs_arc_sys_free                                  0
    zfs_autoimport_disable                            1
    zfs_checksums_per_second                          20
    zfs_compressed_arc_enabled                        1
    zfs_dbgmsg_enable                                 0
    zfs_dbgmsg_maxsize                                4194304
    zfs_dbuf_state_index                              0
    zfs_deadman_checktime_ms                          5000
    zfs_deadman_enabled                               1
    zfs_deadman_synctime_ms                           1000000
    zfs_dedup_prefetch                                0
    zfs_delay_min_dirty_percent                       60
    zfs_delay_scale                                   500000
    zfs_delays_per_second                             20
    zfs_delete_blocks                                 20480
    zfs_dirty_data_max                                4294967296
    zfs_dirty_data_max_max                            4294967296
    zfs_dirty_data_max_max_percent                    25
    zfs_dirty_data_max_percent                        10
    zfs_dirty_data_sync                               67108864
    zfs_dmu_offset_next_sync                          0
    zfs_expire_snapshot                               300
    zfs_flags                                         0
    zfs_free_bpobj_enabled                            1
    zfs_free_leak_on_eio                              0
    zfs_free_max_blocks                               100000
    zfs_free_min_time_ms                              1000
    zfs_immediate_write_sz                            32768
    zfs_max_recordsize                                1048576
    zfs_mdcomp_disable                                0
    zfs_metaslab_fragmentation_threshold              70
    zfs_metaslab_segment_weight_enabled               1
    zfs_metaslab_switch_threshold                     2
    zfs_mg_fragmentation_threshold                    85
    zfs_mg_noalloc_threshold                          0
    zfs_multihost_fail_intervals                      5
    zfs_multihost_history                             0
    zfs_multihost_import_intervals                    10
    zfs_multihost_interval                            1000
    zfs_multilist_num_sublists                        0
    zfs_no_scrub_io                                   0
    zfs_no_scrub_prefetch                             0
    zfs_nocacheflush                                  0
    zfs_nopwrite_enabled                              1
    zfs_object_mutex_size                             64
    zfs_pd_bytes_max                                  52428800
    zfs_per_txg_dirty_frees_percent                   30
    zfs_prefetch_disable                              0
    zfs_read_chunk_size                               1048576
    zfs_read_history                                  0
    zfs_read_history_hits                             0
    zfs_recover                                       0
    zfs_recv_queue_length                             16777216
    zfs_resilver_delay                                2
    zfs_resilver_min_time_ms                          3000
    zfs_scan_idle                                     50
    zfs_scan_ignore_errors                            0
    zfs_scan_min_time_ms                              1000
    zfs_scrub_delay                                   4
    zfs_send_corrupt_data                             0
    zfs_send_queue_length                             16777216
    zfs_sync_pass_deferred_free                       2
    zfs_sync_pass_dont_compress                       5
    zfs_sync_pass_rewrite                             2
    zfs_sync_taskq_batch_pct                          75
    zfs_top_maxinflight                               32
    zfs_txg_history                                   0
    zfs_txg_timeout                                   5
    zfs_vdev_aggregation_limit                        131072
    zfs_vdev_async_read_max_active                    3
    zfs_vdev_async_read_min_active                    1
    zfs_vdev_async_write_active_max_dirty_percent     60
    zfs_vdev_async_write_active_min_dirty_percent     30
    zfs_vdev_async_write_max_active                   10
    zfs_vdev_async_write_min_active                   2
    zfs_vdev_cache_bshift                             16
    zfs_vdev_cache_max                                16384
    zfs_vdev_cache_size                               0
    zfs_vdev_max_active                               1000
    zfs_vdev_mirror_non_rotating_inc                  0
    zfs_vdev_mirror_non_rotating_seek_inc             1
    zfs_vdev_mirror_rotating_inc                      0
    zfs_vdev_mirror_rotating_seek_inc                 5
    zfs_vdev_mirror_rotating_seek_offset              1048576
    zfs_vdev_queue_depth_pct                          1000
    zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3
    zfs_vdev_read_gap_limit                           32768
    zfs_vdev_scheduler                                noop
    zfs_vdev_scrub_max_active                         2
    zfs_vdev_scrub_min_active                         1
    zfs_vdev_sync_read_max_active                     10
    zfs_vdev_sync_read_min_active                     10
    zfs_vdev_sync_write_max_active                    10
    zfs_vdev_sync_write_min_active                    10
    zfs_vdev_write_gap_limit                          4096
    zfs_zevent_cols                                   80
    zfs_zevent_console                                0
    zfs_zevent_len_max                                384
    zil_replay_disable                                0
    zil_slog_bulk                                     786432
    zio_delay_max                                     30000
    zio_dva_throttle_enabled                          1
    zio_requeue_io_start_cut_in_line                  1
    zio_taskq_batch_pct                               75
    zvol_inhibit_dev                                  0
    zvol_major                                        230
    zvol_max_discard_blocks                           16384
    zvol_prefetch_bytes                               131072
    zvol_request_sync                                 0
    zvol_threads                                      32
    zvol_volmode                                      1

Code:
options zfs zfs_arc_min=9119087985
options zfs zfs_arc_max=22797719962
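
These match the limits reported by arc_summary above (zfs_arc_max = 22797719962 bytes ≈ 21.23 GiB, zfs_arc_min = 9119087985 bytes ≈ 8.49 GiB). For reference, the same values can also be applied at runtime on ZFS on Linux 0.7 without a reboot; a minimal sketch:

Code:
# the options above live in /etc/modprobe.d/zfs.conf; apply the same limits immediately:
echo 22797719962 > /sys/module/zfs/parameters/zfs_arc_max
echo 9119087985  > /sys/module/zfs/parameters/zfs_arc_min
# rebuild the initramfs so the modprobe options take effect at boot:
update-initramfs -u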

Any idea what might cause this? The only fix so far is rebooting. :/
 
Hi,

does this happen only with the latest kernel?
Do you have a PERC (RAID controller) in this server?
If yes, note that ZFS is not compatible with RAID controllers.
Also, JBOD mode on such a controller is still RAID.
 
Do you know which was the first kernel version where this occurred?
And which was the last working one?
 
It still happened today when rolling back a snapshot:
Code:
root     25899  0.0  0.0  35148  3108 ?        D    10:56   0:00 zfs rollback rpool/data/subvol-101-disk-0@pre_deploy

EDIT: Realised this server is behind on kernel versions. I will update and reboot to see if that solves the issue.

Code:
[2019-01-09 10:03:43]  INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
[2019-01-09 10:03:43]        Tainted: P          IO     4.15.17-2-pve #1
[2019-01-09 10:03:43]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2019-01-09 10:03:43]  kworker/u66:3   D    0 17147      2 0x80000000
[2019-01-09 10:03:43]  Workqueue: writeback wb_workfn (flush-zfs-237)
[2019-01-09 10:03:43]  Call Trace:
[2019-01-09 10:03:43]   __schedule+0x3e3/0x880
[2019-01-09 10:03:43]   schedule+0x36/0x80
[2019-01-09 10:03:43]   io_schedule+0x16/0x40
[2019-01-09 10:03:43]   __lock_page+0xff/0x140
[2019-01-09 10:03:43]   ? page_cache_tree_insert+0xe0/0xe0
[2019-01-09 10:03:43]   write_cache_pages+0x2e9/0x4b0
[2019-01-09 10:03:43]   ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[2019-01-09 10:03:43]   ? _cond_resched+0x1a/0x50
[2019-01-09 10:03:43]   ? mutex_lock+0x12/0x40
[2019-01-09 10:03:43]   ? rrw_exit+0x5a/0x150 [zfs]
[2019-01-09 10:03:43]   zpl_writepages+0x91/0x170 [zfs]
[2019-01-09 10:03:43]   do_writepages+0x1f/0x70
[2019-01-09 10:03:43]   __writeback_single_inode+0x45/0x330
[2019-01-09 10:03:43]   writeback_sb_inodes+0x266/0x590
[2019-01-09 10:03:43]   __writeback_inodes_wb+0x92/0xc0
[2019-01-09 10:03:43]   wb_writeback+0x288/0x320
[2019-01-09 10:03:43]   wb_workfn+0x1a3/0x450
[2019-01-09 10:03:43]   ? wb_workfn+0x1a3/0x450
[2019-01-09 10:03:43]   process_one_work+0x1e0/0x400
[2019-01-09 10:03:43]   worker_thread+0x4b/0x420
[2019-01-09 10:03:43]   kthread+0x105/0x140
[2019-01-09 10:03:43]   ? process_one_work+0x400/0x400
[2019-01-09 10:03:43]   ? kthread_create_worker_on_cpu+0x70/0x70
[2019-01-09 10:03:43]   ? do_syscall_64+0x73/0x130
[2019-01-09 10:03:43]   ? SyS_exit_group+0x14/0x20
[2019-01-09 10:03:43]   ret_from_fork+0x35/0x40
[2019-01-09 10:03:44]  audit: type=1400 audit(1547046543.654:220): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-101_</var/lib/lxc>" name="/" pid=951 comm="(ionclean)" flags="rw, rslave"
[2019-01-09 10:05:43]  INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
[2019-01-09 10:07:44]  INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
[2019-01-09 10:09:45]  INFO: task kworker/u66:3:17147 blocked for more than 120 seconds.
(identical call trace as the first occurrence, repeated for each of these timestamps)
[2019-01-09 10:10:16]  php[24179]: segfault at c8 ip 000055bff72bb05e sp 00007ffff87b7df0 error 4 in php7.2[55bff6fd7000+418000]
[2019-01-09 10:10:16]  php[24178]: segfault at c8 ip 0000561bf710205e sp 00007ffef10c9110 error 4 in php7.2[561bf6e1e000+418000]

Code:
[<0>] cv_wait_common+0x11e/0x140 [spl]
[<0>] __cv_wait+0x15/0x20 [spl]
[<0>] rrw_enter_write+0x3c/0xa0 [zfs]
[<0>] rrm_enter+0x2a/0x70 [zfs]
[<0>] zfsvfs_teardown+0x7b/0x300 [zfs]
[<0>] zfs_suspend_fs+0x10/0x20 [zfs]
[<0>] zfs_ioc_rollback+0xe2/0x1a0 [zfs]
[<0>] zfsdev_ioctl+0x216/0x670 [zfs]
[<0>] do_vfs_ioctl+0xa6/0x620
[<0>] SyS_ioctl+0x79/0x90
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.17-2-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
It still happened today. It seems to be related to the call to sync, not to arc_prune. Here are some more logs; let me know if you need any more information:

Code:
root      6959  1.5  0.0 199056 72048 ?        SLsl  2018 1112:56 /usr/sbin/corosync -f
root     19671  0.0  0.0   7060   664 ?        D    Jan14   0:00      \_ sync

Code:
cat /proc/19671/stack
[<0>] wb_wait_for_completion+0x64/0x90
[<0>] sync_inodes_sb+0xa6/0x280
[<0>] sync_inodes_one_sb+0x15/0x20
[<0>] iterate_supers+0xbf/0x110
[<0>] sys_sync+0x42/0xb0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

Code:
/proc/19671# cat syscall
162 0x7ff0dd329078 0x7ffded9d72b8 0x7ff0dd561e01 0x81f 0x0 0x0 0x7ffded9d7188 0x7ff0dd2a75b7
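
The first field of /proc/<pid>/syscall is the syscall number the task is blocked in; 162 is sync(2) on x86_64, so the process really is stuck inside the sync system call. A quick way to double-check the number (assuming the Debian linux-libc-dev headers, or the auditd userspace tools for ausyscall):

Code:
# 162 should resolve to __NR_sync on x86_64
grep -w 162 /usr/include/x86_64-linux-gnu/asm/unistd_64.h
# or, if auditd's userspace tools are installed:
ausyscall x86_64 162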

Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-7-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-2-pve: 4.15.18-21
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
I do. I thought Proxmox set one up by default during the installation?

Code:
rpool/swap                    8.50G  1004G   869M  -

Code:
Filename                Type        Size    Used    Priority
/dev/zd32                               partition    8388604    0    -2

When testing, I could crash the whole system when hitting swap, and the processes changed to a D state (using stress; see the sketch below). Could it be the same when sync was called? What I find weird is that it's stuck in D state, but it did not crash the whole system like the GitHub issue mentions.

There is also no mention of swap in the stack trace.
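
For what it's worth, the swap test above was done with the stress tool; a minimal sketch of that kind of memory-pressure test (the numbers are only illustrative, not what was actually run):

Code:
# spawn anonymous-memory workers large enough to push the host into swap
stress --vm 16 --vm-bytes 10G --vm-keep --timeout 300s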
 
I do. I thought Proxmox set one up by default during the installation?
It did - we changed that in the 5.3 installer for precisely this issue (the GitHub link).
I would suggest disabling your swap zvol if possible - it takes one potential culprit out of the system (see the sketch below).
Also - please try reproducing it on the latest released kernel (pve-kernel-4.15.18-9-pve: 4.15.18-30).
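
A minimal sketch of disabling the swap zvol, assuming the default PVE layout (rpool/swap mounted via an /etc/fstab entry for /dev/zvol/rpool/swap):

Code:
# stop using the zvol as swap immediately
swapoff /dev/zvol/rpool/swap
# comment out its /etc/fstab entry so it stays disabled after a reboot
sed -i 's|^/dev/zvol/rpool/swap|#&|' /etc/fstab
# optionally reclaim the space later:
# zfs destroy rpool/swap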
 
Hey Stoiko,

Even after disabling the swap, the issue is still present.

Code:
root      6073  0.0  0.0   7060   760 pts/10   D+   09:04   0:00  |                           \_ sync

Code:
[<0>] wb_wait_for_completion+0x64/0x90
[<0>] sync_inodes_sb+0xa6/0x280
[<0>] sync_inodes_one_sb+0x15/0x20
[<0>] iterate_supers+0xbf/0x110
[<0>] sys_sync+0x42/0xb0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

Code:
cat /proc/6073/syscall
162 0x7f633f7c1078 0x7fff602753f8 0x7f633f9f9e01 0x81f 0x0 0x0 0x7fff602752c8 0x7f633f73f5b7

Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-8-pve)
pve-manager: 5.3-8 (running version: 5.3-8/2929af8e)
pve-kernel-4.15: 5.3-1
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-19
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-36
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-33
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-45
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

Code:
execve("/usr/sbin/update-initramfs", ["update-initramfs", "-u"], [/* 16 vars */]) = 0
brk(NULL)                               = 0x56112fbcd000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17c42f7000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=65446, ...}) = 0
mmap(NULL, 65446, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f17c42e7000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\4\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1689360, ...}) = 0
mmap(NULL, 3795296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f17c3d38000
mprotect(0x7f17c3ecd000, 2097152, PROT_NONE) = 0
mmap(0x7f17c40cd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f17c40cd000
mmap(0x7f17c40d3000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f17c40d3000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17c42e5000
arch_prctl(ARCH_SET_FS, 0x7f17c42e5700) = 0
mprotect(0x7f17c40cd000, 16384, PROT_READ) = 0
mprotect(0x56112f1dd000, 8192, PROT_READ) = 0
mprotect(0x7f17c42fa000, 4096, PROT_READ) = 0
munmap(0x7f17c42e7000, 65446)           = 0
getpid()                                = 8030
rt_sigaction(SIGCHLD, {sa_handler=0x56112efd3ef0, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
geteuid()                               = 0
brk(NULL)                               = 0x56112fbcd000
brk(0x56112fbee000)                     = 0x56112fbee000
getppid()                               = 8028
stat("/root", {st_mode=S_IFDIR|0700, st_size=19, ...}) = 0
stat(".", {st_mode=S_IFDIR|0700, st_size=19, ...}) = 0
open("/usr/sbin/update-initramfs", O_RDONLY) = 3
fcntl(3, F_DUPFD, 10)                   = 10
close(3)                                = 0
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x56112efd3ef0, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
rt_sigaction(SIGQUIT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
rt_sigaction(SIGTERM, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER, sa_restorer=0x7f17c3d6b060}, NULL, 8) = 0
read(10, "#!/bin/sh\n\nSTATEDIR=/var/lib/ini"..., 8192) = 8192
faccessat(AT_FDCWD, "/etc/initramfs-tools/update-initramfs.conf", R_OK) = 0
open("/etc/initramfs-tools/update-initramfs.conf", O_RDONLY) = 3
fcntl(3, F_DUPFD, 10)                   = 11
close(3)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
read(11, "#\n# Configuration file for updat"..., 8192) = 378
read(11, "", 8192)                      = 0
close(11)                               = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8031
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 8031
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8031, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8031
read(10, ";\n\tu)\n\t\tupdate\n\t\t;;\nesac\n", 8192) = 25
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8032
close(4)                                = 0
read(3, "4.15.18-10-pve\n4.15.18-9-pve\n4.1"..., 128) = 128
read(3, ".13.16-3-pve\n4.13.16-2-pve\n4.13."..., 128) = 128
read(3, "3.8-3-pve\n4.13.8-2-pve\n4.13.4-1-"..., 128) = 36
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8032, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 36
read(3, "", 128)                        = 0
close(3)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8032
faccessat(AT_FDCWD, "/proc/mounts", R_OK) = 0
stat("/usr/local/sbin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ischroot", 0x7ffe69edd4e0) = -1 ENOENT (No such file or directory)
stat("/usr/bin/ischroot", {st_mode=S_IFREG|0755, st_size=10552, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8035
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 8035
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8035, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8035
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8036
close(4)                                = 0
read(3, "", 128)                        = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8036, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 0
close(3)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8036
stat("/boot/initrd.img-4.15.18-10-pve", {st_mode=S_IFREG|0644, st_size=36184842, ...}) = 0
stat("/usr/local/sbin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/sha1sum", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/bin/sha1sum", {st_mode=S_IFREG|0755, st_size=47976, ...}) = 0
pipe([3, 4])                            = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8037
close(4)                                = 0
stat("/usr/local/sbin/diff", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/diff", 0x7ffe69edd340) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/diff", 0x7ffe69edd340)  = -1 ENOENT (No such file or directory)
stat("/usr/bin/diff", {st_mode=S_IFREG|0755, st_size=146824, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8038
close(3)                                = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8037
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8037, si_uid=0, si_status=0, si_utime=9, si_stime=1} ---
rt_sigreturn({mask=[]})                 = 8037
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8038
faccessat(AT_FDCWD, "/boot/initrd.img-4.15.18-10-pve", R_OK) = 0
faccessat(AT_FDCWD, "/boot/initrd.img-4.15.18-10-pve.dpkg-bak", R_OK) = 0
stat("/usr/local/sbin/rm", 0x7ffe69edd520) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/rm", 0x7ffe69edd520) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/rm", 0x7ffe69edd520)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/rm", 0x7ffe69edd520)     = -1 ENOENT (No such file or directory)
stat("/sbin/rm", 0x7ffe69edd520)        = -1 ENOENT (No such file or directory)
stat("/bin/rm", {st_mode=S_IFREG|0755, st_size=64424, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8039
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8039
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8039, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8039
stat("/usr/local/sbin/ln", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ln", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ln", 0x7ffe69edd540)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/ln", 0x7ffe69edd540)     = -1 ENOENT (No such file or directory)
stat("/sbin/ln", 0x7ffe69edd540)        = -1 ENOENT (No such file or directory)
stat("/bin/ln", {st_mode=S_IFREG|0755, st_size=56240, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8040
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8040
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8040, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 8040
write(1, "update-initramfs: Generating /bo"..., 61update-initramfs: Generating /boot/initrd.img-4.15.18-10-pve
) = 61
stat("/usr/local/sbin/mkinitramfs", 0x7ffe69edd580) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/mkinitramfs", 0x7ffe69edd580) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/mkinitramfs", {st_mode=S_IFREG|0755, st_size=10255, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 8041
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 8041
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8041, si_uid=0, si_status=0, si_utime=1, si_stime=3} ---
rt_sigreturn({mask=[]})                 = 8041
stat("/usr/local/sbin/mv", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/mv", 0x7ffe69edd540) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/mv", 0x7ffe69edd540)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/mv", 0x7ffe69edd540)     = -1 ENOENT (No such file or directory)
stat("/sbin/mv", 0x7ffe69edd540)        = -1 ENOENT (No such file or directory)
stat("/bin/mv", {st_mode=S_IFREG|0755, st_size=126416, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14543
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14543
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14543, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn({mask=[]})                 = 14543
open("/var/lib/initramfs-tools/4.15.18-10-pve", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14544
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14544
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14544, si_uid=0, si_status=0, si_utime=8, si_stime=1} ---
rt_sigreturn({mask=[]})                 = 14544
dup2(11, 1)                             = 1
close(11)                               = 0
stat("/usr/local/sbin/sync", 0x7ffe69edd560) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/sync", 0x7ffe69edd560) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/sync", 0x7ffe69edd560)  = -1 ENOENT (No such file or directory)
stat("/usr/bin/sync", 0x7ffe69edd560)   = -1 ENOENT (No such file or directory)
stat("/sbin/sync", 0x7ffe69edd560)      = -1 ENOENT (No such file or directory)
stat("/bin/sync", {st_mode=S_IFREG|0755, st_size=31496, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f17c42e59d0) = 14545
wait4(-1,
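
(For reference, the trace above was captured with something along the lines of `strace update-initramfs -u`; it shows update-initramfs forking /bin/sync as its last step and then blocking in wait4() on a sync that never returns.)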

Code:
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=74244120k,nr_inodes=18561030,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=14852988k,mode=755)
rpool/ROOT/pve-1 on / type zfs (rw,relatime,xattr,noacl)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=36,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=4157)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
rpool on /rpool type zfs (rw,noatime,xattr,noacl)
rpool/ROOT on /rpool/ROOT type zfs (rw,noatime,xattr,noacl)
rpool/data on /rpool/data type zfs (rw,noatime,xattr,noacl)
rpool/data/subvol-102-disk-0 on /rpool/data/subvol-102-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-110-disk-0 on /rpool/data/subvol-110-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-111-disk-1 on /rpool/data/subvol-111-disk-1 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-120-disk-0 on /rpool/data/subvol-120-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-134-disk-1 on /rpool/data/subvol-134-disk-1 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-137-disk-1 on /rpool/data/subvol-137-disk-1 type zfs (rw,noatime,xattr,posixacl)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
dev-nas-1:/mnt/Pool-NAS/Backups on /mnt/dev-nas-1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.104,mountvers=3,mountport=881,mountproto=udp,local_lock=none,addr=192.168.1.104)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
dev-nas-1:/mnt/Pool-NAS/Proxmox on /mnt/pve/dev-nas-1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.104,mountvers=3,mountport=881,mountproto=udp,local_lock=none,addr=192.168.1.104)
rpool/data/subvol-104-disk-0 on /rpool/data/subvol-104-disk-0 type zfs (rw,noatime,xattr,posixacl)
rpool/data/subvol-101-disk-0 on /rpool/data/subvol-101-disk-0 type zfs (rw,noatime,xattr,posixacl)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700)
rpool/data/subvol-109-disk-0 on /rpool/data/subvol-109-disk-0 type zfs (rw,noatime,xattr,posixacl)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
tmpfs on /run/user/1006 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700,uid=1006,gid=1006)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=14852984k,mode=700,uid=1001,gid=1001)

Code:
[Mon Feb  4 11:08:25 2019] INFO: task kworker/u49:1:30643 blocked for more than 120 seconds.
[Mon Feb  4 11:08:25 2019]       Tainted: P        W IO     4.15.18-8-pve #1
[Mon Feb  4 11:08:25 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Feb  4 11:08:25 2019] kworker/u49:1   D    0 30643      2 0x80000000
[Mon Feb  4 11:08:25 2019] Workqueue: writeback wb_workfn (flush-zfs-9)
[Mon Feb  4 11:08:25 2019] Call Trace:
[Mon Feb  4 11:08:25 2019]  __schedule+0x3e0/0x870
[Mon Feb  4 11:08:25 2019]  schedule+0x36/0x80
[Mon Feb  4 11:08:25 2019]  io_schedule+0x16/0x40
[Mon Feb  4 11:08:25 2019]  __lock_page+0xff/0x140
[Mon Feb  4 11:08:25 2019]  ? page_cache_tree_insert+0xe0/0xe0
[Mon Feb  4 11:08:25 2019]  write_cache_pages+0x2e9/0x4b0
[Mon Feb  4 11:08:25 2019]  ? zpl_write_common_iovec+0xe0/0xe0 [zfs]
[Mon Feb  4 11:08:25 2019]  ? _cond_resched+0x1a/0x50
[Mon Feb  4 11:08:25 2019]  ? mutex_lock+0x12/0x40
[Mon Feb  4 11:08:25 2019]  ? rrw_exit+0x5a/0x150 [zfs]
[Mon Feb  4 11:08:25 2019]  zpl_writepages+0x91/0x170 [zfs]
[Mon Feb  4 11:08:25 2019]  do_writepages+0x1f/0x70
[Mon Feb  4 11:08:25 2019]  __writeback_single_inode+0x45/0x330
[Mon Feb  4 11:08:25 2019]  writeback_sb_inodes+0x266/0x590
[Mon Feb  4 11:08:25 2019]  __writeback_inodes_wb+0x92/0xc0
[Mon Feb  4 11:08:25 2019]  wb_writeback+0x288/0x320
[Mon Feb  4 11:08:25 2019]  wb_workfn+0x1a3/0x440
[Mon Feb  4 11:08:25 2019]  ? wb_workfn+0x1a3/0x440
[Mon Feb  4 11:08:25 2019]  process_one_work+0x1e0/0x400
[Mon Feb  4 11:08:25 2019]  worker_thread+0x4b/0x420
[Mon Feb  4 11:08:25 2019]  kthread+0x105/0x140
[Mon Feb  4 11:08:25 2019]  ? process_one_work+0x400/0x400
[Mon Feb  4 11:08:25 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Mon Feb  4 11:08:25 2019]  ? do_syscall_64+0x73/0x130
[Mon Feb  4 11:08:25 2019]  ? SyS_exit_group+0x14/0x20
[Mon Feb  4 11:08:25 2019]  ret_from_fork+0x35/0x40

Is there anything else I can provide so we can fix this once and for all?

Thanks

EDIT: I've found this issue on the ZFS on Linux (ZoL) GitHub, which is exactly what is happening here. Unfortunately it's still open: https://github.com/zfsonlinux/zfs/issues/7484
 
Hey,

update-initramfs is hanging after issuing a `sync` - any chance that's related to the NFS mounts you have on your system?
* Can you disable the storages and unmount them temporarily (see the sketch below)?
* Alternatively, can you write to / read from them?
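
A minimal sketch of taking the NFS storages out of the picture temporarily (the storage ID below is a guess based on the mount output above; check /etc/pve/storage.cfg for the real name):

Code:
# disable the NFS storage in PVE so nothing new is written to it
pvesm set dev-nas-1 --disable 1
# unmount it (use -l for a lazy unmount if it is already hanging)
umount /mnt/pve/dev-nas-1
# re-enable it later:
# pvesm set dev-nas-1 --disable 0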
 
