[KNOWN ZFS PROBLEM] Freezing during high IO (only on host, not guest)

0.7.6 is not yet done in any way (there is basically just a TODO list at the moment). Once patches for 0.7 are available, we can think about cherry-picking them.
 
0.7.6 is not yet done in any way (there is basically just a TODO list at the moment). Once patches for 0.7 are available, we can think about cherry-picking them.
By upcoming I meant just the next version. As soon as the patch is merged to master, I was planning to compile ZFS from source and upgrade, as right now I'm still unable to do any real backups and I need to resolve this ASAP.

Is there any danger in doing that, or what would be your recommendation? I have compiled a few things from source from time to time, but I'm definitely not used to it, so I will appreciate any help.
 
I would not recommend compiling from master unless you are familiar with the ZFS code and keep up with current developments. It contains features currently in development which might be incompatible with 0.7 and/or eat your data.
 
I would not recommend compiling from master unless you are familiar with the ZFS code and keep up with current developments. It contains features currently in development which might be incompatible with 0.7 and/or eat your data.
Ok, so would it be better, then, to download the source for 0.7.5 and apply this specific patch?
 
Probably a better idea. I'll try to do a ZFS update later this week or early next week, and I'll re-evaluate the stuff slated for 0.7.6, so you could also wait for that to happen ;)
 
Probably a better idea. I'll try to do a ZFS update later this week or early next week, and I'll re-evaluate the stuff slated for 0.7.6, so you could also wait for that to happen ;)
Ok, thank you very much. I'm rushing this because, as I stated before, I haven't been able to do any VM backups for more than 3 weeks now due to (hopefully) this bug, which totally kills any data transfer and hangs the system, so I'm jumping at every possible solution.
 
Probably a better idea. I'll try to do a ZFS update later this week or early next week, and I'll re-evaluate the stuff slated for 0.7.6, so you could also wait for that to happen ;)
Any updates on this? Thank you...
 
Any updates on this? Thank you...

The proposed 0.7.6 patchset is already running through the buildbots upstream; I'm waiting for positive results so that we can just update to 0.7.6 instead of cherry-picking individual master commits.
 
The proposed 0.7.6 patchset is already running through the buildbots upstream; I'm waiting for positive results so that we can just update to 0.7.6 instead of cherry-picking individual master commits.

Went with the cherry-pick since 0.7.6 took too long ;)

Updated kernel and ZFS packages are available on pvetest.
 
Went with the cherry-pick since 0.7.6 took too long ;)

Updated kernel and ZFS packages are available on pvetest.

I saw the update to v0.7.4-1; is the patch included in this version? If it is, then I have some different problem, because after the update I'm still experiencing the same issue - the system hanging under any intense IO. I was almost sure the above-mentioned problem was the cause, because it described almost exactly the same symptoms as I have... Well, back to searching for a new solution...
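For reference - and purely as a generic check, not something taken from this thread - the ZFS module version that is actually loaded (as opposed to the userspace package version string) can be verified like this:

Code:
cat /sys/module/zfs/version
modinfo zfs | grep -i '^version'
dpkg -l | grep -E 'zfs|spl'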

EDIT:
[ 363.485362] INFO: task txg_sync:785 blocked for more than 120 seconds.
[ 363.485388] Tainted: P O 4.13.13-5-pve #1
[ 363.485406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.485431] txg_sync D 0 785 2 0x00000000
[ 363.485433] Call Trace:
[ 363.485441] __schedule+0x3cc/0x850
[ 363.485443] schedule+0x36/0x80
[ 363.485446] io_schedule+0x16/0x40
[ 363.485453] cv_wait_common+0xb2/0x140 [spl]
[ 363.485456] ? wait_woken+0x80/0x80
[ 363.485460] __cv_wait_io+0x18/0x20 [spl]
[ 363.485503] zio_wait+0xfd/0x1b0 [zfs]
[ 363.485548] dsl_pool_sync+0xb8/0x440 [zfs]
[ 363.485635] spa_sync+0x42d/0xdb0 [zfs]
[ 363.485694] txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 363.485719] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 363.485722] thread_generic_wrapper+0x72/0x80 [spl]
[ 363.485724] kthread+0x109/0x140
[ 363.485727] ? __thread_exit+0x20/0x20 [spl]
[ 363.485728] ? kthread_create_on_node+0x70/0x70
[ 363.485728] ? kthread_create_on_node+0x70/0x70
[ 363.485730] ret_from_fork+0x1f/0x30
[ 387.639108] kauditd_printk_skb: 1 callbacks suppressed
[ 387.639109] audit: type=1400 audit(1517141349.643:29): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=10490 comm="(ionclean)" flags="rw, rslave"
[ 484.317903] INFO: task txg_sync:785 blocked for more than 120 seconds.
[ 484.317931] Tainted: P O 4.13.13-5-pve #1
[ 484.317955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.317997] txg_sync D 0 785 2 0x00000000
[ 484.318000] Call Trace:
[ 484.318007] __schedule+0x3cc/0x850
[ 484.318009] schedule+0x36/0x80
[ 484.318012] io_schedule+0x16/0x40
[ 484.318019] cv_wait_common+0xb2/0x140 [spl]
[ 484.318022] ? wait_woken+0x80/0x80
[ 484.318026] __cv_wait_io+0x18/0x20 [spl]
[ 484.318069] zio_wait+0xfd/0x1b0 [zfs]
[ 484.318094] dsl_pool_sync+0xb8/0x440 [zfs]
[ 484.318153] spa_sync+0x42d/0xdb0 [zfs]
[ 484.318197] txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 484.318222] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 484.318225] thread_generic_wrapper+0x72/0x80 [spl]
[ 484.318227] kthread+0x109/0x140
[ 484.318229] ? __thread_exit+0x20/0x20 [spl]
[ 484.318230] ? kthread_create_on_node+0x70/0x70
[ 484.318231] ? kthread_create_on_node+0x70/0x70
[ 484.318233] ret_from_fork+0x1f/0x30
[ 605.154228] INFO: task txg_sync:785 blocked for more than 120 seconds.
[ 605.154258] Tainted: P O 4.13.13-5-pve #1
[ 605.154277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
(the same trace repeats 6 more times)
 
Somehow it's actually worse now. Today I noticed that the server is getting stuck every 4 hours. The load will shoot up to around 70 (the max should be 16) for about 10 minutes, and the whole server slows down so much that eventually even the VMs are affected and various services stop answering requests (Proxmox itself is totally unusable at this point, as whatever I do will time out).
According to top, IO delay is only 12%...
 
The trace you posted just tells you that ZFS is waiting for I/O to complete. Are your disks busy when this is happening? You can compare zpool iostat and iostat output; the former even gives you various histogram output options. You can also temporarily enable the ZFS txg_history to get an overview of ZFS transaction sizes and latencies.
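Concretely, that could look something like the following (a sketch; rpool is just used as an example pool name, substitute your own, and zfs_txg_history simply sets how many transaction groups to keep in the kstat history):

Code:
# per-vdev latency, sampled every 2 seconds
zpool iostat -v -l 2
# request-size and total-wait histograms (available since ZFS 0.7)
zpool iostat -r
zpool iostat -w
# keep the last 100 TXGs in the history, then inspect their sizes and sync times
echo 100 > /sys/module/zfs/parameters/zfs_txg_history
cat /proc/spl/kstat/zfs/rpool/txgs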
 
The trace you posted just tells you that ZFS is waiting for I/O to complete. Are your disks busy when this is happening? You can compare zpool iostat and iostat output; the former even gives you various histogram output options. You can also temporarily enable the ZFS txg_history to get an overview of ZFS transaction sizes and latencies.
I think post #20 shows exactly what is happening to the disks (I will try testing it again tonight). What I don't understand is that even when I only operate on data that lives on one pair of DATA SSDs, the IO always hits the system SSDs as well.

Right now I will also test drastically lowering the ARC (so far it was set to a 32 GB max)... and tune a few other parameters - I played with them before without any noticeable difference, but as I don't have any other ideas I may as well test everything again...

Code:
options zfs zfs_arc_max=5120000000
options zfs zfs_arc_min=1024000000
options zfs zfs_prefetch_disable=1
options zfs zfs_txg_timeout=5
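A side note on applying these (assuming they live in /etc/modprobe.d/zfs.conf and that the root pool is on ZFS, as on a default Proxmox ZFS install): the module options only take effect when the module is loaded, so the initramfs has to be regenerated and the host rebooted, after which the live values can be verified:

Code:
update-initramfs -u -k all
# after the reboot, confirm the limits actually in effect (values are in bytes)
cat /sys/module/zfs/parameters/zfs_arc_max
grep -E '^(c_max|c_min|size)' /proc/spl/kstat/zfs/arcstats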
 
The trace you posted just tells you that ZFS is waiting for I/O to complete. Are your disks busy when this is happening? You can compare zpool iostat and iostat output; the former even gives you various histogram output options. You can also temporarily enable the ZFS txg_history to get an overview of ZFS transaction sizes and latencies.
Ok, the server just froze again, so I quickly grabbed some stats:

Code:
zpool iostat 2
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G     47     53  1.53M  1.40M
rpool       7.25G  22.5G      1     25  12.6K   323K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0    100  4.00K  1.79M
rpool       7.25G  22.5G      0      2      0   124K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0      0  4.00K  16.0K
rpool       7.25G  22.5G      0      2      0   122K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0      0      0      0
rpool       7.25G  22.5G      0      2      0  66.0K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      5     99  30.0K  2.51M
rpool       7.25G  22.5G      0      1      0   126K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0      0      0      0
rpool       7.25G  22.5G      0      2      0   122K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0     94  12.0K  2.12M
rpool       7.25G  22.5G      0      2      0  78.0K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      4      5  26.0K  96.0K
rpool       7.28G  22.5G      0     35  4.00K  2.15M
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0      3      0  16.0K
rpool       7.28G  22.5G      0     14  8.00K   926K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      1    100  8.00K  3.88M
rpool       7.28G  22.5G      0      4  4.00K   276K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0      1  26.0K  12.0K
rpool       7.28G  22.5G      0      4      0   140K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0     94      0  2.03M
rpool       7.28G  22.5G      0      7  4.00K   280K
----------  -----  -----  -----  -----  -----  -----
DP1          436G   452G      0      0      0      0
rpool       7.28G  22.5G      0      2  2.00K   168K
----------  -----  -----  -----  -----  -----  -----

Code:
iostat -x -d 2
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
loop0             0.00     0.00    0.00    1.00     0.00     4.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.50    0.00    1.00     0.00     6.00    12.00     0.02   18.00    0.00   18.00  18.00   1.80
sdd               0.00     0.00    1.50   58.00     2.00  1478.00    49.75     0.01    0.20    2.67    0.14   0.13   0.80
sde               0.00     0.00    1.00   60.00     0.00  1478.00    48.46     0.01    0.16    4.00    0.10   0.13   0.80
sdf               0.00     0.00    1.00    2.00     4.00     8.00     8.00     3.89 1254.00  816.00 1473.00 333.33 100.00
sdg               0.00     0.00    1.00    3.00     6.00    82.00    44.00     1.24  309.50  106.00  377.33  79.50  31.80
zd0               0.00     0.00    0.00   15.50     0.00    62.00     8.00    86.97 3120.00    0.00 3120.00  86.32 133.80
zd16              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
zd32              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
loop0             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    2.00    1.50     2.00    20.00    12.57     0.00    0.57    1.00    0.00   0.57   0.20
sde               0.00     0.00    2.00    1.50     2.00    20.00    12.57     0.00    0.57    1.00    0.00   0.57   0.20
sdf               0.00     0.00    0.00    1.00     0.00    74.00   148.00     3.41 2336.00    0.00 2336.00 1000.00 100.00
sdg               0.00     0.00    0.50    0.00     0.00     0.00     0.00     0.01   12.00   12.00    0.00  12.00   0.60
zd0               0.00     0.00    0.00    0.00     0.00     0.00     0.00    65.00    0.00    0.00    0.00   0.00 100.00
zd16              0.00     0.00    0.00    3.50     0.00    14.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
zd32              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
loop0             0.00     0.00    0.00    0.50     0.00     2.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.50    0.00    1.00     0.00     6.00    12.00     0.01    8.00    0.00    8.00   8.00   0.80
sdd               0.00     0.00    2.50    3.00     0.00    40.00    14.55     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    3.00    3.00     2.00    40.00    14.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.50    5.00   24.50     4.00   642.00    43.80     3.48  219.32  141.20  235.27  33.90 100.00
sdg               0.00     0.50    5.00   24.50     8.00   636.00    43.66     1.83   60.14   24.40   67.43  22.98  67.80
zd0               0.00     0.00    0.00  145.50     0.00   582.00     8.00    64.57 1503.35    0.00 1503.35   6.87 100.00
zd16              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
zd32              0.00     0.00    0.00   48.50     0.00   512.00    21.11     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
loop0             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    4.00   53.50     8.00  1410.00    49.32     0.01    0.21    0.50    0.19   0.14   0.80
sde               0.00     0.00    2.00   52.00     0.00  1410.00    52.22     0.01    0.15    1.00    0.12   0.11   0.60
sdf               0.00     1.50    6.50   30.50     8.00  1026.00    55.89     3.24   85.41   51.08   92.72  27.03 100.00
sdg               0.00     1.00    5.50   32.50     0.00  1004.00    52.84     2.67   68.95   33.09   75.02  25.32  96.20
zd0               0.00     0.00    0.00  175.00     0.00   700.00     8.00    64.00  343.19    0.00  343.19   5.71 100.00
zd16              0.00     0.00    0.00    2.00     0.00     8.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
zd32              0.00     0.00    0.00   26.50     0.00   224.00    16.91     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
loop0             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    1.00    1.00     0.00    20.00    20.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    1.50    1.00    10.00    20.00    24.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    1.00   12.50     0.00   188.00    27.85     2.43  152.74  190.00  149.76  74.07 100.00
sdg               0.00     1.00    1.00    3.00     0.00   104.00    52.00     2.98  642.00  404.00  721.33 250.00 100.00
zd0               0.00     0.00    0.00   32.00     0.00   128.00     8.00    64.00 1130.75    0.00 1130.75  31.25 100.00
zd16              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
zd32              0.00     0.00    0.00    1.00     0.00     4.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
loop0             0.00     0.00    0.00    0.50     0.00     2.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.50     0.00     2.00     8.00     0.01   20.00    0.00   20.00  20.00   1.00
sdd               0.00     0.00    1.00   45.50     0.00  1476.00    63.48     0.02    0.39   10.00    0.18   0.30   1.40
sde               0.00     0.00    1.00   49.00     0.00  1476.00    59.04     0.01    0.24    4.00    0.16   0.16   0.80
sdf               0.00     0.50    1.00    8.00     4.00   310.00    69.78     3.47  438.22   74.00  483.75 111.11 100.00
sdg               0.00     0.00    0.50    5.00     0.00   228.00    82.91     1.28  248.73    4.00  273.20  82.91  45.60
zd0               0.00     0.00    0.00   17.00     0.00    68.00     8.00    64.00 3617.41    0.00 3617.41  58.82 100.00
zd16              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
zd32              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Code:
top - 16:23:33 up 3 days,  3:20,  2 users,  load average: 72.98, 65.76, 36.83
Tasks: 1172 total,   2 running, 1158 sleeping,   0 stopped,  12 zombie
%Cpu(s):  4.2 us,  0.7 sy,  0.0 ni, 82.8 id, 12.2 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 65854380 total,  1291248 free, 21927196 used, 42635936 buff/cache
KiB Swap:  3801084 total,  1319164 free,  2481920 used. 41961236 avail Mem
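One note on reading the iostat output above: the zdN devices are ZFS zvols (typically VM disks), not physical drives. If it's unclear which dataset and pool a given zdN belongs to, or which sdX devices back which pool, something like this helps (a sketch; the exact names depend on the local layout):

Code:
# map zdN block devices to their zvol datasets
ls -l /dev/zvol/*/*
# or query a single device
udevadm info --query=symlink --name=zd0
# and list which physical devices make up each pool
zpool status -P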
 
Yesterday I tried to do backups again after a few weeks. I managed to back up 7 out of 8 VMs without a problem. The last one (around 200 GB in size) started slowing down the server as usual, but then I suddenly lost connection to the server entirely. I left it overnight, but even after 10 hours it wasn't responding, so I had to drive to the site where the server is located and do a hard reboot (I wasn't even able to log in locally on the console).

This time I only have a photo of the monitor (sigh):

[Attached image: 20180204_073713.jpg]

PS: The backup of the largest CT had been running fine for about 45 minutes before the problems started occurring.
 
Next observation:
When IO gets high (I simulated high load with rsync), I took a look at the dirty pages (after I stopped rsync, because I noticed increasing system load again):
Code:
cat /proc/vmstat | egrep "dirty|writeback"
nr_dirty 59
nr_writeback 32
nr_writeback_temp 0
nr_dirty_threshold 904847
nr_dirty_background_threshold 451317

nr_writeback took about 3 minutes to get to 31 and then just stayed there; other writes appeared and were quickly resolved, but it didn't drop under 31. After about 10 minutes of waiting I took a look at top and saw a system load of 130 (plus the usual Proxmox GUI not responding, etc.), so I quickly issued a system reboot because I didn't want to have a frozen server on my hands again (after about 15 minutes I noticed that the system was still up and hadn't rebooted, so I did a hard reset through IPMI).
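For what it's worth, a simple way to keep an eye on whether those counters drain over time, without re-running the command by hand (plain procfs polling, nothing ZFS-specific):

Code:
watch -n 2 'egrep "^nr_(dirty|writeback)" /proc/vmstat'
# the same information in kB:
grep -E 'Dirty|Writeback' /proc/meminfo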

My config for dirty pages (RAM: 64 GB):
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 5
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 15
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200


So yeah, there are writes that apparently just don't get written to disk at all.
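One thing worth keeping in mind about the ratio-based settings: they are relative to the currently dirtyable memory rather than total RAM, so the effective limit moves around - the nr_dirty_threshold of 904847 pages above works out to roughly 3.5 GB. If a fixed, predictable cap is preferred, the *_bytes variants can be set instead (illustrative values only, not a recommendation; setting the bytes form zeroes the corresponding ratio):

Code:
sysctl -w vm.dirty_background_bytes=268435456   # 256 MB
sysctl -w vm.dirty_bytes=1073741824             # 1 GB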
 
Went with the cherry-pick since 0.7.6 took too long ;)

Updated kernel and ZFS packages are available on pvetest.

Sorry for bothering, but could we get an update to the full 0.7.6 now that it's fully released, so we actually have all the patches?

Thank you
 
Sorry for bothering, but could we get an update to the full 0.7.6 now that it's fully released, so we actually have all the patches?

Thank you

Yes, some time soon (but note that the difference only consists of some minor, unrelated bug fixes).
 
Yes, some time soon (but note that the difference only consists of some minor, unrelated bug fixes).
Thank you. I'm trying to solve this directly with the ZFS on Linux team, as there seem to be more people with this problem even though it has been marked as fixed in 0.7.6 (and now moved to 0.7.7). (Curiously, a lot of the people reporting this are running Proxmox - but that's probably just because Proxmox is one of the largest Linux distributions using ZFS?)
 
