Proxmox 4.0 with ZFS / near lock up situation

voltage

Member
Nov 2, 2015
Hello all,

I am running a Proxmox 4.0 server for a customer, with qcow2 files on ZFS as storage.

ZFS is set up this way:
- 2x2 mirror
- 2 cache devices (SSD)
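
For reference, this layout roughly corresponds to a pool created like the following sketch (device names are placeholders, not the actual ones):

Code:
# striped pair of mirrors ("2x2 mirror") plus two SSD cache (L2ARC) devices
# the /dev/sd* names below are placeholders
zpool create tank \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd \
    cache /dev/sde /dev/sdf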

The server has:
- 32 GiB of RAM
- 16 GiB of that assigned to the VMs (5 in total, all Windows XP to 7)
- 2 Xeon CPUs

For backups I run a script which creates a ZFS snapshot and then runs rdiff over the qcow2 files. This evening the server became close to unresponsive. Once I managed to log in, top showed this:

Code:
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6626 root      20   0 2526920 1.401g   2808 S 231.5  4.5 370:28.37 kvm                                                                                               
 1104 root       1 -19       0      0      0 R  90.3  0.0   1221:11 z_wr_iss                                                                                          
 1107 root       0 -20       0      0      0 R  90.3  0.0  67:06.84 z_wr_int_1                                                                                        
 1175 root      20   0       0      0      0 R  90.3  0.0   1105:08 txg_sync                                                                                          
12554 root      20   0   18996  10184   1388 R  90.3  0.0 138:10.60 rdiff                                                                                             
13574 root      20   0 5373656 3.733g   2304 S  90.3 11.9   6339:06 kvm                                                                                               
31375 root       1 -19       0      0      0 R  90.3  0.0   0:26.16 z_wr_iss                                                                                          
31430 root       1 -19       0      0      0 R  90.3  0.0   0:20.30 z_wr_iss                                                                                          
31462 root       1 -19       0      0      0 R  90.3  0.0   0:06.71 z_wr_iss                                                                                          
31463 root       1 -19       0      0      0 R  90.3  0.0   0:03.54 z_wr_iss                                                                                          
31464 root       1 -19       0      0      0 R  90.3  0.0   0:02.51 z_wr_iss                                                                                          
31470 root       1 -19       0      0      0 R  90.3  0.0   0:02.43 z_wr_iss                                                                                          
31471 root       1 -19       0      0      0 R  90.3  0.0   0:02.50 z_wr_iss                                                                                          
 3252 root      20   0 1449640 870372   2248 S  79.1  2.6 405:42.81 kvm                                                                                               
31476 root      20   0   25864   3048   2392 R  11.3  0.0   0:00.02 top
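
For context, the backup job mentioned above is roughly the following (a minimal sketch only; the dataset name, mountpoint and backup target are assumptions, not the actual script):

Code:
#!/bin/bash
# Snapshot the dataset holding the qcow2 images, then build rdiff deltas
# of the snapshotted images against the copies held on the backup target.
DATASET="tank/vmstore"            # assumed dataset name
SNAP="backup-$(date +%Y%m%d)"
SRC="/tank/vmstore"               # assumed mountpoint of the dataset
DST="/mnt/backup"                 # assumed backup target

zfs snapshot "${DATASET}@${SNAP}"

for img in "${SRC}/.zfs/snapshot/${SNAP}"/*.qcow2; do
    name=$(basename "${img}")
    # rdiff (librsync): signature of the previous copy, then a delta against it
    rdiff signature "${DST}/${name}" "${DST}/${name}.sig"
    rdiff delta "${DST}/${name}.sig" "${img}" "${DST}/${name}.delta"
done

zfs destroy "${DATASET}@${SNAP}"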

Please note the CPU usage values in the top output above. Also, the web interface shows completely strange CPU usage values for the VMs. At first I thought a fan had died and the CPUs were throttling massively, but I checked the BMC as well as the thermal throttle counters. The Xeons look fine and run at full power.
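
(For reference, the thermal throttle counters can be read from sysfs on Intel CPUs, e.g.:)

Code:
# per-core and per-package thermal throttle event counters (Intel, sysfs)
grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*throttle_count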

What is happening here?

Regards,

Andreas
 
Code:
# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
ssd    248G  62.5G   185G         -    20%    25%  1.00x  ONLINE  -
tank  4.53T   362G  4.18T         -     3%     7%  1.00x  ONLINE  -

# zfs list -o name,compression
NAME           COMPRESS
ssd                 lz4
tank                lz4
tank/no_cache       lz4
 
Can I set that parameter at runtime?

Code:
/sys/module/spl/parameters# echo 0 > spl_taskq_thread_dynamic

Also, will it have performance implications? Sorry, I am not very familiar with the internals of ZFS.

At the moment, I am caught between a rock and a hard place when it comes to taking Proxmox with ZFS into a production setup. :(
 
I really don't know.

I've set it at runtime, but I don't know whether it has really been applied. Regarding performance, I have no answer either.
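
Reading the value back at least confirms the sysfs write was accepted (it does not prove the task queues were actually reconfigured):

Code:
# set the value at runtime and read it back
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic
cat /sys/module/spl/parameters/spl_taskq_thread_dynamic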

Like yourself, I'm trying to achieve a stable (and fast) ZFS on Linux platform.

One piece of advice: don't read the "Issues" section on GitHub too much, otherwise you will start wondering why you are using ZoL at all :)
 
Well, please don't scare me ....

I knew ZFS had trouble in borderline cases, but with my vanilla setup I had hoped it would run alright.

Tonight I will set the parameter:

Code:
/etc/modprobe.d# cat zfs 
options spl spl_taskq_thread_dynamic=0

Then I will run the backup (a full backup). Next week, when the incremental backups run again, I will see whether the machine locks up again.

If it does, I am out of options. It's a production machine, so any lockup during working hours would be a bad thing.

Does anyone else have an idea or encountered that problem?

EDIT: The parameter obviously cannot be set at module load time, but it can be changed at runtime.
 
ZoL has lots of issues, not ZFS.

You don't need to set the parameter. You can upgrade to the latest PVE release (from the test repo, I think). It has ZoL 0.6.5.3.
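
Roughly like this (a sketch, assuming the pvetest repository for PVE 4.x on Debian Jessie; verify the repository line before using it on a production box):

Code:
# /etc/apt/sources.list.d/pvetest.list  (assumed repository line)
deb http://download.proxmox.com/debian jessie pvetest

# then pull in the newer spl/zfs packages
apt-get update
apt-get dist-upgrade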
 
Uuuuuh, when inspecting the log files I found this:

Code:
Oct 28 21:22:12 vm2 kernel: [99920.391085] txg_sync        D 000000000000000b     0  1175      2 0x00000000
Oct 28 21:22:12 vm2 kernel: [99920.391090]  ffff88042c7ff608 0000000000000046 ffff880829dde400 ffff88042c7f0c80
Oct 28 21:22:12 vm2 kernel: [99920.391093]  ffff88042c7ff638 ffff88042c800000 ffff8808282ea7b4 ffff88042c7f0c80
Oct 28 21:22:12 vm2 kernel: [99920.391096]  00000000ffffffff ffff8808282ea7b8 ffff88042c7ff628 ffffffff817cc077
Oct 28 21:22:12 vm2 kernel: [99920.391099] Call Trace:
Oct 28 21:22:12 vm2 kernel: [99920.391108]  [<ffffffff817cc077>] schedule+0x37/0x80
Oct 28 21:22:12 vm2 kernel: [99920.391111]  [<ffffffff817cc32e>] schedule_preempt_disabled+0xe/0x10
Oct 28 21:22:12 vm2 kernel: [99920.391115]  [<ffffffff817cdd93>] __mutex_lock_slowpath+0x93/0x110
Oct 28 21:22:12 vm2 kernel: [99920.391118]  [<ffffffff817cde33>] mutex_lock+0x23/0x40
Oct 28 21:22:12 vm2 kernel: [99920.391152]  [<ffffffffc0cc2a32>] arc_release+0x312/0x4b0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391156]  [<ffffffff810b72b8>] ? __wake_up+0x48/0x60
Oct 28 21:22:12 vm2 kernel: [99920.391199]  [<ffffffffc0d67a10>] ? zio_taskq_member.isra.6+0x80/0x80 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391216]  [<ffffffffc0cc8528>] dbuf_write.isra.14+0x88/0x3f0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391246]  [<ffffffffc0d11ba4>] ? spa_taskq_dispatch_ent+0x74/0x90 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391282]  [<ffffffffc0d66282>] ? zio_taskq_dispatch+0x92/0xa0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391319]  [<ffffffffc0d69f72>] ? zio_nowait+0x112/0x1a0 [zfs]   
Oct 28 21:22:12 vm2 kernel: [99920.391337]  [<ffffffffc0ccb742>] ? dbuf_sync_list+0xf2/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391354]  [<ffffffffc0cc77e0>] ? dmu_buf_rele+0x10/0x10 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391372]  [<ffffffffc0ccb579>] dbuf_sync_indirect+0xb9/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391389]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391407]  [<ffffffffc0ccb5ad>] dbuf_sync_indirect+0xed/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391424]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391441]  [<ffffffffc0cc77e0>] ? dmu_buf_rele+0x10/0x10 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391459]  [<ffffffffc0ccb5ad>] dbuf_sync_indirect+0xed/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391476]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391493]  [<ffffffffc0ccb5ad>] dbuf_sync_indirect+0xed/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391511]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391534]  [<ffffffffc0ce5709>] dnode_sync+0x2f9/0x8d0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391552]  [<ffffffffc0ccb720>] ? dbuf_sync_list+0xd0/0x100 [zfs]  
Oct 28 21:22:12 vm2 kernel: [99920.391567]  [<ffffffffc0cbe880>] ? l2arc_read_done+0x470/0x470 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391589]  [<ffffffffc0ce5709>] ? dnode_sync+0x2f9/0x8d0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391610]  [<ffffffffc0cd576e>] dmu_objset_sync_dnodes+0xce/0xf0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391630]  [<ffffffffc0cd595e>] dmu_objset_sync+0x1ce/0x2f0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391651]  [<ffffffffc0cd3e80>] ? recordsize_changed_cb+0x20/0x20 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391672]  [<ffffffffc0cd5a80>] ? dmu_objset_sync+0x2f0/0x2f0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391695]  [<ffffffffc0ced9c2>] dsl_dataset_sync+0x52/0xa0 [zfs]   
Oct 28 21:22:12 vm2 kernel: [99920.391721]  [<ffffffffc0cf6670>] dsl_pool_sync+0x90/0x420 [zfs]     
Oct 28 21:22:12 vm2 kernel: [99920.391752]  [<ffffffffc0d1bed0>] ? spa_lookup+0x60/0x60 [zfs]       
Oct 28 21:22:12 vm2 kernel: [99920.391782]  [<ffffffffc0d10a57>] spa_sync+0x357/0xb00 [zfs]         
Oct 28 21:22:12 vm2 kernel: [99920.391786]  [<ffffffff810b7052>] ? __wake_up_common+0x52/0x90       
Oct 28 21:22:12 vm2 kernel: [99920.391817]  [<ffffffffc0d2204b>] txg_sync_thread+0x3db/0x690 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391821]  [<ffffffff811cefb4>] ? __slab_free+0xa4/0x290
Oct 28 21:22:12 vm2 kernel: [99920.391825]  [<ffffffff8101d2a9>] ? sched_clock+0x9/0x10  
Oct 28 21:22:12 vm2 kernel: [99920.391858]  [<ffffffffc0d21c70>] ? txg_delay+0x160/0x160 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391866]  [<ffffffffc0201f8a>] thread_generic_wrapper+0x7a/0x90 [spl]
Oct 28 21:22:12 vm2 kernel: [99920.391872]  [<ffffffffc0201f10>] ? __thread_exit+0x20/0x20 [spl]
Oct 28 21:22:12 vm2 kernel: [99920.391875]  [<ffffffff810957db>] kthread+0xdb/0x100
Oct 28 21:22:12 vm2 kernel: [99920.391878]  [<ffffffff81095700>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 28 21:22:12 vm2 kernel: [99920.391881]  [<ffffffff817d019f>] ret_from_fork+0x3f/0x70
Oct 28 21:22:12 vm2 kernel: [99920.391883]  [<ffffffff81095700>] ? kthread_create_on_node+0x1c0/0x1c0

This was a few days before the lockup happened.

Now I am scared about the integrity of my data. Isn't txg_sync the thread that commits data from the write cache to disk?
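
(For what it's worth, a scrub will at least verify the on-disk checksums across the whole pool:)

Code:
zpool scrub tank
zpool status -v tank   # shows scrub progress and any checksum errors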

Could one of the devs briefly comment on whether this is safe to run in production?

Thanks,

Andreas


EDIT: I am on the pve-enterprise repo and want to stay there. I will wait until the new ZoL package has been tested and set the parameter manually in the meantime.
 