Proxmox 4.0 with ZFS / near lock up situation

voltage

Member
Nov 2, 2015
Hello $all,

I am running a Proxmox 4.0 server for a customer, with qcow2 files on ZFS as storage.

ZFS is set up this way:
- 2x2 mirror
- 2 cache devices (SSD)
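(Roughly the equivalent of a pool created like this; the device names below are only placeholders, not the real ones:)

Code:
zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd
zpool add tank cache /dev/sde /dev/sdf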

The server has:
- 32 GB of RAM
- 16 GB of it used by the VMs (5 in total, all Windows XP to 7)
- 2 Xeon CPUs

For backups I run a script which creates a ZFS snapshot and then rdiffs the qcow2 files (a rough sketch of the flow is at the end of this post). This evening the server became close to unresponsive. Once I managed to log in, I saw this:

Code:
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6626 root      20   0 2526920 1.401g   2808 S 231.5  4.5 370:28.37 kvm                                                                                               
 1104 root       1 -19       0      0      0 R  90.3  0.0   1221:11 z_wr_iss                                                                                          
 1107 root       0 -20       0      0      0 R  90.3  0.0  67:06.84 z_wr_int_1                                                                                        
 1175 root      20   0       0      0      0 R  90.3  0.0   1105:08 txg_sync                                                                                          
12554 root      20   0   18996  10184   1388 R  90.3  0.0 138:10.60 rdiff                                                                                             
13574 root      20   0 5373656 3.733g   2304 S  90.3 11.9   6339:06 kvm                                                                                               
31375 root       1 -19       0      0      0 R  90.3  0.0   0:26.16 z_wr_iss                                                                                          
31430 root       1 -19       0      0      0 R  90.3  0.0   0:20.30 z_wr_iss                                                                                          
31462 root       1 -19       0      0      0 R  90.3  0.0   0:06.71 z_wr_iss                                                                                          
31463 root       1 -19       0      0      0 R  90.3  0.0   0:03.54 z_wr_iss                                                                                          
31464 root       1 -19       0      0      0 R  90.3  0.0   0:02.51 z_wr_iss                                                                                          
31470 root       1 -19       0      0      0 R  90.3  0.0   0:02.43 z_wr_iss                                                                                          
31471 root       1 -19       0      0      0 R  90.3  0.0   0:02.50 z_wr_iss                                                                                          
 3252 root      20   0 1449640 870372   2248 S  79.1  2.6 405:42.81 kvm                                                                                               
31476 root      20   0   25864   3048   2392 R  11.3  0.0   0:00.02 top

Please note the CPU usage values. Also, the web interface shows completely strange CPU usage values for the VMs. At first I thought a fan had died and the CPUs were throttling massively, but I checked the BMC as well as the thermal throttle counters. The Xeons look fine and run at full speed.
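(For reference, the throttle counters I mean are the ones the kernel exposes under sysfs, e.g.:)

Code:
grep . /sys/devices/system/cpu/cpu*/thermal_throttle/core_throttle_count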

What is happening here?

Regards,

Andreas
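
PS: This is roughly what the backup script does each night; the dataset name and paths below are invented for illustration, not the real ones:

Code:
# snapshot so the qcow2 files are read from a consistent state
zfs snapshot tank/vmdata@nightly
# rdiff (librsync) the snapshot copy against the previous copy on the backup target
rdiff signature /backup/vm-100-disk-1.qcow2 /tmp/vm-100.sig
rdiff delta /tmp/vm-100.sig /tank/vmdata/.zfs/snapshot/nightly/images/100/vm-100-disk-1.qcow2 /backup/vm-100-disk-1.delta
# drop the snapshot again
zfs destroy tank/vmdata@nightly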
 
Code:
# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
ssd    248G  62.5G   185G         -    20%    25%  1.00x  ONLINE  -
tank  4.53T   362G  4.18T         -     3%     7%  1.00x  ONLINE  -

# zfs list -o name,compression
NAME           COMPRESS
ssd                 lz4
tank                lz4
tank/no_cache       lz4
 
Can I set that parameter at runtime?

Code:
/sys/module/spl/parameters# echo 0 > spl_taskq_thread_dynamic
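
Presumably the current value can be read back the same way, to check whether the write actually took effect:

Code:
cat /sys/module/spl/parameters/spl_taskq_thread_dynamic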

Also, will it have performance implications? Sorry, I am not very familiar with the internals of ZFS.

At the moment, I am having second thoughts about taking Proxmox with ZFS into a production setup. :(
 
I really don't know.

I've set it at runtime, but I don't know whether it has really been applied. Regarding performance, I have no answer either.

Like you, I'm trying to get to a stable (and fast) ZFS on Linux platform.

One piece of advice: don't read the "Issues" section on GitHub too much, otherwise you will start wondering why you are using ZoL at all :)
 
Well, please don't scare me ....

I knew ZFS had trouble in borderline cases, but with my vanilla setup I had hoped it would run fine.

Tonight I will set the parameter:

Code:
/etc/modprobe.d# cat zfs 
options spl spl_taskq_thread_dynamic=0

Then I will run the backup (a full backup). Next week, when the incremental backups run again, I will see whether the machine locks up again.

If it does, I am out of options. It's a production machine, so any lockup during working hours would be a bad thing.

Does anyone else have an idea, or has anyone encountered this problem?

EDIT: Apparently the parameter cannot be set at module load time, but it can be changed at runtime.
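
(One possible reason, though I have not verified it here: modprobe only reads files in /etc/modprobe.d whose names end in .conf, and if spl is loaded from the initramfs the image has to be rebuilt before a load-time option is picked up. A hypothetical fix would be:)

Code:
# modprobe.d only picks up *.conf files; the target name is arbitrary
mv /etc/modprobe.d/zfs /etc/modprobe.d/spl.conf
# rebuild the initramfs in case spl is loaded early during boot
update-initramfs -u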
 
ZoL has lots of issues, not ZFS.

You don't need to set the parameter. You can upgrade to the latest PVE release (from the test repo, I think); it ships ZoL 0.6.5.3.
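
After the upgrade, you can check which ZoL version is actually loaded with something like:

Code:
modinfo zfs | grep -iw version
modinfo spl | grep -iw version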
 
Uuuuuh, while inspecting the log files I found this:

Code:
Oct 28 21:22:12 vm2 kernel: [99920.391085] txg_sync        D 000000000000000b     0  1175      2 0x00000000
Oct 28 21:22:12 vm2 kernel: [99920.391090]  ffff88042c7ff608 0000000000000046 ffff880829dde400 ffff88042c7f0c80
Oct 28 21:22:12 vm2 kernel: [99920.391093]  ffff88042c7ff638 ffff88042c800000 ffff8808282ea7b4 ffff88042c7f0c80
Oct 28 21:22:12 vm2 kernel: [99920.391096]  00000000ffffffff ffff8808282ea7b8 ffff88042c7ff628 ffffffff817cc077
Oct 28 21:22:12 vm2 kernel: [99920.391099] Call Trace:
Oct 28 21:22:12 vm2 kernel: [99920.391108]  [<ffffffff817cc077>] schedule+0x37/0x80
Oct 28 21:22:12 vm2 kernel: [99920.391111]  [<ffffffff817cc32e>] schedule_preempt_disabled+0xe/0x10
Oct 28 21:22:12 vm2 kernel: [99920.391115]  [<ffffffff817cdd93>] __mutex_lock_slowpath+0x93/0x110
Oct 28 21:22:12 vm2 kernel: [99920.391118]  [<ffffffff817cde33>] mutex_lock+0x23/0x40
Oct 28 21:22:12 vm2 kernel: [99920.391152]  [<ffffffffc0cc2a32>] arc_release+0x312/0x4b0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391156]  [<ffffffff810b72b8>] ? __wake_up+0x48/0x60
Oct 28 21:22:12 vm2 kernel: [99920.391199]  [<ffffffffc0d67a10>] ? zio_taskq_member.isra.6+0x80/0x80 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391216]  [<ffffffffc0cc8528>] dbuf_write.isra.14+0x88/0x3f0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391246]  [<ffffffffc0d11ba4>] ? spa_taskq_dispatch_ent+0x74/0x90 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391282]  [<ffffffffc0d66282>] ? zio_taskq_dispatch+0x92/0xa0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391319]  [<ffffffffc0d69f72>] ? zio_nowait+0x112/0x1a0 [zfs]   
Oct 28 21:22:12 vm2 kernel: [99920.391337]  [<ffffffffc0ccb742>] ? dbuf_sync_list+0xf2/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391354]  [<ffffffffc0cc77e0>] ? dmu_buf_rele+0x10/0x10 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391372]  [<ffffffffc0ccb579>] dbuf_sync_indirect+0xb9/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391389]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391407]  [<ffffffffc0ccb5ad>] dbuf_sync_indirect+0xed/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391424]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391441]  [<ffffffffc0cc77e0>] ? dmu_buf_rele+0x10/0x10 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391459]  [<ffffffffc0ccb5ad>] dbuf_sync_indirect+0xed/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391476]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391493]  [<ffffffffc0ccb5ad>] dbuf_sync_indirect+0xed/0x190 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391511]  [<ffffffffc0ccb720>] dbuf_sync_list+0xd0/0x100 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391534]  [<ffffffffc0ce5709>] dnode_sync+0x2f9/0x8d0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391552]  [<ffffffffc0ccb720>] ? dbuf_sync_list+0xd0/0x100 [zfs]  
Oct 28 21:22:12 vm2 kernel: [99920.391567]  [<ffffffffc0cbe880>] ? l2arc_read_done+0x470/0x470 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391589]  [<ffffffffc0ce5709>] ? dnode_sync+0x2f9/0x8d0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391610]  [<ffffffffc0cd576e>] dmu_objset_sync_dnodes+0xce/0xf0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391630]  [<ffffffffc0cd595e>] dmu_objset_sync+0x1ce/0x2f0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391651]  [<ffffffffc0cd3e80>] ? recordsize_changed_cb+0x20/0x20 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391672]  [<ffffffffc0cd5a80>] ? dmu_objset_sync+0x2f0/0x2f0 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391695]  [<ffffffffc0ced9c2>] dsl_dataset_sync+0x52/0xa0 [zfs]   
Oct 28 21:22:12 vm2 kernel: [99920.391721]  [<ffffffffc0cf6670>] dsl_pool_sync+0x90/0x420 [zfs]     
Oct 28 21:22:12 vm2 kernel: [99920.391752]  [<ffffffffc0d1bed0>] ? spa_lookup+0x60/0x60 [zfs]       
Oct 28 21:22:12 vm2 kernel: [99920.391782]  [<ffffffffc0d10a57>] spa_sync+0x357/0xb00 [zfs]         
Oct 28 21:22:12 vm2 kernel: [99920.391786]  [<ffffffff810b7052>] ? __wake_up_common+0x52/0x90       
Oct 28 21:22:12 vm2 kernel: [99920.391817]  [<ffffffffc0d2204b>] txg_sync_thread+0x3db/0x690 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391821]  [<ffffffff811cefb4>] ? __slab_free+0xa4/0x290
Oct 28 21:22:12 vm2 kernel: [99920.391825]  [<ffffffff8101d2a9>] ? sched_clock+0x9/0x10  
Oct 28 21:22:12 vm2 kernel: [99920.391858]  [<ffffffffc0d21c70>] ? txg_delay+0x160/0x160 [zfs]
Oct 28 21:22:12 vm2 kernel: [99920.391866]  [<ffffffffc0201f8a>] thread_generic_wrapper+0x7a/0x90 [spl]
Oct 28 21:22:12 vm2 kernel: [99920.391872]  [<ffffffffc0201f10>] ? __thread_exit+0x20/0x20 [spl]
Oct 28 21:22:12 vm2 kernel: [99920.391875]  [<ffffffff810957db>] kthread+0xdb/0x100
Oct 28 21:22:12 vm2 kernel: [99920.391878]  [<ffffffff81095700>] ? kthread_create_on_node+0x1c0/0x1c0
Oct 28 21:22:12 vm2 kernel: [99920.391881]  [<ffffffff817d019f>] ret_from_fork+0x3f/0x70
Oct 28 21:22:12 vm2 kernel: [99920.391883]  [<ffffffff81095700>] ? kthread_create_on_node+0x1c0/0x1c0

This was a few days before the lockup happened.

Now I am worried about the integrity of my data. Isn't txg_sync the thread that commits data from the write cache to disk?
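
(As a sanity check on the existing data, a scrub should at least verify all on-disk checksums:)

Code:
zpool scrub tank
zpool status -v tank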

Could one of the devs briefly comment on whether this is safe to run in production?

Thanks,

Andreas


EDIT: I am on the pve-enterprise repo and want to stay there. I will wait until the new ZoL package has been tested and set the parameter manually in the meantime.
 
