Again bad ZFS Performance

cpzengel


4 x Constellation HDDs in RAID 10
Only getting 150-200 MB/s write with dd; in the VM only 10 MB/s (VirtIO, Win10, no cache; also tested qcow2 with writeback on a ZFS dataset)
The log device does not seem to be used at all!
VMs cause massive load, up to 25! (2 x 4 cores)
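
A side note on the idle log device: ZFS only uses a separate log device (SLOG) for synchronous writes, so a plain dd run and most VM writes will bypass it unless sync=always is set on the dataset. Something along these lines shows per-vdev traffic and the current sync setting (the dataset name Raid10/vmdata is only a placeholder):

# per-vdev throughput, refreshed every second; the log vdev only shows
# activity for synchronous writes
zpool iostat -v Raid10 1

# current sync policy on the dataset/zvol holding the VM disks
zfs get sync Raid10/vmdata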

Already tried:
  • Taking each HDD offline in turn
  • Running SMART tests
  • Upgrading zpool and zfs
  • Using NVMe for log and cache
  • Disabling compression
  • Benchmarking with dd writing zeros to a file (see the benchmark note below)
  • For comparison: the same system with 4 x SSD reaches up to 800 MB/s write, with no load
I am out of ideas.
Perhaps it's the fragmentation?
Any advice welcome!
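
A note on the benchmark method (paths and sizes below are only examples): with lz4 active, dd writing zeros mostly measures the compressor, and even with compression off a zero-filled sequential write says little about VM-style I/O. A rough sketch of less misleading tests:

# incompressible sequential write; conv=fdatasync includes the final flush in the timing
# (note that /dev/urandom itself may top out at a few hundred MB/s)
dd if=/dev/urandom of=/Raid10/ddtest.bin bs=1M count=4096 conv=fdatasync status=progress

# or, closer to VM behaviour, a fio run on the same dataset (apt install fio)
fio --name=seqwrite --directory=/Raid10 --rw=write --bs=1M --size=4G \
    --ioengine=libaio --end_fsync=1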

NAME                        STATE     READ WRITE CKSUM
Raid10                      ONLINE       0     0     0
  mirror-0                  ONLINE       0     0     0
    wwn-0x5000c5006361d943  ONLINE       0     0     0
    scsi-35000c500636267bb  ONLINE       0     0     0
  mirror-1                  ONLINE       0     0     0
    wwn-0x5000c500634ea057  ONLINE       0     0     0
    wwn-0x5000c5006360a7eb  ONLINE       0     0     0
logs
  nvme0n1p1                 ONLINE       0     0     0
cache
  nvme0n1p2                 ONLINE       0     0     0

root@pve252:~# zpool get all Raid10
NAME    PROPERTY                       VALUE                 SOURCE
Raid10  size                           3.62T                 -
Raid10  capacity                       61%                   -
Raid10  altroot                        -                     default
Raid10  health                         ONLINE                -
Raid10  guid                           10200424180081588444  -
Raid10  version                        -                     default
Raid10  bootfs                         -                     default
Raid10  delegation                     on                    default
Raid10  autoreplace                    off                   default
Raid10  cachefile                      -                     default
Raid10  failmode                       wait                  default
Raid10  listsnapshots                  off                   default
Raid10  autoexpand                     off                   default
Raid10  dedupditto                     0                     default
Raid10  dedupratio                     1.00x                 -
Raid10  free                           1.39T                 -
Raid10  allocated                      2.23T                 -
Raid10  readonly                       off                   -
Raid10  ashift                         12                    local
Raid10  comment                        -                     default
Raid10  expandsize                     -                     -
Raid10  freeing                        0                     -
Raid10  fragmentation                  45%                   -
Raid10  leaked                         0                     -
Raid10  multihost                      off                   default
Raid10  feature@async_destroy          enabled               local
Raid10  feature@empty_bpobj            active                local
Raid10  feature@lz4_compress           active                local
Raid10  feature@multi_vdev_crash_dump  enabled               local
Raid10  feature@spacemap_histogram     active                local
Raid10  feature@enabled_txg            active                local
Raid10  feature@hole_birth             active                local
Raid10  feature@extensible_dataset     active                local
Raid10  feature@embedded_data          active                local
Raid10  feature@bookmarks              enabled               local
Raid10  feature@filesystem_limits      enabled               local
Raid10  feature@large_blocks           enabled               local
Raid10  feature@large_dnode            enabled               local
Raid10  feature@sha512                 enabled               local
Raid10  feature@skein                  enabled               local
Raid10  feature@edonr                  enabled               local
Raid10  feature@userobj_accounting     active                local


proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.13-2-pve: 4.13.13-32
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9


 
You mention an NVMe device; which model are you running, exactly?
 
I'm seeing a similar issue on two of my servers. I think this could be related to known issues in zfsonlinux.

Check zfsonlinux on GitHub for issues 6171 and 6852.
PS: I'm new to the forum and cannot post links...
 
When troubleshooting ZFS I always look at HDD load (atop: read/write/busy), ZFS stats (arc_summary, zpool iostat -v 1), and server RAM usage. With low free RAM I have had problems under heavy writes (unexplained server restarts).
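
A minimal sketch of those checks, assuming the pool name Raid10 from the first post:

# per-disk busy/read/write (atop, or iostat -x 1 from the sysstat package)
atop 1

# ARC size, hit ratio and L2ARC statistics
arc_summary

# per-vdev read/write throughput, refreshed every second
zpool iostat -v Raid10 1

# overall memory situation
free -h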
 
I would suggest creating a new dataset and trying the following (a sketch of the corresponding commands is below):
1. set the dataset's compression option to "lz4" (or off)
2. change volblocksize to 4k (recordsize if you mount ZFS as a folder and store qcow2/raw VM disk images on it)
3. set the xattr option to "sa"
4. set the atime option to "off"

Try these options one by one, and keep in mind that they should be set before you create the VM disk image on that dataset; in particular, a new volblocksize/recordsize and compression setting only apply to newly written data.
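
A sketch of those settings as commands; the dataset and zvol names (Raid10/vmdata, Raid10/vm-100-disk-1), the 32G size and the 4k block size are only examples, and a zvol's volblocksize can only be set when it is created:

zfs create Raid10/vmdata
zfs set compression=lz4 Raid10/vmdata      # or compression=off
zfs set xattr=sa Raid10/vmdata
zfs set atime=off Raid10/vmdata
zfs set recordsize=4k Raid10/vmdata        # only matters for qcow2/raw files stored on the dataset

# for a zvol the block size has to be given at creation time, e.g.:
zfs create -s -V 32G -o volblocksize=4k Raid10/vm-100-disk-1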

One more point:
There was an issue with ARC hits in zfsonlinux 0.7.4 that has been fixed but not yet released. It can also hurt overall performance by forcing ZFS to read data from the HDDs even though it is already present in the ARC.
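
To see which zfsonlinux version the running kernel module actually is (as opposed to the installed package), one way is:

# version of the loaded ZFS kernel module
cat /sys/module/zfs/version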
 
So in the morning I had a 6 GB ARC; now it is 2 GB and performance is horrible.
It apparently happened when I stopped a VM with a ZVOL datastore.
After a reboot the 6 GB came back.
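
If the ARC keeps shrinking like that, its current and target sizes can be read from the kernel, and a floor/ceiling can be pinned via module options; the 2 GiB/6 GiB values below are only examples (check for an existing /etc/modprobe.d/zfs.conf before overwriting it):

# current ARC size, target and limits, in bytes
awk '/^(size|c|c_min|c_max) / {print $1, $3}' /proc/spl/kstat/zfs/arcstats

# pin the ARC between 2 GiB and 6 GiB (applies after reboot / module reload)
cat > /etc/modprobe.d/zfs.conf <<'EOF'
options zfs zfs_arc_min=2147483648
options zfs zfs_arc_max=6442450944
EOF
update-initramfs -u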

(pveversion output identical to the first post)
 
