ZFS performance dropped drastically after upgrade

Lock

I happily upgraded to Proxmox 6 a month ago (Feb/20), but since then the ZFS IO delay has been climbing higher and higher.
I benchmarked the zpool rpool with fio: sequential write only reaches IOPS=28, BW=115KiB/s. Thanks to caching, sequential read reaches IOPS=206k, BW=806MiB/s. But downloading older files from Proxmox only reaches about 30~100MB/s, which looks like single-disk speed.
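
As a quick sanity check on these numbers (a sketch, assuming the 4k block size from the fio commands below): bandwidth should be roughly IOPS times block size, and the average time per operation is one second divided by IOPS. The helper names `bw_kib` and `us_per_op` are made up for this illustration.

```shell
# bw_kib IOPS BLOCK_KIB: expected bandwidth in KiB/s
bw_kib() { echo $(( $1 * $2 )); }

# us_per_op IOPS: average time per operation in microseconds (integer division)
us_per_op() { echo $(( 1000000 / $1 )); }

bw_kib 28 4        # write: 112 KiB/s, close to the reported 115 KiB/s
bw_kib 206000 4    # read: 824000 KiB/s, about 805 MiB/s
us_per_op 28       # write: 35714 us, i.e. about 36 ms per 4k write
```

If this reading is right, both results simply reflect the 4k operation rate: the write test appears to be limited by roughly 36 ms per synchronous write, not by streaming bandwidth.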

It looks similar to the SIMD issue in GitHub #8836, but Proxmox is now running kernel 5.3.18-3.

I still cannot figure out what the problem is, but I think something must definitely be wrong. Before rebooting again, I am turning here for HELP! ANY IDEA IS WELCOME!

Bench Script
Code:
fio --filename=test --sync=1 --rw=read --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
fio --filename=test --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300 && rm test
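
The two commands above use 4k blocks with `--sync=1`, so they mostly measure per-operation latency. To look at streaming throughput instead, a large-block variant could be used. This is only a sketch: the helper name `seq_bench`, the `FIO` override variable, the 1M block size and the test file name are assumptions, not something taken from this setup. Note that a read test on a freshly written 10G file will probably be served from cache on a machine with this much memory.

```shell
# seq_bench MODE DIR: MODE is "read" or "write"; DIR is a directory on the pool.
# Set FIO=echo to print the fio command instead of running it (dry run).
seq_bench() {
    mode=$1; dir=$2
    "${FIO:-fio}" --name=seqtest --filename="$dir/fio-seqtest" \
        --rw="$mode" --bs=1M --size=10G --numjobs=1 \
        --ioengine=psync --end_fsync=1 \
        --runtime=300 --group_reporting
}

# Dry run in a subshell, so the FIO override does not leak into later calls.
( FIO=echo; seq_bench write /rpool )
```

For a real run, call `seq_bench write /rpool` and remove `/rpool/fio-seqtest` afterwards.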

Hardware:
12x 12TB HGST HDDs, passed through to the BIOS directly; ZFS raidz2; 512GB memory.

Proxmox version:
Code:
root@pwr:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-6
pve-kernel-4.15: 5.4-13
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-4.15.18-25-pve: 4.15.18-53
pve-kernel-4.15.18-10-pve: 4.15.18-32
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

Zpool status
Code:
root@pwr:~# zpool status
  pool: rpool
state: ONLINE
  scan: scrub repaired 0B in 5 days 06:31:36 with 0 errors on Fri Mar 27 05:36:16 2020
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
            sdc3    ONLINE       0     0     0
            sdd3    ONLINE       0     0     0
            sde3    ONLINE       0     0     0
            sdf3    ONLINE       0     0     0
            sdg3    ONLINE       0     0     0
            sdh3    ONLINE       0     0     0
            sdi3    ONLINE       0     0     0
            sdj3    ONLINE       0     0     0
            sdk3    ONLINE       0     0     0
            sdl3    ONLINE       0     0     0

errors: No known data errors

Code:
root@pwr:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool   109T  93.4T  15.6T        -         -    18%    85%  1.00x    ONLINE  -
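
The pool is 85% full with 18% fragmentation, and ZFS write performance is often reported to degrade on fairly full pools. A small check like the following could flag that. It is a sketch: the function name `check_pool_fill` is made up, and the 80% default threshold is an assumed rule of thumb, not a ZFS constant.

```shell
# check_pool_fill [LIMIT]: warn when a pool is fuller than LIMIT percent.
# Input (stdin): lines of "name capacity fragmentation", as printed by
#   zpool list -H -o name,capacity,fragmentation
check_pool_fill() {
    awk -v limit="${1:-80}" '
        {
            cap = $2; frag = $3
            gsub(/%/, "", cap); gsub(/%/, "", frag)   # strip percent signs
            status = (cap + 0 > limit + 0) ? "WARN" : "ok"
            printf "%s cap=%d%% frag=%d%% %s\n", $1, cap, frag, status
        }'
}

# Values taken from the zpool list output above.
printf 'rpool\t85%%\t18%%\n' | check_pool_fill 80
```

On a live system the input would come from `zpool list -H -o name,capacity,fragmentation | check_pool_fill`.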

Code:
root@pwr:~# zfs get all rpool
NAME   PROPERTY              VALUE                   SOURCE
rpool  type                  filesystem              -
rpool  creation              Mon Apr  1 10:35 2019   -
rpool  used                  71.2T                   -
rpool  available             9.25T                   -
rpool  referenced            238K                    -
rpool  compressratio         1.12x                   -
rpool  mounted               yes                     -
rpool  quota                 none                    default
rpool  reservation           none                    default
rpool  recordsize            128K                    default
rpool  mountpoint            /rpool                  default
rpool  sharenfs              off                     default
rpool  checksum              on                      default
rpool  compression           lz4                     local
rpool  atime                 off                     local
rpool  devices               on                      default
rpool  exec                  on                      default
rpool  setuid                on                      default
rpool  readonly              off                     default
rpool  zoned                 off                     default
rpool  snapdir               hidden                  default
rpool  aclinherit            restricted              default
rpool  createtxg             1                       -
rpool  canmount              on                      default
rpool  xattr                 sa                      local
rpool  copies                1                       default
rpool  version               5                       -
rpool  utf8only              off                     -
rpool  normalization         none                    -
rpool  casesensitivity       sensitive               -
rpool  vscan                 off                     default
rpool  nbmand                off                     default
rpool  sharesmb              off                     default
rpool  refquota              none                    default
rpool  refreservation        none                    default
rpool  guid                  15681326581931532947    -
rpool  primarycache          all                     default
rpool  secondarycache        all                     default
rpool  usedbysnapshots       0B                      -
rpool  usedbydataset         238K                    -
rpool  usedbychildren        71.2T                   -
rpool  usedbyrefreservation  0B                      -
rpool  logbias               latency                 default
rpool  objsetid              51                      -
rpool  dedup                 off                     default
rpool  mlslabel              none                    default
rpool  sync                  standard                local
rpool  dnodesize             legacy                  default
rpool  refcompressratio      1.00x                   -
rpool  written               238K                    -
rpool  logicalused           79.5T                   -
rpool  logicalreferenced     46K                     -
rpool  volmode               default                 default
rpool  filesystem_limit      none                    default
rpool  snapshot_limit        none                    default
rpool  filesystem_count      none                    default
rpool  snapshot_count        none                    default
rpool  snapdev               hidden                  default
rpool  acltype               off                     default
rpool  context               none                    default
rpool  fscontext             none                    default
rpool  defcontext            none                    default
rpool  rootcontext           none                    default
rpool  relatime              off                     default
rpool  redundant_metadata    all                     default
rpool  overlay               off                     default
rpool  encryption            off                     default
rpool  keylocation           none                    default
rpool  keyformat             none                    default
rpool  pbkdf2iters           0                       default
rpool  special_small_blocks  0                       default



 
Hi,

From what I have seen and read, ZFS 0.8 has some performance problems. Because you have a raidz2 setup, your IOPS will be about the same as a single HDD.

It would be interesting if you had some older tests from PMX 5.x, so we could compare them with what you get now.

Some notes about your zfs setup:
- too many HDDs in this pool -> bad IOPS, and a lot of time for a scrub or for replacing a faulty HDD, ver

- if I were in your shoes, I would redesign the pool, using at least a raid10-like layout (2 striped raidz1 or raidz2 vdevs)
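
The trade-off between these layouts can be sketched with a little arithmetic. This assumes the rule of thumb mentioned above, that each raidz vdev delivers roughly the random IOPS of one disk, and it ignores metadata and padding overhead. `layout_summary` is a hypothetical helper; the 12 TB disk size comes from the hardware list.

```shell
# layout_summary VDEVS DISKS_PER_VDEV PARITY DISK_TB
# Prints data disk count, raw usable space, and IOPS relative to a single vdev.
layout_summary() {
    vdevs=$1; per_vdev=$2; parity=$3; disk_tb=$4
    data_disks=$(( vdevs * (per_vdev - parity) ))
    echo "vdevs=$vdevs data_disks=$data_disks usable_tb=$(( data_disks * disk_tb )) relative_iops=${vdevs}x"
}

layout_summary 1 12 2 12   # current layout: one 12-disk raidz2
layout_summary 2 6 2 12    # two striped 6-disk raidz2 vdevs
layout_summary 2 6 1 12    # two striped 6-disk raidz1 vdevs
```

Under these assumptions, two raidz2 vdevs would give up about 24 TB of usable space in exchange for roughly twice the random IOPS.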
 
Thanks!
Because you have a raidz2 setup, your iops must be the same as a single hdd.
Yes, but streaming speed should be much faster than a single HDD. I'm sorry that I don't have any PMX 5.x tests for you; that was careless of me, and I did not expect this. Before the upgrade, the regular write/read speed was about 300MB/s~400MB/s (or even faster).

After more research online, I agree with your note about the ZFS layout. This ZFS pool is mostly used to store microscopy images in our small lab. Last year we were happy to find that raidz2 IOPS could handle the camera data stream and data analysis. But now, sadly, we have no second machine to migrate the existing data to (up to 80T).
 
Hi again,

You need to change your ZFS design. If you are unlucky and even just one HDD fails, it may take days until the new disk is fully operational (resilvered). During this time, because of the high stress, there is a good probability of losing another disk. During the resilver process your IOPS will be very, very low.
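
For a rough feeling of the time scale, the last scrub in the zpool status output above (5 days 06:31:36 over 93.4T allocated) can be turned into an average rate. This is only a sketch: `scrub_rate_mib` is a made-up helper, and a resilver does not necessarily run at the same speed as a scrub.

```shell
# scrub_rate_mib ALLOC_TIB DAYS HOURS MINUTES SECONDS
# Prints the scrub duration in seconds and the average pool-wide rate in MiB/s.
scrub_rate_mib() {
    awk -v tib="$1" -v d="$2" -v h="$3" -v m="$4" -v s="$5" 'BEGIN {
        secs = d * 86400 + h * 3600 + m * 60 + s
        printf "%d s, %.0f MiB/s\n", secs, tib * 1024 * 1024 / secs
    }'
}

scrub_rate_mib 93.4 5 6 31 36   # prints "455496 s, 215 MiB/s"
```

Spread over 12 disks, that is about 18 MiB/s per disk on average, which fits the expectation that any operation touching the whole pool takes days.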

Is this scenario acceptable in your case? If the answer is no, then do what is better: change your ZFS pool.

Good luck / Bafta !
 
In the end, we plan to split the 12 HDDs into two raidz vdevs.
Until our new machine is ready, I will first try to figure out this problem. Slow read/write speeds would waste a lot of time when transferring the old data to the new machine.
 
