fsync performance oddities?

jayg30

I have 3 different nodes that I'm messing with. All 3 nodes use the same hardware (servers, CPUs, RAM, HBAs, etc.):
Code:
Intel R2224GZ4GC4 (barebones)
Intel S2600GZ (motherboard)
Intel E5-2670 x 2 (CPUs)
Hynix HMT31GR7CFR4A-H9 (8GB x 16) (RAM)
LSI 9340-8i (SAS3008 flashed to IT mode)

They all use ZFS. I was actually upgrading them to PVE 6, and while doing some testing I can't understand what's going on with the pveperf FSYNCS/SECOND results. My understanding is that fsync with ZFS is handled by the ZIL, which you can back with a dedicated SLOG device (or devices) if needed. So if you have a SLOG, the backing pool shouldn't matter; without a SLOG, the sync writes go to the ZIL on the pool itself.
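For reference, one way to see where the sync writes actually land (just a sketch with stock ZFS commands; the pool name is from my setup) is to check the dataset's sync property and watch the log device while pveperf runs:

Code:
# confirm sync writes aren't disabled on the pool/dataset under test
zfs get sync tank

# in a second shell, watch per-vdev activity while pveperf runs;
# with a SLOG attached, the log device should be absorbing the sync writes
zpool iostat -v tank 1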

With that in mind, I'm seeing HALF the performance on Node 3 compared to Node 1, even though they both use the same SLOG device, an Intel DC S3710 400GB. The only differences between these 2 nodes are:
  • the backing disks (Node 1 has more of them), but my understanding is this shouldn't matter
  • Node 1 was provisioned as 2 pools (rpool and tank) instead of the default layout on Node 3 (just rpool)
  • Node 1 is running PVE 5.4 and Node 3 is running PVE 6
  • Node 1 is currently not hosting any VMs, while Node 3 is hosting 2 small VMs
Node 2 was a bit strange to me as well. Its FSYNC result slots between Node 1 and Node 3. It has no SLOG device, but the pool is made of 6 SSDs in 3 mirrors (Samsung PM853T disks). They're not write-optimized SSDs, and maybe I'm expecting too much, but I thought they would perform better than the single Intel SLOG SSD.
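One sanity check that could help here (a rough sketch only; the file path and count are placeholders) is a synchronous dd against a file on each pool, to see whether the disks themselves or the software layer is the limit:

Code:
# 4k writes with a sync per block, roughly what an fsync-heavy workload does
dd if=/dev/zero of=/rpool/ddtest bs=4k count=20000 oflag=dsync
rm /rpool/ddtest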


Condensed view of results:

Node | PVE version                            | ZFS version      | Disk layout                                      | Notes                                  | FSYNCS/SEC
1    | 5.4-13/aee6f0ec (kernel 4.13.4-1-pve)  | 0.7.13-pve1~bpo2 | Intel DC S3710 SLOG + 7 mirrored vdevs (10K SAS) | 2 pools (rpool & tank), no running VMs | 4272.08
2    | 6.0-7/28984024 (kernel 5.0.21-1-pve)   | 0.8.1-pve2       | 3 mirrored vdevs (Samsung PM853T)                | 1 pool (rpool), running VMs            | 2213.35
3    | 6.0-7/28984024 (kernel 5.0.21-1-pve)   | 0.8.1-pve2       | Intel DC S3710 SLOG + 4 mirrored vdevs (10K SAS) | 1 pool (rpool), running VMs            | 1816.79
 
CLI outputs

Code:
root@pve01:~# pveversion
pve-manager/5.4-13/aee6f0ec (running kernel: 4.13.4-1-pve)

root@pve01:~# dpkg -l zfs*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version          Architecture     Description
+++-=======================-================-================-===================================================
un  zfs                     <none>           <none>           (no description available)
un  zfs-fuse                <none>           <none>           (no description available)
ii  zfs-initramfs           0.7.13-pve1~bpo2 all              OpenZFS root filesystem capabilities for Linux - in
un  zfs-zed                 <none>           <none>           (no description available)
un  zfsutils                <none>           <none>           (no description available)
ii  zfsutils-linux          0.7.13-pve1~bpo2 amd64            command-line tools to manage OpenZFS filesystems

root@pve01:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0h44m with 0 errors on Sun Nov 10 01:08:02 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdd2    ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 0h3m with 0 errors on Sun Nov 10 00:27:13 2019
config:

        NAME                                          STATE     READ WRITE CKSUM
        tank                                          ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            scsi-35000cca012ac64bc                    ONLINE       0     0     0
            scsi-35000cca02142a88c                    ONLINE       0     0     0
          mirror-1                                    ONLINE       0     0     0
            scsi-35000cca025a3c304                    ONLINE       0     0     0
            scsi-35000cca025a3cbdc                    ONLINE       0     0     0
          mirror-2                                    ONLINE       0     0     0
            scsi-35000cca025aa0aa8                    ONLINE       0     0     0
            scsi-35000cca025abd454                    ONLINE       0     0     0
          mirror-3                                    ONLINE       0     0     0
            scsi-35000cca025afaf54                    ONLINE       0     0     0
            scsi-35000cca025b65a60                    ONLINE       0     0     0
          mirror-4                                    ONLINE       0     0     0
            scsi-35000cca025b65e28                    ONLINE       0     0     0
            scsi-35000cca025b78a90                    ONLINE       0     0     0
          mirror-5                                    ONLINE       0     0     0
            scsi-35000cca025b82938                    ONLINE       0     0     0
            scsi-35000cca025b830c8                    ONLINE       0     0     0
          mirror-6                                    ONLINE       0     0     0
            scsi-35000cca025b8320c                    ONLINE       0     0     0
            scsi-35000cca025b8f4dc                    ONLINE       0     0     0
        logs
          ata-INTEL_SSDSC2BA400G4_BTHV516006RB400NGN  ONLINE       0     0     0

root@pve01:~# pveperf /tank/
CPU BOGOMIPS:      166000.32
REGEX/SECOND:      1857196
HD SIZE:           2795.47 GB (tank)
FSYNCS/SECOND:     4272.08
DNS EXT:           20.36 ms
DNS INT:           0.97 ms


Code:
root@pve02:~# pveversion
pve-manager/6.0-7/28984024 (running kernel: 5.0.21-1-pve)

root@pve02:~# dpkg -l zfs*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-==========================================================
un  zfs            <none>       <none>       (no description available)
un  zfs-fuse       <none>       <none>       (no description available)
ii  zfs-initramfs  0.8.1-pve2   all          OpenZFS root filesystem capabilities for Linux - initramfs
un  zfs-test       <none>       <none>       (no description available)
ii  zfs-zed        0.8.1-pve2   amd64        OpenZFS Event Daemon
un  zfsutils       <none>       <none>       (no description available)
ii  zfsutils-linux 0.8.1-pve2   amd64        command-line tools to manage OpenZFS filesystems

root@pve02:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:12 with 0 errors on Sun Nov 10 00:24:13 2019
config:

        NAME                                               STATE     READ WRITE CKSUM
        rpool                                              ONLINE       0     0     0
          mirror-0                                         ONLINE       0     0     0
            sdb2                                           ONLINE       0     0     0
            sdc2                                           ONLINE       0     0     0
          mirror-1                                         ONLINE       0     0     0
            ata-SAMSUNG_MZ7GE960HMHP-00005_S1Y2NYAFA03529  ONLINE       0     0     0
            ata-SAMSUNG_MZ7GE960HMHP-00005_S1Y2NYAFA03469  ONLINE       0     0     0
          mirror-2                                         ONLINE       0     0     0
            sdf                                            ONLINE       0     0     0
            sdg                                            ONLINE       0     0     0

root@pve02:~# pveperf
CPU BOGOMIPS:      166042.88
REGEX/SECOND:      2028409
HD SIZE:           2547.76 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2213.35
DNS EXT:           21.32 ms
DNS INT:           0.95 ms


Code:
root@pve03:~# pveversion
pve-manager/6.0-7/28984024 (running kernel: 5.0.21-1-pve)

root@pve03:~# dpkg -l zfs*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-==========================================================
un  zfs            <none>       <none>       (no description available)
un  zfs-fuse       <none>       <none>       (no description available)
ii  zfs-initramfs  0.8.1-pve2   all          OpenZFS root filesystem capabilities for Linux - initramfs
un  zfs-test       <none>       <none>       (no description available)
ii  zfs-zed        0.8.1-pve2   amd64        OpenZFS Event Daemon
un  zfsutils       <none>       <none>       (no description available)
ii  zfsutils-linux 0.8.1-pve2   amd64        command-line tools to manage OpenZFS filesystems

root@pve03:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:17 with 0 errors on Sun Nov 10 00:24:19 2019
config:

        NAME                                          STATE     READ WRITE CKSUM
        rpool                                         ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            sdb2                                      ONLINE       0     0     0
            sdc2                                      ONLINE       0     0     0
          mirror-1                                    ONLINE       0     0     0
            scsi-35000cca025b5d3f4                    ONLINE       0     0     0
            scsi-35000cca0214926a8                    ONLINE       0     0     0
          mirror-2                                    ONLINE       0     0     0
            scsi-35000cca025b8b168                    ONLINE       0     0     0
            scsi-35000cca025b5e804                    ONLINE       0     0     0
          mirror-3                                    ONLINE       0     0     0
            scsi-35000cca025a3c5e0                    ONLINE       0     0     0
            scsi-35000cca025a44640                    ONLINE       0     0     0
        logs
          ata-INTEL_SSDSC2BA400G4_BTHV513401CN400NGN  ONLINE       0     0     0

root@pve03:~# pveperf
CPU BOGOMIPS:      166025.92
REGEX/SECOND:      1826852
HD SIZE:           1589.33 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     1816.79
DNS EXT:           21.48 ms
DNS INT:           0.96 ms
 
Doing some more digging in GitHub issues related to the ZFS v0.8 releases, it appears there were multiple reports about drops in performance. Lots of people are talking about SIMD, which from what I can tell was disabled moving from v0.7 to v0.8. And even in the latest release of ZFS (0.8.2), which I just upgraded Node 3 to, it still isn't enabled again yet.

I can't say how important SIMD is or whether it's related to the performance degradation I'm seeing in sync writes. Most comments about its importance focus on RAIDZ configurations that lack CPU resources. But I've also read that it's used for the fletcher4 checksum algorithm (perhaps disabling checksums for a benchmark would shed some light).
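If someone wants to try that checksum test, a quick way (a sketch only; the dataset name is made up, use a throwaway dataset rather than production data) would be:

Code:
# create a throwaway dataset, turn off checksums, benchmark it, clean up
zfs create rpool/cktest
zfs set checksum=off rpool/cktest
pveperf /rpool/cktest
zfs destroy rpool/cktest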

FYI, Node 3 is now showing ~2800 FSYNC/SEC after upgrading from 0.8.1 to 0.8.2. So that is nice! There must be some other issues as well.
 
Well, just looking at the fletcher4 stats between Node 1 (ZFS 0.7.13) and Node 3 (ZFS 0.8.2), you can see a clear difference...

Code:
NODE1
------
root@pve01:~# cat /proc/spl/kstat/zfs/fletcher_4_bench
0 0 0x01 -1 0 6970133600 45976618831011140
implementation   native         byteswap
scalar           5108371042     3978045354
superscalar      5897130732     4743428027
superscalar4     5256654102     4298771505
sse2             8643578437     4764411755
ssse3            8642019260     7930623666
fastest          sse2           ssse3

NODE3
-------
root@pve03:~# cat /proc/spl/kstat/zfs/fletcher_4_bench
5 0 0x01 -1 0 6460459293 4987442131277
implementation   native         byteswap
scalar           5831526001     4623936024
superscalar      5824996442     4791869076
superscalar4     5827230976     4795584373
fastest          scalar         superscalar4

Node | native                  | byteswap
1    | 8,643,578,437 (sse2)    | 7,930,623,666 (ssse3)
3    | 5,831,526,001 (scalar)  | 4,795,584,373 (superscalar4)

:eek: Yuck
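For what it's worth, the active fletcher4 implementation can also be checked (and pinned for a comparison) through the module parameter; a sketch, assuming the usual ZoL 0.7/0.8 sysfs path:

Code:
# list available implementations; the active one is shown in brackets
cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl

# pin a specific implementation for testing (resets on reboot)
echo scalar > /sys/module/zcommon/parameters/zfs_fletcher_4_impl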
 
Has the Proxmox team patched the kernel for SIMD support? I think I'm seeing some comments about patching this in the pve-kernel. If so, is it the ZFS team's proposed workaround, or just patching the kernel back to the old method from before the kernel team removed it?
 
The Proxmox VE 6.1 release includes ZFS 0.8.2 which still does not include the SIMD patch due to this bug. Maybe in 0.8.3 if this is sorted out.

It appears that's not the whole story. Proxmox patched their kernel with the SIMD patch that was implemented in the OpenZFS/ZoL master branch. This is their "workaround", rather than reverting the kernel dev team's changes to the exported symbols.

See here: https://forum.proxmox.com/threads/zfs-simd-patch-in-0-8-1-pve2-causes-fpu-corruption.58627/

I upgraded Node1 to the new 6.1 release.
SIMD is recognized on all my nodes now.

Code:
root@pve01:~# cat /proc/spl/kstat/zfs/fletcher_4_bench
0 0 0x01 -1 0 6911416465 229149682236
implementation   native         byteswap
scalar           5577184337     4199946257
superscalar      6973003321     5067942416
superscalar4     5784984389     4889406470
sse2             9416358535     5196046648
ssse3            9426250487     8644266675
fastest          ssse3          ssse3

But FSYNC performance (per pveperf) is still degraded: Node 1 went from ~4200 to ~2800.

So it would appear that either other things are causing a performance regression, or the SIMD workaround just can't match the original efficiency. Not sure, and I would love to hear from others.
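It might also be worth cross-checking pveperf with a small fio sync-write run; a sketch only, the directory and sizes are placeholders:

Code:
# 4k writes with an fsync after every write, similar in spirit to
# what pveperf's FSYNCS/SECOND measures
fio --name=fsynctest --directory=/tank --ioengine=psync --rw=write \
    --bs=4k --size=1G --fsync=1 --runtime=60 --time_based --group_reporting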
 
I should also mention that I disabled checksums to test performance on Node 1 after upgrading it to PVE 6.1. Remember, this is a mirrored pool with a SLOG, and I'm just testing fsync using pveperf. The results did NOT change, so it doesn't appear the issue is related to the lack of SIMD optimization for checksums. I don't know what else SIMD might affect for fsync on a pool made of mirrors with a SLOG. I'm thinking there are other factors at play here causing the performance drop.
 
It appears that's not the whole story. Proxmox patched their kernel with the SIMD patch that was implemented in the OpenZFS/ZoL master branch. This is their "workaround", rather than reverting the kernel dev team's changes to the exported symbols.

Oh, I wasn't aware of that. Sorry.
 
Is there any news on this?

From my understanding, the "SIMD patch" that Proxmox integrated is to disable SIMD; any clarification on this?

I'm on 6.1 with ZFS 0.8.2-pve2 and still far away from the performance I should see. Huge I/O wait.
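A starting point for narrowing down where the I/O wait comes from (generic tools, nothing specific to this setup) could be:

Code:
# per-device utilisation and latency
iostat -x 1

# ZFS-side request latency histograms per vdev (OpenZFS 0.7+)
zpool iostat -w rpool 5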
 
