Poor performance with ZFS

Silvos

New Member
Mar 25, 2015
Hello,
I have an HP 380 Gen9 with a B140i RAID controller. I should have checked the wiki before purchasing, as that controller is not supported. :(

I turned off UEFI boot, enabled Legacy Boot, and installed Proxmox with the controller in SATA mode.

During the install I set up ZFS RAID 1 and everything installed fine. I can create VMs, but I'm getting poor I/O:
Code:
# pveperf
CPU BOGOMIPS:      45544.92
REGEX/SECOND:      1478731
HD SIZE:           3420.14 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     80.48
DNS EXT:           65.54 ms
DNS INT:           0.88 ms

Any ideas why FSYNCS is so low? I have 7200 RPM SATA drives, and I feel that even with the RAID controller disabled FSYNCS should be higher. Any suggestions for getting the current config working faster, aside from getting a new controller?
 
The fastest way to speed things up is to set sync=disabled in ZFS. The other way is to put the ZIL on another disk (preferably an SSD) so ZFS avoids the double write to the same disks.
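A minimal sketch of both options (the pool name matches this thread; the log device path is a placeholder):
Code:
# Option 1: fast but unsafe - sync writes are acknowledged before they reach stable storage
zfs set sync=disabled rpool

# Option 2: move the ZIL to a separate low-latency log device (SLOG)
zpool add rpool log /dev/disk/by-id/<ssd-or-partition>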
 
This is the expected performance. If you want to use ZFS for VM storage, add an SSD for cache (ZIL and L2ARC) and you will get a MUCH better experience:

http://pve.proxmox.com/wiki/ZFS#Add_Cache_and_Log_to_existing_pool
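In short, the wiki page boils down to commands along these lines (device names are placeholders; typically a small partition on the SSD for the log and a larger one for the cache):
Code:
# small partition as SLOG (ZIL), larger partition as L2ARC read cache
zpool add <pool> log   /dev/disk/by-id/<ssd>-part1
zpool add <pool> cache /dev/disk/by-id/<ssd>-part2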

For best SSD reliability, I can recommend one of these:

Intel SSD DC S3710 Series 200GB, 2.5", SATA 6Gb/s (SSDSC2BA200G401)
Intel SSD DC S3700 Series 200GB, 2.5", SATA 6Gb/s (SSDSC2BA200G3)


Here are pveperf results from one of our test-lab hosts (raidz2 with 6 x 4 TB Seagate SATA drives and one Intel SSD DC S3710 Series 200GB for log and cache):

Code:
root@proxmox:~# zpool status
  pool: space
 state: ONLINE
  scan: scrub repaired 0 in 2h31m with 0 errors on Sun Mar 22 09:18:29 2015
config:

        NAME                                                 STATE     READ WRITE CKSUM
        space                                                ONLINE       0     0     0
          raidz2-0                                           ONLINE       0     0     0
            scsi-SATA_ST4000NM0033-9Z_Z1Z3M15Q               ONLINE       0     0     0
            scsi-SATA_ST4000NM0033-9Z_Z1Z3N1H3               ONLINE       0     0     0
            scsi-SATA_ST4000NM0033-9Z_Z1Z1S6VN               ONLINE       0     0     0
            scsi-SATA_ST4000NM0033-9Z_Z1Z1RBLP               ONLINE       0     0     0
            scsi-SATA_ST4000NM0033-9Z_Z1Z1PEP7               ONLINE       0     0     0
            scsi-SATA_ST4000NM0033-9Z_Z1Z3LT1C               ONLINE       0     0     0
        logs
          scsi-SATA_INTEL_SSDSC2BA2BTTV439401WX200GGN-part1  ONLINE       0     0     0
        cache
          scsi-SATA_INTEL_SSDSC2BA2BTTV439401WX200GGN-part2  ONLINE       0     0     0


errors: No known data errors

root@proxmox:~# pveperf /space/
CPU BOGOMIPS:      57595.44
REGEX/SECOND:      1804289
HD SIZE:           14601.65 GB (space)
FSYNCS/SECOND:     2999.24
DNS EXT:           35.32 ms
DNS INT:           0.52 ms (proxmox.com)
 
What are the settings for sync, compression, and dedup on the above host?
How much RAM does it have?

My hardware details from the test lab (not the hardware from the thread starter):
One Intel® Xeon® E5-2620 v3 2.4 GHz, Supermicro X10SRi-F mainboard, 64 GB RAM - all disks are connected directly to the mainboard SATA controllers.

Here are all the ZFS settings.

Code:
root@proxmox:~# zfs get all space
NAME   PROPERTY              VALUE                  SOURCE
space  type                  filesystem             -
space  creation              Sun Mar  1 13:42 2015  -
space  used                  2.79T                  -
space  available             11.5T                  -
space  referenced            2.78T                  -
space  compressratio         1.11x                  -
space  mounted               yes                    -
space  quota                 none                   default
space  reservation           none                   default
space  recordsize            128K                   default
space  mountpoint            /space                 default
space  sharenfs              off                    default
space  checksum              on                     default
space  compression           lz4                    local
space  atime                 on                     default
space  devices               on                     default
space  exec                  on                     default
space  setuid                on                     default
space  readonly              off                    default
space  zoned                 off                    default
space  snapdir               hidden                 default
space  aclinherit            restricted             default
space  canmount              on                     default
space  xattr                 on                     default
space  copies                1                      default
space  version               5                      -
space  utf8only              off                    -
space  normalization         none                   -
space  casesensitivity       sensitive              -
space  vscan                 off                    default
space  nbmand                off                    default
space  sharesmb              off                    local
space  refquota              none                   default
space  refreservation        none                   default
space  primarycache          all                    default
space  secondarycache        all                    default
space  usedbysnapshots       0                      -
space  usedbydataset         2.78T                  -
space  usedbychildren        83.2M                  -
space  usedbyrefreservation  0                      -
space  logbias               latency                default
space  dedup                 off                    default
space  mlslabel              none                   default
space  sync                  standard               default
space  refcompressratio      1.11x                  -
space  written               2.78T                  -
space  logicalused           3.07T                  -
space  logicalreferenced     3.07T                  -
space  snapdev               hidden                 default
space  acltype               off                    default
space  context               none                   default
space  fscontext             none                   default
space  defcontext            none                   default
space  rootcontext           none                   default
space  relatime              on                     local
 
@Tom I added an SSD for the ZIL as you suggested, but everything is still slow.


Here is pveperf before adding the SSD
Code:
root@pmox01:~# pveperf
CPU BOGOMIPS:      45544.74
REGEX/SECOND:      1501308
HD SIZE:           3420.14 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     87.89
DNS EXT:           55.68 ms
DNS INT:           0.71 ms (my-domain.com)


Adding the SLOG drive to the pool...
Code:
root@pmox01:/# zpool add -f rpool log scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305

Checking the pool's status, I can see the drive and it looks good.
Code:
root@pmox01:/# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:


    NAME                                       STATE     READ WRITE CKSUM
    rpool                                      ONLINE       0     0     0
      mirror-0                                 ONLINE       0     0     0
        sda3                                   ONLINE       0     0     0
        sdb3                                   ONLINE       0     0     0
    logs
      scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305  ONLINE       0     0     0


errors: No known data errors

Running pveperf again, and it's pretty abysmal :(
Code:
root@pmox01:/# pveperf
CPU BOGOMIPS:      45544.74
REGEX/SECOND:      1481754
HD SIZE:           3420.14 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     114.16
DNS EXT:           58.37 ms
DNS INT:           0.71 ms (my-domain.com)

SATA info: I'm getting link up at 6.0 Gbps, so that looks good.
Code:
root@pmox01:/# dmesg | grep SATA
ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0xb impl SATA mode
ata1: SATA max UDMA/133 abar m2048@0x92c00000 port 0x92c00100 irq 103
ata2: SATA max UDMA/133 irq_stat 0x00400040, connection status changed irq 103
ata4: SATA max UDMA/133 irq_stat 0x00400040, connection status changed irq 103
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

Info on the controller.
Code:
root@pmox01:/# lspci -nn | grep SATA
00:1f.2 SATA controller [0106]: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] [8086:8d02] (rev 05)

rpool settings.
Code:
root@pmox01:/# zfs get all rpool
NAME   PROPERTY              VALUE                  SOURCE
rpool  type                  filesystem             -
rpool  creation              Sat Mar 28 18:23 2015  -
rpool  used                  234G                   -
rpool  available             3.34T                  -
rpool  referenced            144K                   -
rpool  compressratio         1.75x                  -
rpool  mounted               yes                    -
rpool  quota                 none                   default
rpool  reservation           none                   default
rpool  recordsize            128K                   default
rpool  mountpoint            /rpool                 default
rpool  sharenfs              off                    default
rpool  checksum              on                     default
rpool  compression           on                     local
rpool  atime                 off                    local
rpool  devices               on                     default
rpool  exec                  on                     default
rpool  setuid                on                     default
rpool  readonly              off                    default
rpool  zoned                 off                    default
rpool  snapdir               hidden                 default
rpool  aclinherit            restricted             default
rpool  canmount              on                     default
rpool  xattr                 on                     default
rpool  copies                1                      default
rpool  version               5                      -
rpool  utf8only              off                    -
rpool  normalization         none                   -
rpool  casesensitivity       sensitive              -
rpool  vscan                 off                    default
rpool  nbmand                off                    default
rpool  sharesmb              off                    default
rpool  refquota              none                   default
rpool  refreservation        none                   default
rpool  primarycache          all                    default
rpool  secondarycache        all                    default
rpool  usedbysnapshots       0                      -
rpool  usedbydataset         144K                   -
rpool  usedbychildren        234G                   -
rpool  usedbyrefreservation  0                      -
rpool  logbias               latency                default
rpool  dedup                 off                    default
rpool  mlslabel              none                   default
rpool  sync                  standard               local
rpool  refcompressratio      1.00x                  -
rpool  written               144K                   -
rpool  logicalused           818M                   -
rpool  logicalreferenced     15.5K                  -
rpool  snapdev               hidden                 default
rpool  acltype               off                    default
rpool  context               none                   default
rpool  fscontext             none                   default
rpool  defcontext            none                   default
rpool  rootcontext           none                   default
rpool  relatime              off                    default
root@pmox01:/#


edit...

I partitioned the SSD and added an L2ARC (rough commands are sketched after the status below).
Code:
    NAME                                             STATE     READ WRITE CKSUM
    rpool                                            ONLINE       0     0     0
      mirror-0                                       ONLINE       0     0     0
        sda3                                         ONLINE       0     0     0
        sdb3                                         ONLINE       0     0     0
    logs
      scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305-part1  ONLINE       0     0     0
    cache
      scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305-part2  ONLINE       0     0     0
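For reference, the repartitioning and re-adding were presumably done with something along these lines (the device name is taken from the status above, but the partition sizes and exact commands are assumptions):
Code:
# remove the whole-disk log device first (log vdevs can be removed from a pool)
zpool remove rpool scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305
# split the SSD: a small SLOG partition, the rest as L2ARC (sizes are examples)
sgdisk -n1:0:+8G /dev/disk/by-id/scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305
sgdisk -n2:0:0   /dev/disk/by-id/scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305
# re-add the partitions as log and cache
zpool add rpool log   scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305-part1
zpool add rpool cache scsi-SATA_SAMSUNG_MZ7TD25S17LNSADC16305-part2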

It's still slow
Code:
CPU BOGOMIPS:      45544.74
REGEX/SECOND:      1524869
HD SIZE:           3420.14 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     116.29
DNS EXT:           59.11 ms
DNS INT:           0.72 ms (my-domain.com)

I scp'd the Windows 10 ISO:
Code:
Windows10_TechnicalPreview_x64_EN-US_9926.iso      100% 4019MB  23.6MB/s   02:50
^that seems slow to me

If I disable sync writes (sync=disabled):

Code:
root@pmox01:/# zfs set sync=disabled rpool
root@pmox01:/# pveperf
CPU BOGOMIPS:      45544.74
REGEX/SECOND:      1497608
HD SIZE:           3420.14 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     9533.95
DNS EXT:           61.35 ms
DNS INT:           0.72 ms (my-domain.com)

With sync disabled and the SLOG / L2ARC removed from rpool, this is my scp copy time:
Code:
Windows10_TechnicalPreview_x64_EN-US_9926.iso  100% 4019MB  52.2MB/s   01:17

What am I doing wrong? I have a basic subscription; is there anything extra in the stable repo that would help me?
 
ZIL - the ZFS intent log for sync writes. It works at the maximum possible speed; an external ZIL avoids the double write to the same HDD.
ARC - the primary ZFS cache for data and metadata.
L2ARC - a secondary, external cache. It needs warm-up time (depending on load, possibly a few days) before it gives any benefit, and it consumes some ARC.

Need write speed? Set sync=disabled on the pool.
Need write speed and safety? Use an external ZIL on a low-latency SSD.

Need read speed? Use a big ARC.

Need read and write speed? Use both ARC and ZIL.

Performance monitoring tool : https://github.com/zfsonlinux/zfs/tree/master/cmd/arcstat

My example:

ARC size = 10G (meta limit 5G)
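For reference, limits like these are normally set through ZFS module options on ZFS on Linux; a minimal sketch, assuming the 10G / 5G values mentioned above:
Code:
# /etc/modprobe.d/zfs.conf  (values match the example above; adjust to your RAM)
options zfs zfs_arc_max=10737418240         # 10 GiB ARC
options zfs zfs_arc_meta_limit=5368709120   # 5 GiB metadata limit
# takes effect at the next module load/boot (run update-initramfs -u if root is on ZFS),
# or tweak at runtime via /sys/module/zfs/parameters/zfs_arc_max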

Meta : `arcstat -f time,mread,mhit,mh%,mmis,mm% 1`

Code:
    time  mread  mhit  mh%  mmis  mm%
08:37:14      3     3  100     0    0
08:37:15     27    27  100     0    0
08:37:16    275   275  100     0    0
08:37:17    255   255  100     0    0
08:37:18    233   233  100     0    0
08:37:19     10    10  100     0    0
08:37:20     18    18  100     0    0
08:37:21     33    33  100     0    0


ARC : `arcstat -f time,c,arcsz,read,hits,hit%,miss,miss% 1`

Code:
    time      c  arcsz  read  hits  hit%  miss  miss%
08:45:03    10G    10G   257   209    76    66     24
08:45:04    10G    10G  1.1K   734    69   328     30
08:45:05    10G    10G  1.1K   837    75   273     24
08:45:06    10G    10G   391   251    64   140     35
08:45:07    10G    10G   588   435    73   153     26
08:45:08    10G    10G   654   387    59    70     43
08:45:09    10G    10G   160   160   100     0      0
08:45:10    10G    10G   462   274    59   188     40
08:45:11    10G    10G   290   151    52   139     47

arc_summary.py

Code:
ZFS Subsystem Report                Sun Mar 29 08:51:34 2015
ARC Summary: (HEALTHY)
    Memory Throttle Count:            0

ARC Misc:
    Deleted:               2.38m
    Recycle Misses:      154.47k
    Mutex Misses:        283
    Evict Skips:          283

ARC Size:                          100.00%    10.00    GiB
    Target Size: (Adaptive)       100.00%    10.00    GiB
    Min Size (Hard Limit):        0.04%    4.00    MiB
    Max Size (High Water):        2560:1    10.00    GiB

ARC Size Breakdown:
    Recently Used Cache Size:    0.01%    1.24    MiB
    Frequently Used Cache Size:    99.99%    10.00    GiB

ARC Hash Breakdown:
    Elements Max:                1.51m
    Elements Current:           99.38%    1.50m
    Collisions:                   26.83m
    Chain Max:                   11
    Chains:                       437.55k

ARC Total accesses:                         150.23m
    Cache Hit Ratio:          84.35%    126.72m
    Cache Miss Ratio:         15.65%     23.51m
    Actual Hit Ratio:         81.07%     121.79m

    Data Demand Efficiency:        71.85%    82.09m
    Data Prefetch Efficiency:       35.40%    292.23k

    CACHE HITS BY CACHE LIST:
      Most Recently Used:             13.55%    17.17m
      Most Frequently Used:           82.56%   104.62m
      Most Recently Used Ghost:      1.39%      1.76m
      Most Frequently Used Ghost:    16.48%    20.88m

    CACHE HITS BY DATA TYPE:
      Demand Data:             46.55%    58.99m
      Prefetch Data:            0.08%     103.44k
      Demand Metadata:        49.56%    62.80m
      Prefetch Metadata:       3.81%      4.83m

    CACHE MISSES BY DATA TYPE:
      Demand Data:             98.28%    23.11m
      Prefetch Data:           0.80%     188.79k
      Demand Metadata:         0.58%     136.19k
      Prefetch Metadata:       0.34%    79.67k


File-Level Prefetch: (HEALTHY)
DMU Efficiency:                        386.99m
    Hit Ratio:             87.54%    338.77m
    Miss Ratio:            12.46%    48.22m

    Colinear:                48.22m
      Hit Ratio:             0.10%    49.76k
      Miss Ratio:            99.90%    48.17m

    Stride:                    332.70m
      Hit Ratio:             99.98%    332.65m
      Miss Ratio:            0.02%    53.77k

DMU Misc: 
    Reclaim:                48.17m
      Successes:            5.47%    2.63m
      Failures:             94.53%    45.54m

    Streams:               6.13m
      +Resets:             0.01%    602
      -Resets:             99.99%    6.13m
      Bogus:               0
 
As long as you are OK with data loss during a crash, sync=disabled is fine.

The ZIL will only be used when the process is doing synchronous writes. If the application is writing asynchronously, it will not use the ZIL. Don't expect a ZIL to speed up an scp transfer or anything like that.
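A quick way to see the difference (the test file path is just an example): buffered writes bypass the ZIL, while the dsync run should show SLOG activity in zpool iostat -v:
Code:
# async (buffered) writes - the SLOG should stay idle
dd if=/dev/zero of=/rpool/ddtest bs=1M count=1024
# sync writes - every block is flushed, so the SLOG should show write activity
dd if=/dev/zero of=/rpool/ddtest bs=4k count=5000 oflag=dsync
rm /rpool/ddtest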

I honestly wouldn't touch ZFS with a ten-foot pole with sync=disabled if you care about your data.

You can also verify whether the ZIL is getting used by watching the following:

zpool iostat -v poolname 1 20

That gives you one-second samples for 20 seconds. If the ZIL is getting used you will see data being written to it; if not, then you know it's not being used.

I have been down this road quite a bit with ZFS!
 
As long as you are OK with data loss during a crash, sync=disabled is fine.

In any crash you can lose data. ZFS is built not to corrupt any data; as for loss, you can lose the last 5-10 seconds of data.

I have used ZFS with Proxmox for a long time, and I have had a few crashes caused by fatal Linux kernel errors; a ZIL would not help in those cases.
As for power loss, make sure you have a backup line (a UPS or similar).

You can also verify whether the ZIL is getting used by watching the following:

zpool iostat -v poolname 1 20

That gives you one-second samples for 20 seconds. If the ZIL is getting used you will see data being written to it; if not, then you know it's not being used.

I have been down this road quite a bit with ZFS!

Sorry, but I cannot find any ZIL usage here.



Summary: the ZIL can help you speed up synchronous writes and save the last few seconds of data (for big data a few seconds means almost nothing, except for small files written within that ~5-second window).
 
In any crash you can lose data. ZFS is built not to corrupt any data; as for loss, you can lose the last 5-10 seconds of data.

I have used ZFS with Proxmox for a long time, and I have had a few crashes caused by fatal Linux kernel errors; a ZIL would not help in those cases.
As for power loss, make sure you have a backup line (a UPS or similar).



Sorry, but I cannot find any ZIL usage here.



Summary: the ZIL can help you speed up synchronous writes and save the last few seconds of data (for big data a few seconds means almost nothing, except for small files written within that ~5-second window).

Yep, but moving to sync=disabled increases that window and really is not recommended at all. If suggestions like that are made, it's worth mentioning the side effects.

It really comes down to the application in use.
 
On bare metal with untuned ZFS on RAID 1 (2 x Samsung PM853T SSDs) I get this:
FSYNCS/SECOND: 320.12

On another bare-metal host with untuned ZFS on RAID 1 (2 x Micron M500 SSDs) I get this:
FSYNCS/SECOND: 84.27

On a 3.35 install on mdadm RAID 1 with ext3 and consumer SSDs (2 x Mushkin Deluxe):
FSYNCS/SECOND: 2845.59


These are all SSDs, no HDDs, and the hosts range from 32 GB E3 to 64 GB E5 Supermicros.
 
On bare metal with untuned ZFS on RAID 1 (2 x Samsung PM853T SSDs) I get this:
FSYNCS/SECOND: 320.12

On another bare-metal host with untuned ZFS on RAID 1 (2 x Micron M500 SSDs) I get this:
FSYNCS/SECOND: 84.27

On a 3.35 install on mdadm RAID 1 with ext3 and consumer SSDs (2 x Mushkin Deluxe):
FSYNCS/SECOND: 2845.59


These are all SSDs, no HDDs, and the hosts range from 32 GB E3 to 64 GB E5 Supermicros.

ext3 does not do any data validation or checksumming. Standard ZFS with checksums, sync=standard, and no external ZIL does a double write for synchronous writes.
 
sync=disabled gave me 28000 syncs/s.

ext3 does not do any data validation or checksumming. Standard ZFS with checksums, sync=standard, and no external ZIL does a double write for synchronous writes.
I understand, but the performance is like 10% of ext3's.
 
ZFS is a huge filesystem and it does a lot of things. That's why ZFS has two types of cache, a sync-write log, and other machinery.
 
@sxlderek: open two terminal windows.
In one of them, run zpool iostat -v 1
In the other one, start pveperf and check whether there is any activity on the SLOG device (the one you added).

When done, stop the first command and run iostat -kx 1, then start pveperf again in the second terminal.

Please paste the output of the first terminal during the pveperf runs here.
 
My ZFS is also performing really slowly. The most interesting part is that I have almost the same configuration as the Proxmox test lab:

My hardware details from the test lab (not the hardware from the thread starter):
One Intel® Xeon® E5-2620 v3 2.4 GHz, Supermicro X10SRi-F mainboard, 64 GB RAM - all disks are connected directly to the mainboard SATA controllers.

Same Intel drive:
Intel SSD DC S3700 Series 200GB, 2.5", SATA 6Gb/s (SSDSC2BA200G3)

Only the drives differ: they are 3 TB Seagates with 64 MB cache instead of 128 MB cache, and ZFS is the root filesystem:
Code:
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 1h43m with 0 errors on Sat May 23 03:43:02 2015
config:

        NAME                                                 STATE     READ WRITE CKSUM
        rpool                                                ONLINE       0     0     0
          raidz2-0                                           ONLINE       0     0     0
            ata-ST3000DM001-1ER166_W500V5P2-part3            ONLINE       0     0     0
            ata-ST3000DM001-1ER166_W500V5P4-part3            ONLINE       0     0     0
            ata-ST3000DM001-1ER166_W500TF38-part3            ONLINE       0     0     0
            ata-ST3000DM001-1ER166_W500TF0G-part3            ONLINE       0     0     0
            ata-ST3000DM001-1ER166_W500THXN-part3            ONLINE       0     0     0
            ata-ST3000DM001-1ER166_W500TEVZ-part3            ONLINE       0     0     0
        logs
          ata-INTEL_SSDSC2BA200G4_BTHV50530396200MGN-part1   ONLINE       0     0     0
        cache
          scsi-SATA_INTEL_SSDSC2BA2BTHV50530396200MGN-part2  ONLINE       0     0     0

errors: No known data errors

But my performance is way slower:
Code:
CPU BOGOMIPS:      57600.72
REGEX/SECOND:      1823340
HD SIZE:           10876.31 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     128.89
DNS EXT:           49.44 ms
DNS INT:           1.38 ms

Does having ZFS as root make it slow? Or is the nightly vzdump killing/polluting the ZFS ARC?
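One way to check the vzdump theory is to watch the ARC while a backup runs; a sketch using the same arcstat fields shown earlier in the thread:
Code:
# watch ARC size and hit rate during the backup window (5-second samples)
arcstat -f time,c,arcsz,read,hit%,miss% 5
# or sample the raw kstat counters before and after vzdump runs
grep -E '^(size|c|c_max|hits|misses) ' /proc/spl/kstat/zfs/arcstats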
 
