ZFS as root performance tweaking - basic configuration

KevinH

Member
After four months of using Proxmox as a homelab (simple VMs like a web server, a mail server, and a few dedicated game servers for friends) I finally have some time to delve into optimization and performance for ZFS. My main issue at this point is very heavy wear on the disk, and I can't figure out exactly where it's coming from or what to do about it. I'm beginning to realise that this is just the nature of ZFS, and that consumer-grade SSDs like mine are simply not intended for running Proxmox (or any system) on ZFS as a root file system.

The disk parameters and the smartctl output are included below.

This is the current ashift:
Code:
zpool get ashift rpool
NAME   PROPERTY  VALUE   SOURCE
rpool  ashift    12      local

What I gather from reading online is that ashift=13 might be a better fit if the disk has 8K sectors, but that 12 should be adequate for my specific disk. Is this correct?
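For completeness, a quick way to see what the drive itself reports (just a sketch; the 860 EVO only exposes 512-byte sectors, the real flash page size is not visible to the OS):

Code:
# Logical and physical sector sizes as reported by the drive
lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sda
# Note: ashift is fixed per vdev at creation time (e.g. zpool create -o ashift=12),
# so on an existing rpool it can only be verified, not changed.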


Code:
smartctl -A /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.44-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4119
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       156
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       23
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   069   058   000    Old_age   Always       -       31
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       272
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       23
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6793613168

Using a script I found here: https://askubuntu.com/questions/865792/how-can-i-monitor-the-tbw-on-my-samsung-ssd

Calculating the total amount of writes, it accumulates to 3.163 TB. Could this be correct??
That works out to a mean write rate of 805.404 MB/hr.

The drive is rated for 4.800 TBW, so I'm getting there. Since this is just a test setup I'm not really worried. But... I wish I had checked after the first day and not after four months ;p
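For reference, the calculation boils down to this (a rough sketch of what such a script does, assuming 512-byte LBAs as hdparm reports below):

Code:
# SMART attribute 241 (Total_LBAs_Written) x 512 bytes = total bytes written
smartctl -A /dev/sda | awk '/Total_LBAs_Written/ { printf "%.3f TiB written\n", $10 * 512 / 1024^4 }'
# 6793613168 * 512 bytes ~= 3.163 TiB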

This entry suggests that setting a small recordsize (16K), setting logbias to throughput, and enabling lz4 compression would help, which I did (no SLOG and no large files):
https://serverfault.com/questions/9...on-nvme-ssd-in-raid1-to-avoid-rapid-disk-wear
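These are roughly the commands involved (a sketch; setting them on the pool root lets the child datasets inherit them, and recordsize only applies to data written after the change):

Code:
zfs set recordsize=16K rpool
zfs set logbias=throughput rpool
zfs set compression=lz4 rpool
# verify
zfs get recordsize,logbias,compression rpool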

I've disabled the pve-ha-lrm, pve-ha-crm and corosync services.
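(Roughly like this; on a standalone node these HA/cluster services aren't needed anyway:)

Code:
# stop and disable the HA and cluster services on a standalone node
systemctl disable --now pve-ha-lrm pve-ha-crm corosync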
I've tested with all VMs disabled, and it still writes a whopping 122 GB in half an hour while doing nothing.

atop, iotop and zpool iostat don't show abnormal values, quite the opposite.
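A crude way to see what actually reaches the physical disk (as opposed to what iotop and zpool iostat report at the logical level) is to sample the SMART write counter twice, for example:

Code:
# Writes that actually hit the SSD over 30 minutes, including any write
# amplification below ZFS (assumes 512-byte LBAs)
before=$(smartctl -A /dev/sda | awk '/Total_LBAs_Written/ {print $10}')
sleep 1800
after=$(smartctl -A /dev/sda | awk '/Total_LBAs_Written/ {print $10}')
echo "$(( (after - before) * 512 / 1024 / 1024 )) MiB written in 30 minutes"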

It doesn't help that my main language is Dutch rather than English, so when reading about ZFS and trying to grasp how it works, there is a lot to take in and I'm not really sure what's what.

Am I doing something wrong, or is this just normal behaviour? Is there some procedure, method or list of steps I can consult for setting up Proxmox on a ZFS root with a 500GB Samsung that doesn't wear it out within half a year, or am I better off buying a spinning disk for this setup? Pls help :eek:

I've shut the box down for the time being.

Code:
hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       Samsung SSD 860 EVO 500GB
        Serial Number:      S4XBNF1M942919H
        Firmware Revision:  RVT03B6Q
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x005e)
        Supported: 11 8 7 6 5
        Likely used: 11
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:   976773168
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:      476940 MBytes
        device size with M = 1000*1000:      500107 MBytes (500 GB)
        cache/buffer size  = unknown
        Form Factor: 2.5 inch
        Nominal Media Rotation Rate: Solid State Device
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 1   Current = 1
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
           *    DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Asynchronous notification (eg. media change)
           *    Software settings preservation
                Device Sleep (DEVSLP)
           *    SMART Command Transport (SCT) feature set
           *    SCT Write Same (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
           *    reserved 69[4]
           *    DOWNLOAD MICROCODE DMA command
           *    SET MAX SETPASSWORD/UNLOCK DMA commands
           *    WRITE BUFFER DMA command
           *    READ BUFFER DMA command
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
 
If no one has a direct reply, could you instead point me to some material I can study, or to another website/forum where I can ask ZFS-related questions?
 
I think, this is just the nature of ZFS and consumer grade SSD's like mine are just not intended for running Proxmox (or any system) on ZFS as a root file system.

Yes. Non-enterprise SSDs are not built for this. ZFS makes the problem worse, but the real problem is the SSD. We have discussed this often in the forums, most recently on the German one.
 
There's a thing called write amplification on ZFS; using TRIM (implemented in ZFS 0.8) will help significantly.
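Something along these lines, assuming the pool is already on ZFS 0.8 or newer:

Code:
zpool trim rpool             # one-off TRIM of the whole pool
zpool set autotrim=on rpool  # trim freed blocks automatically from now on
zpool status -t rpool        # shows per-vdev trim status/progress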

I've tried using Samsung SSDs for my homelab as well, but performance isn't great. Like LnxBil said, consumer SSDs aren't built for ZFS.
My Intel S3500 480GB SSDs dropped from 30% to 28-29% in two years, compared to 20% in one year for my 1050GB Crucial MX300.

I'm a fellow Dutchie; if you want to discuss this some more in Dutch, just hit me up.
 
