100 times write amplification using ZFS

welcometors

Sorry for reposting; my last post got stuck with "Awaiting approval before being displayed publicly."

I'm running Proxmox on two 512 GB NVMe sticks in a ZFS RAID1 (mirror) pool (ashift=12). I have written a small script that measures writes per hour (in MB) using the smartctl command (src: GitHub).
I have a single VM (OPNsense) that writes around 70 kB per minute (shown in the VM summary tab), which is less than 5 MB/hour.

Now, with the VM off, my script (smartctl) reports 85 MB/hour of writes (it's lower than before because I disabled HA and a bunch of other things). After turning the VM on, it jumps to 608 MB/hour, i.e. around 500 MB/hour more. I know the SSDs will still last 10+ years, so I'm not worried about wear; I want to understand what is causing roughly 100x write amplification and how this is even possible (5 MB/hour becoming ~500). What am I doing wrong?
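
For context, the idea of the script is roughly the following (a simplified sketch, not the exact script from GitHub; it assumes the NVMe device is at /dev/nvme0 and uses the NVMe "Data Units Written" counter, where one unit equals 1000 × 512 bytes):

Code:
#!/bin/sh
# Sample the NVMe "Data Units Written" SMART counter twice and report MB/hour.
DEV=/dev/nvme0
INTERVAL=3600   # seconds between the two samples

written_units() {
    smartctl -A "$DEV" | awk '/Data Units Written/ {gsub(",", "", $4); print $4}'
}

START=$(written_units)
sleep "$INTERVAL"
END=$(written_units)

# one data unit = 512,000 bytes; scale the delta to one hour
echo "$START $END $INTERVAL" | awk '{printf "%.1f MB/hour\n", ($2 - $1) * 0.512 * 3600 / $3}'

I run it once with the VM stopped and once with it running to get the two numbers above.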



# zpool get all
Code:
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           464G                           -
rpool  capacity                       2%                             -
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           16229665517427xxxxxx           -
rpool  version                        -                              default
rpool  bootfs                         rpool/ROOT/pve-1               local
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupratio                     1.00x                          -
rpool  free                           453G                           -
rpool  allocated                      10.6G                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  0%                             -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      16910426424055xxxxxx           -
rpool  autotrim                       off                            default
rpool  compatibility                  off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            enabled                        local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             enabled                        local
rpool  feature@project_quota          active                         local
rpool  feature@device_removal         enabled                        local
rpool  feature@obsolete_counts        enabled                        local
rpool  feature@zpool_checkpoint       enabled                        local
rpool  feature@spacemap_v2            active                         local
rpool  feature@allocation_classes     enabled                        local
rpool  feature@resilver_defer         enabled                        local
rpool  feature@bookmark_v2            enabled                        local
rpool  feature@redaction_bookmarks    enabled                        local
rpool  feature@redacted_datasets      enabled                        local
rpool  feature@bookmark_written       enabled                        local
rpool  feature@log_spacemap           active                         local
rpool  feature@livelist               enabled                        local
rpool  feature@device_rebuild         enabled                        local
rpool  feature@zstd_compress          enabled                        local
rpool  feature@draid                  enabled                        local
 
Enterprise-grade SSD? Without power-loss protection, all sync writes, especially small random ones, will cause massive write amplification.
And are you running OPNsense with ZFS or UFS? ZFS on top of ZFS amplifies terribly.
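
One quick way to check whether sync writes are the culprit: temporarily disable sync on the VM disk's zvol and see if the write rate drops. The dataset name below is only an example (check zfs list), and sync=disabled sacrifices crash safety, so only use it for a short test.

Code:
# how is the VM disk's zvol configured? (dataset name is an example)
zfs get sync,volblocksize rpool/data/vm-100-disk-0

# short test only -- this trades away crash consistency
zfs set sync=disabled rpool/data/vm-100-disk-0

# measure again for a while, then restore the default
zfs set sync=standard rpool/data/vm-100-disk-0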
 
> Enterprise-grade SSD? Without power-loss protection, all sync writes, especially small random ones, will cause massive write amplification.
> And are you running OPNsense with ZFS or UFS? ZFS on top of ZFS amplifies terribly.
I have enterprise-grade SSDs on my other PVE host, but this one uses a consumer Crucial P1 500 GB (CT500P1SSD8), which doesn't have power-loss protection.

I'm using UFS for OPNsense, as ZFS on ZFS seemed like a bad idea.
 
> I have a single VM (OPNsense) that writes around 70 kB per minute (shown in the VM summary tab), which is less than 5 MB/hour.

What do you measure inside the VM?
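
For comparison you could look at what the guest itself reports versus what actually reaches the pool, for example (OPNsense is FreeBSD-based, so iostat is available in a shell there):

Code:
# inside the OPNsense guest (FreeBSD shell): extended disk stats every 60 s
iostat -x -w 60

# on the Proxmox host: writes that actually hit the pool and each mirror member
zpool iostat -v rpool 60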
 
