Hi,
I've got a total write amplification of around 18. Can someone give me a hint on how to improve this?
If I run iostat on the host and sum up the writes of all zvols, the VMs combined are writing around 1 MB/s of data.
The zvols are stored on a raidz1 consisting of 5x Intel S3710 SSDs (sdc to sdg), and all the SSDs combined are writing around 10 MB/s.
I use smartctl to monitor the host writes and NAND writes of each drive, and for every 1 GB of data written to an SSD, the SSD writes around 1.8 GB to the NAND.
So in the end, the 1 MB/s of real data from the guests is multiplied up to 18 MB/s written to the flash.
Code:
root@Hypervisor:/var/log# iostat 600 2
Linux 5.4.60-1-pve (Hypervisor) 09/06/2020 _x86_64_ (16 CPU)
...
avg-cpu: %user %nice %system %iowait %steal %idle
4.66 0.00 5.90 0.02 0.00 89.42
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
nvme1n1 0.00 0.00 0.00 0 0
nvme0n1 0.00 0.00 0.00 0 0
sdg 129.56 0.81 1918.89 484 1151332
sdb 4.94 0.00 25.70 0 15417
sdh 0.00 0.00 0.00 0 0
sdf 128.64 0.81 1918.76 488 1151256
sda 4.95 0.05 25.70 32 15417
sdd 129.78 0.83 1917.61 500 1150564
sde 129.89 0.81 1917.23 488 1150340
sdc 130.13 0.87 1916.58 520 1149948
md0 0.00 0.00 0.00 0 0
md1 4.06 0.05 25.13 32 15080
dm-0 4.06 0.05 25.13 32 15080
dm-1 4.06 0.05 29.87 32 17920
dm-2 0.00 0.00 0.00 0 0
zd0 0.69 0.00 8.03 0 4820
zd16 0.58 0.00 6.45 0 3868
zd32 13.13 0.89 278.59 536 167156
zd48 0.62 0.00 6.90 0 4140
zd64 0.58 0.00 6.53 0 3920
zd80 0.00 0.00 0.00 0 0
zd96 0.00 0.00 0.00 0 0
zd112 0.10 0.01 0.53 8 320
zd128 0.00 0.00 0.00 0 0
zd144 0.00 0.00 0.00 0 0
zd160 0.00 0.00 0.00 0 0
zd176 0.00 0.00 0.00 0 0
zd192 0.00 0.00 0.00 0 0
zd208 0.00 0.00 0.00 0 0
zd224 0.00 0.00 0.00 0 0
zd240 0.00 0.00 0.00 0 0
zd256 0.00 0.00 0.00 0 0
zd272 0.00 0.00 0.00 0 0
zd288 0.00 0.00 0.00 0 0
zd304 0.00 0.00 0.00 0 0
zd320 0.00 0.00 0.00 0 0
zd336 0.00 0.00 0.00 0 0
zd352 0.00 0.00 0.00 0 0
zd368 0.00 0.09 0.00 56 0
zd384 0.00 0.00 0.00 0 0
zd400 51.87 0.16 717.30 96 430380
zd416 0.58 0.00 6.32 0 3792
zd432 0.58 0.00 6.39 0 3832
zd448 0.67 0.00 8.11 0 4868
zd464 0.60 0.00 6.36 0 3816
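And this is roughly how I read the host vs. NAND writes from the drives; the attribute names below are how they show up on my S3710s and may differ on other models or firmware versions:
Code:
# dump SMART attributes per drive and pick out the host-writes and NAND-writes counters
# (on my S3710s they appear as Host_Writes_32MiB / NAND_Writes_32MiB; other drives may name them differently)
for d in sdc sdd sde sdf sdg; do
    echo "== $d =="
    smartctl -A /dev/$d | grep -i -E 'host_writes|nand_writes'
done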
Host Config:
Pool is a raidz1 of 5 SSDs without a LOG or cache device. Atime is deactivated for the pool, compression is set to LZ4, no deduplication, ashift of 12, sync is standard. On the pool is an encrypted dataset which contains all of the zvols. All partitions are aligned to 1 MB. The SSDs report a logical sector size of 4K.
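For reference, this is roughly how I checked those properties; the pool and dataset names below are just placeholders, not my real ones:
Code:
# pool-wide ashift
zpool get ashift tank
# dataset properties (atime, compression, dedup, encryption, sync)
zfs get atime,compression,dedup,encryption,sync tank/encrypted
# volblocksize of one of the zvols
zfs get volblocksize tank/encrypted/vm-100-disk-0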
VM config:
Storage controller is VirtIO SCSI single. All virtual disks have discard, IO thread and SSD emulation enabled. Cache mode is set to "no cache". Format is raw and the block size (volblocksize) should be the 8K default.
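To be precise, this is roughly what the relevant lines from "qm config <vmid>" look like; the VMID, storage name and size here are just examples:
Code:
# controller and disk options as set in the Proxmox VM config
scsihw: virtio-scsi-single
scsi0: local-zfs:vm-100-disk-0,discard=on,iothread=1,ssd=1,size=32G
# cache is left at the default, i.e. "no cache"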
Guest config:
The virtio guest drivers are installed. Virtual drives are formatted as ext4 and mounted with the "noatime" and "nodiratime" options. fstrim is run once a week via cron. /tmp is mounted via ramfs.
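The relevant guest bits look roughly like this; device names and the cron schedule are just examples:
Code:
# /etc/fstab excerpt
/dev/sda1   /      ext4    defaults,noatime,nodiratime   0 1
ramfs       /tmp   ramfs   defaults                      0 0
# root crontab entry for the weekly trim
0 3 * * 0   /sbin/fstrim -a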
Is there anything I did wrong?
18 MB/s is 568 TB per year, which is really a lot of data, because the VMs are mostly just idling plain Debian installs without heavy use. Only 3 VMs have real applications running (Zabbix, Graylog, Emby). I chose 5x S3710 SSDs because they have a combined TBW of 18,000 TB and should last some time, but saving some writes would be nice nonetheless.
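The yearly figure is just the measured rate extrapolated:
Code:
18 MB/s × 86,400 s/day × 365 days ≈ 567,648,000 MB ≈ 568 TB per year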
The only optimization I found that made a big difference was changing the cache mode from "no cache" to "unsafe", so the VMs can't do sync writes, which cuts the write amplification roughly in half. But that isn't really a good option if something crashes and some DBs get corrupted or something like that.