Best practice - zfs (ssd, optane, spinning disk)

lacosta

Hello, I currently have:

2x SSD DC S3700 200GB
4x 1TB 10K seagate
1x optane 900P

What is the best combination?

2x SSD DC for Proxmox OS - mirror
4x 1TB as raidz1 - vm, lxc
1x optane 900P for DB vm, lxc (psql etc.)

or

2x SSD DC for Proxmox OS - mirror
4x 1TB as raidz1 for all lxc, vm
1x optane 900P only for zil, l2arc, slog

or

?

The LXC containers run Zabbix, a mail server, DNS, etc.; the VMs run Docker (mail server + Solr).
Average memory usage is 20 GB RAM.
HW: LSI HBA + 48GB RAM
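
For reference, option 1 could look roughly like this on the ZFS side (pool and device names below are only placeholders; the OS mirror on the two DC S3700s is normally created by the Proxmox installer as ZFS RAID1):

Code:
# VM/LXC pool on the four 1 TB disks as raidz1 (placeholder device names)
zpool create -o ashift=12 tank raidz1 /dev/sdc /dev/sdd /dev/sde /dev/sdf
# separate pool on the Optane 900P for the database guests
zpool create -o ashift=12 fastpool /dev/nvme0n1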
 
Hi,

Generally, with virtualization an L2ARC is not needed, because you mostly have hot data.
The L2ARC also consumes memory from the ARC space for its references, and ~48 GB of ARC should be enough on its own.

What you can do is use the Optane as ZIL (SLOG) and also as fast DB storage.
Normally this is not recommended, but the Optane can handle it, and NVMe is built for parallel workloads.
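
A minimal sketch of that split, assuming the 900P is partitioned (partition sizes, pool and device names are placeholders):

Code:
# give ~20 GB of the Optane to the HDD pool as a SLOG and use the rest as a DB pool
sgdisk -n1:0:+20G -n2:0:0 /dev/nvme0n1
zpool add tank log /dev/nvme0n1p1
zpool create -o ashift=12 fastpool /dev/nvme0n1p2
zfs create fastpool/pgsql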
 
Hey!

From the two options, the first one clearly makes more sense. What I'd personally try is having as many VMs on the PCIe SSD as possible. If I were you, I'd probably sell the Optane drive and go for a bigger, slightly older enterprise PCIe drive in the 1.2 - 2.4 TB range.

As a couple of side notes:
  • Why do you want to have the spinning drives in raidz1? I'd suggest at least raidz2. And since you may have VMs on it, I'd go with raid10, so that speed does not become a bottleneck (see the sketch after this list).
  • Having ZIL / L2ARC will not really help you, since there is not much you can cache when dealing with VMs.
  • Normally Proxmox fits into 20 GB of space, so with the two DC drives in a mirror you can clearly fit a couple of additional VMs on them.
  • You can use the space on the spinning drives for backup as well.
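
As a rough illustration of the raid10 suggestion (pool and device names are placeholders), the four 1 TB disks as two striped mirrors would be:

Code:
# RAID10-style layout: two striped mirrors from the four 1 TB disks
zpool create -o ashift=12 tank mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf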
Hope it helps.
 
  • Why do you want to have the spinning drives in raidz1? I'd suggest at least raidz2. And since you may have VMs on it, I'd go with raid10, so that speed does not become a bottleneck.
So instead of e.g. 6x 2 TB in RAIDz2 (= 8 TB space) you suggest 6x 2 TB in RAID10 (= 6 TB space), because the read/write performance of the pool is better?

  • Having ZIL / L2ARC will not really help you since there is not much you can cache when dealing with VMs.
When write cache is disabled for the VM disks, AFAIK a SLOG device (ZIL) should greatly improve performance.

  • You can use the space on the spinning drives for backup as well.
It is probably better to use another pool as the backup destination, so that data is read from the VM pool and written to the backup pool.
 
So instead of e.g. 6x 2 TB in RAIDz2 (= 8 TB space) you suggest 6x 2 TB in RAID10 (= 6 TB space), because the read/write performance of the pool is better?
Correct. The speed should improve significantly. For RAID6 you can expect roughly x4 read speed and x1 write speed (where x1 is the speed of one drive). For RAID10 you can expect roughly x6 read speed and x3 write speed, meaning +50% read speed and +200% write speed compared to RAID6. What you lose is 2 TB of space and one disk of guaranteed fault tolerance (any 2 disks can fail in RAID6, while RAID10 is only guaranteed to survive 1; a second failure is survivable only if it hits a different mirror).

When write cache is disabled for the VM disks, AFAIK a SLOG device (ZIL) should greatly improve performance.
To be honest, I doubt it.
The ZFS Intent Log is a logging mechanism where all of the data to be written is stored and later flushed as a transactional write. It is typically stored on a fast device such as an SSD. For writes smaller than 64 kB the ZIL stores the data itself on the fast device, but for larger writes only pointers to the synced data are stored.

Because of this, not all applications can take advantage of the ZIL. Database applications (MySQL, PostgreSQL, Oracle) benefit the most from a ZIL, as do NFS and iSCSI. Typical moving of data around the file system will not see much of a benefit.
source
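
For context, how much of this ends up on a SLOG can be steered per dataset; a minimal sketch, assuming the example pool/dataset names from above:

Code:
# per-dataset knobs that decide how sync writes hit the ZIL/SLOG (names are examples)
zfs set sync=always tank/vm-disks       # treat every write as sync, so the SLOG is always used
zfs set logbias=latency fastpool/pgsql  # default: small sync writes land on the SLOG
zfs get sync,logbias tank/vm-disks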

As for the L2ARC cache, it's mostly useful for big datasets. In most other cases the RAM (and ARC cache) will be used. As always, the correct answer is "it depends", but, based on my personal experience, if you really want to speed up VMs, you need to keep them on a dedicated PCIe drive rather than trying to optimise the array of spinning drives.

It is probably better to use another pool as the backup destination, so that data is read from the VM pool and written to the backup pool.
I am sorry if it was poorly stated, but NEVER keep backups on the SAME array, or even the same SERVER, or even in the same LOCATION. What I meant is that you can use a RAID of spinning drives for backups of the faster VMs that reside on SSDs, while also being able to use that RAID to store slower VMs, which will be backed up to a NAS or elsewhere.
 
Hi,

To be honest, I doubt it.
source

My understanding was that all sync writes go to the ZIL (SLOG) first. See: https://www.ixsystems.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/ (especially: "Why sync writes matter, especially on a fileserver"). Also, VMs can be configured in Proxmox to only do sync writes for higher data safety. Therefore a SLOG should help.
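
A hedged example of forcing that for one guest (VM ID, volume name and pool name are placeholders; the qm option syntax may differ slightly between Proxmox versions):

Code:
# force sync writes for one guest disk and verify the pool has a SLOG
qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=directsync
zpool status tank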

As for the L2ARC cache, it's mostly useful for big datasets. In most other cases the RAM (and ARC cache) will be used. As always, the correct answer is "it depends", but, based on my personal experience, if you really want to speed up VMs, you need to keep them on a dedicated PCIe drive rather than trying to optimise the array of spinning drives.

I am sorry if it was poorly stated, but NEVER keep backups on the SAME array, or even the same SERVER, or even in the same LOCATION. What I meant is that you can use a RAID of spinning drives for backups of the faster VMs that reside on SSDs, while also being able to use that RAID to store slower VMs, which will be backed up to a NAS or elsewhere.
What I meant is doing the VM backup to another local pool. This should be quite fast. Afterwards you could move it away to some other server.
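
That could look roughly like this, assuming a dataset on the spinning pool is registered as a directory storage in Proxmox (all names are placeholders):

Code:
# local backup target on the spinning pool, then ship the dump elsewhere later
zfs create tank/backup
pvesm add dir backup-hdd --path /tank/backup --content backup
vzdump 100 --storage backup-hdd --mode snapshot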
 
Correct. The speed should improve significantly. For RAID6 you can expect roughly x4 read speed and x1 write speed (where x1 is the speed of one drive). For RAID10 you can expect roughly x6 read speed and x3 write speed, meaning +50% read speed and +200% write speed compared to RAID6. What you lose is 2 TB of space and one disk of guaranteed fault tolerance (any 2 disks can fail in RAID6, while RAID10 is only guaranteed to survive 1; a second failure is survivable only if it hits a different mirror).

Here is a link comparing some ZFS RAID options: https://calomel.org/zfs_raid_speed_capacity.html
Interesting numbers (without lz4 compression):
1x 4TB, single drive, 3.7 TB, w=108MB/s, rw=50MB/s, r=204MB/s
6x 4TB, 3 striped mirrors, 11.3 TB, w=389MB/s, rw=60MB/s, r=655MB/s
6x 4TB, raidz2 (raid6), 15.0 TB, w=429MB/s, rw=71MB/s, r=488MB/s
 
@Toxik
I'd be really careful with these benchmarks, since there are plenty of things that can come into play: CPU power, RAM size and speed, drive type (SATA, SAS, PCIe), speed and latency. It makes no sense that write speed is better on raidz2 than on raid10, and it makes no sense that the read speed on raid10 with 6 drives is only about x3 better than on a single drive. In any case, I can see this happening if the hardware / software is not properly optimised.

For instance, the setup for the tests is really underpowered. There is one E5-2630 CPU (quite weak), only 16 GB of RAM (so a very small ARC cache, I assume), and 7200rpm SAS drives connected through an HBA controller (the controller can easily become a bottleneck, and I assume the single-drive numbers may have been boosted by the ARC cache, while being less affected once a raid was tested).

What I did on my servers was basically test each and every setup one by one and determine what works best. I suggest you do the same.
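
If it helps, a minimal fio run per candidate layout (paths, sizes and job counts are just examples; ZFS ARC caching will skew buffered results, so compare layouts with identical settings):

Code:
mkdir -p /tank/fiotest
fio --name=randwrite --directory=/tank/fiotest --rw=randwrite --bs=4k \
    --size=4G --numjobs=4 --iodepth=16 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting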
 
