Question about SLOGs and VM pools in ZFS.

terry5008

Sep 6, 2023
If I have a spinning rust ZFS pool of RAID10 Ironwolves that I'm running VMs off of, will an SLOG help performance any? I know the L2ARC does, but I'm not sure about the writes.
 
Usually SLOG and L2ARC won't help much unless you have a special use case where you know you really need them. L2ARC makes sense, for example, when you are running a 100GB DB but your mainboard is capped at 64GB RAM. The rule of thumb is "Buy more RAM. When RAM is maxed out, consider L2ARC."
And a SLOG will only help with sync writes and will do nothing for async writes. Usually nearly all of your writes should be async writes unless you primarily run some DBs.

What helps best are special devices to store the metadata, so all those small random reads+writes won't hit the slow HDDs.

But no matter what you do, your HDDs will never come close to SSD performance.
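If you want to sanity-check whether any of this would even matter for your workload, a couple of read-only commands are enough. The pool and dataset names below (tank, tank/vms) are just placeholders for your own:

    arc_summary | head -n 40          # ARC size and hit ratio; a high hit ratio means L2ARC won't buy you much
    zfs get sync,recordsize tank/vms  # sync=standard means only explicit sync writes go through the ZIL
    zpool iostat -v tank 5            # per-vdev load every 5s; watch how busy the HDDs are with small I/O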
 
What helps best are special devices to store the metadata, so all those small random reads+writes won't hit the slow HDDs.
I thought creating a SLOG and L2ARC on SSD was the special device for metadata. What are you referring to when you say "special device"?
 
I thought creating a SLOG and L2ARC on SSD was the special device for metadata.
No. A SLOG is a sync write log. L2ARC is an additional layer of much slower but bigger read cache, and it also consumes some of your RAM for its headers, so your fast ARC will shrink. You basically trade a little super-fast RAM read cache for more, but slower, SSD read cache. Depending on the workload this could even make your pool slower.
What are you referring to when you say "special device"?
https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954
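To make it a bit more concrete, adding a special vdev looks roughly like this. The pool name tank and the disk paths are placeholders; use your own /dev/disk/by-id paths, and mirror the special vdev, because losing it means losing the pool:

    zpool add tank special mirror /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B
    zpool status tank   # the new devices show up under their own 'special' section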
 
Most of that explanation might as well have been written in ancient Greek.

So, the special class performance optimizing vdevs that can be added to a pool are as follows:
L2ARC = Basically a read cache on SSD (not terribly helpful because RAM is better)
SLOG = Relocates the ZIL to SSD (only helps sync writes, e.g. DB traffic)
Special = Metadata (and maybe small files) on SSD.

And it is your contention that the only one of these that will make VMs on spinning rust perform better is "special".
Correct?
 
Most of that explanation might as well have been written in ancient Greek.

So, the special class performance optimizing vdevs that can be added to a pool are as follows:
L2ARC = Basically a read cache on SSD (not terribly helpful because RAM is better)
SLOG = Relocates the ZIL to SSD (only helps sync writes, e.g. DB traffic)
Special = Metadata (and maybe small files) on SSD.

And it is your contention that the only one of these that will make VMs on spinning rust perform better is "special".
Correct?
Very generalized, yes. HDDs suck at all random reads and sync writes. Many of those come from metadata, so it's a good thing not to have them on the spinning rust.
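If you go the special vdev route, there is also a per-dataset property that pushes small data blocks (not just metadata) onto the SSDs. The threshold and dataset name below are only examples:

    zfs set special_small_blocks=16K tank/vms   # blocks of 16K or smaller land on the special vdev instead of the HDDs
    zfs get special_small_blocks tank/vms

Just make sure the special vdev is big enough, because once it fills up, new allocations spill back onto the HDDs.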
 
The only problem with the example on the level1techs forum is that it provisions the entire SSD.

The example I found (Victor Bart on YouTube) for over-provisioning SSDs for SLOG and L2ARC had me first create the partitions I wanted with the cfdisk utility and then run a zpool add with the device IDs of those partitions (partition type was "Solaris /usr & Apple ZFS"). Will it be the same for a "special" vdev?
 
Yes, using partitions as special vdevs will work. Just keep in mind that special vdevs are not a cache. Lose them and all data on those HDDs will be lost. So you usually want the same redundancy/reliability as your normal vdevs (your HDDs), and I wouldn't share an SSD between SLOG and special vdevs, as the SLOG really shreds SSDs.
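If you do use partitions, something like this keeps the special vdev mirrored across two SSDs. The disk names and partition numbers are placeholders; note the -partN suffix on the by-id paths:

    zpool add tank special mirror /dev/disk/by-id/ata-SSD_A-part1 /dev/disk/by-id/ata-SSD_B-part1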
 
I can recommend using a small (16 GB) Intel Optane for the SLOG, which has excellent IOPS and WILL significantly increase all sync writes in your environment. Where it shines most is running any system update with e.g. apt, which uses a lot of sync writes and sync calls; you will "see and feel" the speed, especially with hard disks. Outside of cases like that, you will not notice it that much. Monitor your SLOG usage (e.g. with dstat) and you will see how much, or more precisely how little, you write to your SLOG in general. For general hard disk speed, it can be fast. I use 22 disks over 3 SAS HBA controllers (6 channels) in a striped mirror setup with two enterprise SSDs for metadata (special device) and one Optane as SLOG (I don't care about my last 5s of data in case of a loss). Bulk load maxes out at 2.5 GB/s, which is pretty fast. Random I/O is of course slower, but also good enough for most things.

I also tried L2ARC, and in my book it is not faster than a special device, not even close.
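If you want to try it, adding and then watching the SLOG is one command each. The device path and pool name here are placeholders:

    zpool add tank log /dev/disk/by-id/nvme-INTEL_OPTANE_16GB
    zpool iostat -v tank 5    # the 'logs' section shows how much actually gets written to the SLOG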
 
I can recommend using a small (16 GB) Intel Optane for the SLOG, which has excellent IOPS and WILL significantly increase all sync writes in your environment. Where it shines most is running any system update with e.g. apt, which uses a lot of sync writes and sync calls; you will "see and feel" the speed, especially with hard disks. Outside of cases like that, you will not notice it that much. Monitor your SLOG usage (e.g. with dstat) and you will see how much, or more precisely how little, you write to your SLOG in general. For general hard disk speed, it can be fast. I use 22 disks over 3 SAS HBA controllers (6 channels) in a striped mirror setup with two enterprise SSDs for metadata (special device) and one Optane as SLOG (I don't care about my last 5s of data in case of a loss). Bulk load maxes out at 2.5 GB/s, which is pretty fast. Random I/O is of course slower, but also good enough for most things.

I also tried L2ARC, and in my book it is not faster than a special device, not even close.
When experimenting with different devices for use with SLOG and metadata, do these devices need to be taken offline before being removed from the pool? Or does the "zpool remove" command do this automatically?
 
Yes, useing partitions as special vdevs will work. just keep in mind that special vdevs are not cache. Lose them and all data on those HDDs will be lost. So you usually want the same redundancy/reliability as your normal vdevs (your HDDs) and I wouldn't share a SSD with SLOG and special vdevs as the SLOG really shreds SSDs.
Is SSD shredding an issue if you only have about 3% of the SSD partitioned (massive overprovisioning)?
 
When experimenting with different devices for use with SLOG and metadata, do these devices need to be taken offline before being removed from the pool? Or does the "zpool remove" command do this automatically?
A remove should be sufficient (it is for the other vdev types), though I haven't done it myself yet. If you're not sure and want to be 100% safe, stop all applications reading from or writing to the pool first.
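For reference, the commands would be something like this (pool and device names are placeholders):

    zpool remove tank /dev/disk/by-id/nvme-INTEL_OPTANE_16GB   # log and cache devices can be removed at any time
    zpool status tank                                          # confirm it is gone

As far as I know, removing a special vdev also works with zpool remove, but only if the pool has no raidz vdevs (a pool of mirrors like yours is fine), and it has to evacuate the data that was on it first, so give it time to finish.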
 
Is SSD shredding an issue if you only have about 3% of the SSD partitioned (massive overprovisioning)?

It's not about how much space of the SSD you allocate, it's about the amount of data you write to it. The bigger the SSD and the less space you actually use, the more NAND cells can fail before the SSD becomes unusable. What defines how many writes each cell can handle before failing is the type of NAND. From worst to best it is QLC, TLC, MLC, eMLC, SLC, where roughly each step up triples the durability. So a QLC cell can handle about 1,000 P/E cycles and an SLC cell about 100,000 P/E cycles, which means a QLC SSD would need to be about 100x bigger than an SLC SSD to compensate for the worse durability (and only if you don't fill that QLC SSD to more than 1% of its capacity).

The other big thing is power-loss protection, which only those enterprise SSDs have. Without it there will be horrible write amplification when doing sync writes (and sync writes are the only thing a SLOG handles), so don't be surprised if the SSD fails another 10 to 100 times faster without it.
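As a rough, illustrative calculation with round numbers (the write amplification factor is something you'd have to measure on your own hardware):

    1 TB QLC SSD x ~1,000 P/E cycles            -> roughly 1,000 TB of total NAND writes over its life
    50 GB/day of sync writes x 3x amplification -> 150 GB/day hitting the NAND
    1,000 TB / 0.15 TB per day                  -> about 18 years

The wear is levelled across the whole 1 TB of NAND regardless of how small your partition is, which is why over-provisioning helps at all, but double the write amplification or the daily writes and you halve that lifetime. That's why power-loss protection and keeping an eye on the SMART wear indicators matter more than the partition size.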
 
