Newbie questions about ZIL/SLOG and L2ARC

Antaris

Member
May 14, 2019
Hi, I am new to Proxmox VE. I've put 4 x 1TB nearline SAS drives in a Dell Precision T5500 on a PERC H200 and installed PVE on them in RAID10. They feel a little slow and I want to accelerate them with a PCIe SSD accelerator, a Sun F20. Proxmox sees 4 x 24GB SSDs and I want to use 2 of them in RAID1 for ZIL/SLOG, 1 for L2ARC and the last one as a hot spare. Also, how do I determine the sizes of the ZIL/SLOG and L2ARC? Thanks in advance for the help.
 
Hey!

I suggest exploring these 3 articles:
As a personal note, please keep in mind that ZFS is a software RAID replacement, meaning it wants to work with bare disks. The fact that you have 24GB SSDs means that, if I'm not mistaken, you can use at most one for ZIL and another one for L2ARC. The other 2 SSDs may be useless.

Also keep in mind that the size of the SSDs may be a limiting factor when caching huge data sets. E.g. when copying a 50GB file such as a VM image, the ZIL can hold up to 24GB, meaning the other half will still go slower. Another aspect is the RAM size, which holds the ARC cache. So if you have plenty of RAM and still need more cache, 24GB may not be enough. To sum up, I'd experiment with the SSDs for now, but I'd plan on getting a larger one in the future.

Update: also keep IOPS in mind. The whole purpose of caching is quick access and a short request queue. If the IOPS are poor, the system will not be as snappy. If I were you, I'd explore PCIe SSDs.
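If you want to measure that before committing a device, a quick 4K sync-write fio run (roughly the pattern a SLOG sees) could look like the line below; /dev/sdX is a placeholder and the test overwrites the device, so only point it at a disk whose contents you can destroy:

fio --name=slog-test --filename=/dev/sdX --rw=randwrite --bs=4k --iodepth=1 --direct=1 --sync=1 --runtime=30 --time_based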
 
The Sun Flash Accelerator F20 is a PCIe card that achieves 100K 4K IOPS and 1.1GB/sec:
Google it, because I still can't post links.
It has 4 Marvell SLC SSDs on it. They are 32GB, overprovisioned to 24GB. I just need the commands to make a RAID1 (mirror) from 2 of them and pass it to the main pool as a SLOG device, and a way to make another one a hot spare in case of failure.
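For reference, a sketch of what that could look like (the /dev/sdX names are placeholders; /dev/disk/by-id paths are safer in practice):

zpool add rpool log mirror /dev/sdf /dev/sdg    # mirrored SLOG from two of the modules
zpool add rpool spare /dev/sdh                  # a third module as a pool-wide hot spare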
As for the size of the ZIL/SLOG: for a system with a 2TB RAID10 it is far lower than 24GB, something like 2-4GB...
Another idea is to make another RAID10 from all of them and use it as SLOG for double the performance.
In this post: https://forum.proxmox.com/threads/zfs-raid-10-vs-sw-raid-10.46777/#post-221946 @RobFantini shows a tank with mirrored logs...
 
Hi,
First of all, you need to know that if you use an L2ARC, you will consume more RAM on the host (it holds metadata for all data cached on the L2ARC). So: bigger L2ARC -> more RAM used. A better idea is to run some tests with different L2ARC sizes, and use the arc_summary command line tool to see how effective your cache is. Sometimes you can get very good results if you configure the L2ARC to hold only a VM's metadata (for example, a VM used for backups, where rsync walks a very large number of files). In other cases you will get bad performance, e.g. when you work with large files that are mostly written but rarely read, because they will fill up your L2ARC and ARC yet almost never be read (so the capacity left for write operations is greatly reduced -> bad write performance on your pool).
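If you want to try the metadata-only idea for a particular VM disk, that is a per-dataset property; the dataset name below is just an example of Proxmox's usual zvol naming:

zfs set secondarycache=metadata rpool/data/vm-100-disk-0    # L2ARC keeps only this zvol's metadata
zfs get secondarycache rpool/data/vm-100-disk-0             # verify the setting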

Regarding ZIL/SLOG, the first step is to run arc_summary and look at:

Most Recently Used Ghost: 0.01% 2.33k
Most Frequently Used Ghost: 0.05% 20.80k

In this example, a dedicated SLOG device will not be a big improvement. Mostly, a SLOG is good if your workload uses sync writes, like any DB case, or maybe an NFS server.
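On a Proxmox host you can pull those lines out with something like:

arc_summary | grep -i ghost    # show the MRU/MFU ghost hit ratios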

Good luck!
 
The ZIL is used for sync writes before the data is flushed to the pool. A data flush happens every ~5 seconds. If your system doesn't do a lot of sync writes, you don't need a huge ZIL.

How big must the ZIL be?
Calculate it like this: ZIL_MAX_WRITE_SPEED * FLUSH_TIME.

What happens when the ZFS pool works at max load?
ZFS simply stops accepting new data for writes until the current data in the cache has been flushed.

What does that mean?
ZFS works differently from traditional file systems. It does a lot of other things, and that takes time. If you track the write speed over time (not an average) you can see it come in waves.
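The ~5 second flush interval mentioned above is the transaction group timeout; on ZFS on Linux you can check it via the zfs_txg_timeout module parameter:

cat /sys/module/zfs/parameters/zfs_txg_timeout    # defaults to 5 (seconds)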
 
So if the Sun F20 achieves 100K 4K IOPS and 1.1GB/sec when writing to all 4 modules, the logs (ZIL) do not need to be larger than 1.1GB/s * 5s = 5.5GB. If the modules are used in RAID10, the volume does not need to exceed 2.75GB.
Is that correct?

PS: I found a command to add all 4 modules as a RAID0 log, but I don't know how to limit the size...
In my case the command is: zpool add -f rpool log /dev/sde /dev/sdf /dev/sdg /dev/sdh

BTW, arc_summary is very useful. Is there such a tool for the ZIL/logs?
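One way to at least watch the log devices is the standard per-vdev iostat output, e.g.:

zpool iostat -v rpool 1    # per-vdev I/O every second, including the log vdev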
 
You don't need to limit the ZIL in size. I just gave you a calculation of the recommended ZIL size. It can be larger.

How does the ZIL work?
When ZFS gets a sync write request, it puts the data into the ZIL and into the write cache (RAM). After the data is flushed from the cache to the pool, ZFS marks in the ZIL that the data was written successfully. When the next sync data comes, ZFS overwrites the old data in the ZIL with the new. What does that mean? ZFS does not keep writing onward through the ZIL, it starts again from the beginning. That wears the SSD cells.

If you want to limit the ZIL, you can give it a partition instead of the whole disk. A mirrored (RAID1) ZIL can prevent data loss if a ZIL device crashes.
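A sketch of the partition approach, assuming sgdisk is installed and /dev/sdf and /dev/sdg are the two modules you want to use (adjust to your device names):

sgdisk -n 1:0:+4G /dev/sdf    # create a small 4GB partition for the SLOG
sgdisk -n 1:0:+4G /dev/sdg
zpool add rpool log mirror /dev/sdf1 /dev/sdg1    # mirrored SLOG on the partitions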
 
Thanks for the warning about SSD wear. These modules (Sun F20) are dirt cheap on eBay. The modules are SLC cells (high endurance) and are overprovisioned by 8GB (33%): they are marked on the board as 32GB but only 24GB is visible. This is a homelab for experiments, so I intend to break it in all possible ways and rebuild it to learn. :) Thanks again for your time and good advice.
 
As @Nemesiz pointed out, you don't need much log. The formula is correct, but in a practical sense it's the BOTTLENECK speed x flush time. If your data is generated entirely on the same host in memory, then the bottleneck speed is the ZIL write speed, but in practice it's usually limited by incoming network speed. On a 1Gbit network connection that means ~100MB/s * 5 seconds, or 500MB. 2-5GB is usually overkill and would be sufficient even for a 10Gbit network.

L2ARC is almost always pointless unless you have a LOT of RAM serving a LOT of IO (and even then you'd probably be better off with just the ARC in RAM).

In your shoes I would use the F20 as a RAID10 for VM boot devices and use the spinners for slow storage/NFS. 48GB sounds like very little, but most Linux servers can comfortably live on 4-8GB of primary mounts.
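A sketch of that layout (the pool name f20pool and the /dev/sdX names are placeholders; by-id paths are preferable):

zpool create -o ashift=12 f20pool mirror /dev/sde /dev/sdf mirror /dev/sdg /dev/sdh    # two striped mirrors = RAID10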
 
I decided to accept the risk in this home lab, and for now I use all 4 DOM SSDs in RAID0. Some CrystalDiskMark results from a W7 VM on a 300GB thin drive:
-----------------------------------------------------------------------
CrystalDiskMark 5.2.0 x64 (C) 2007-2016 hiyohiyo
Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

Sequential Read (Q= 32,T= 1) : 5825.215 MB/s
Sequential Write (Q= 32,T= 1) : 4266.846 MB/s
Random Read 4KiB (Q= 32,T= 1) : 387.327 MB/s [ 94562.3 IOPS]
Random Write 4KiB (Q= 32,T= 1) : 187.457 MB/s [ 45765.9 IOPS]
Sequential Read (T= 1) : 1782.281 MB/s
Sequential Write (T= 1) : 1915.560 MB/s
Random Read 4KiB (Q= 1,T= 1) : 23.860 MB/s [ 5825.2 IOPS]
Random Write 4KiB (Q= 1,T= 1) : 29.745 MB/s [ 7262.0 IOPS]

Test : 1024 MiB [C: 12.8% (38.3/299.5 GiB)] (x5) [Interval=5 sec]
Date : 2019/06/04 23:03:02
OS : Windows 7 Professional SP1 [6.1 Build 7601] (x64)
 
