Overprovision ZFS SLOG?

justjosh
Nov 4, 2019
Hi,

What is the official way to overprovision a SLOG for ZFS in Proxmox? Can I just create a small partition with gpart and do zpool add <pool> log <partition>?

Thanks!
 
Yes, you can do it that way. I did it myself exactly as you described.
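For reference, a minimal sketch of that approach from a Proxmox/Debian shell; the device path /dev/nvme0n1, the 16G partition size and the pool name tank are only placeholders for illustration:

# create one small partition at the start of the disk, leave the rest unallocated
sgdisk --new=1:0:+16G /dev/nvme0n1
# add that partition to the pool as a log (SLOG) device
zpool add tank log /dev/nvme0n1p1
# check that the log vdev shows up
zpool status tank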
 
I don't get your question.
NVMe is no different from any other block device.
You have a physical device, and you either throw it completely into the pool or you partition it and assign a partition to the pool.
 
Are you certain your NVMe is designed for the heavy rewrites happening on a SLOG device?
If you are using consumer-grade hardware, the lifetime of your SSD will be around 1-4 weeks, depending on the workload.

You would probably also want to mirror your SLOG device. It is no fun to lose the pool due to a single SSD failure; such cases have been seen, although this has been fixed since pool version 19.

Also, is your SSD battery backed? You will need that in case of a power outage.
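For example, a mirrored log can be added in one step (pool name and partitions below are placeholders):

# attach two partitions on different SSDs as a mirrored log vdev
zpool add tank log mirror /dev/nvme0n1p1 /dev/nvme1n1p1
zpool status tank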
 
Are you certain your NVMe is designed for the heavy rewrites happening on a SLOG device?
If you are using consumer-grade hardware, the lifetime of your SSD will be around 1-4 weeks, depending on the workload.

You would probably also want to mirror your SLOG device. It is no fun to lose the pool due to a single SSD failure; such cases have been seen, although this has been fixed since pool version 19.

Also, is your SSD battery backed? You will need that in case of a power outage.
I'm using a 1 TB PM953, which is rated for 1.3 DWPD. It has PLP. I intend to overprovision, using only 50 GB of the drive. So that should increase lifespan by around 20x?
 
I don't get your question.
NVMe is no different from any other block device.
You have a physical device, and you either throw it completely into the pool or you partition it and assign a partition to the pool.
NVMe drives have the main nvme0 controller device, which is sometimes divided into multiple namespaces like nvme0n1, nvme0n2, etc. Am I supposed to use gpart to create a small partition on a single namespace nvme0n1 that fills the whole drive, or am I supposed to create a small namespace nvme0n1 and leave the rest in nvme0n2?
 
I intend to overprovision, using only 50 GB of the drive.
The term you are looking for is "short stroking". This means you make the usable space much smaller than the whole disk. This can be done either with a partition or via the drive's own configuration.
I think 1.3 DWPD on a 1 TB drive should already be sufficient, depending on your workload and what the setup is planned to do.

Imho your math is wrong, though, because it doesn't matter where the writes go. If you have 50 GB and write 1.3 TB a day, your drive will wear out the same as if it were 1 TB in size and you wrote the same amount.

The only thing that changes is that your 1.3 DWPD on 1 TB becomes something like 26 DWPD on 50 GB.
But that really doesn't change the total amount you can write to the device. The only reasons why I would do short-stroking in your case: a SLOG just doesn't need much capacity, and this approach also helps the device keep fresh/clean cells available, which eases its internal processes. 50 GB is even too much from what I know, but you could use the rest of the capacity as a cache device for the pool, which might speed things up.
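Spelled out, the arithmetic behind that: 1 TB x 1.3 DWPD is about 1.3 TB of write budget per day either way; confining the writes to a 50 GB slice just re-expresses the same budget as 1.3 TB / 50 GB ≈ 26 drive-writes of that slice per day, while the total bytes the flash can absorb over its lifetime (the TBW rating) stays the same.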

About the namespaces: I wasn't aware of that, thank you. It seems to be something similar to a partition (but on the physical device), so it looks like an equivalent approach to my partition configuration (on a SATA SSD, I have to admit).

Best regards
 
I'm using a 1 TB PM953, which is rated for 1.3 DWPD. It has PLP. I intend to overprovision, using only 50 GB of the drive. So that should increase lifespan by around 20x?

A SLOG device usually does not need to be much bigger than 10-20 GB, as it only needs to hold a few seconds of data before they are written out to the underlying storage devices (disks or SSDs).
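As a rough sizing sanity check (the numbers here are just assumptions): with a 10 GbE link (~1.25 GB/s) and the default transaction group interval of about 5 seconds, a couple of outstanding transaction groups come to roughly 2 x 5 s x 1.25 GB/s ≈ 12.5 GB, which is where the usual 10-20 GB rule of thumb comes from.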

The problem is that the ZIL afaik does not support TRIM.
NVMe drives have the main nvme0 controller device, which is sometimes divided into multiple namespaces like nvme0n1, nvme0n2, etc. Am I supposed to use gpart to create a small partition on a single namespace nvme0n1 that fills the whole drive, or am I supposed to create a small namespace nvme0n1 and leave the rest in nvme0n2?


apt-get install nvme-cli
nvme<TAB>

You will find an nvme delete-ns command, which you can use to delete the namespaces you don't need.

As has already been stated, the ZIL does not need much space, as it only holds a few seconds of writes. The rest of your question has been answered, I believe; otherwise hit me up again, I just woke up and haven't had breakfast yet ;)
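If you go the namespace route instead, it could look roughly like the sketch below. The sizes and IDs are only illustrative, --nsze/--ncap assume 512-byte LBAs for a ~50 GB namespace, and not every controller (possibly including the PM953) supports namespace management at all, so check nvme id-ctrl first:

# how many namespaces does the controller support?
nvme id-ctrl /dev/nvme0 | grep -i '^nn'
# delete the existing full-size namespace (this destroys its data!)
nvme delete-ns /dev/nvme0 --namespace-id=1
# create a ~50 GB namespace (97656250 x 512-byte blocks) and attach it
nvme create-ns /dev/nvme0 --nsze=97656250 --ncap=97656250 --flbas=0
# controller ID 0 is an example; take the real cntlid from nvme id-ctrl
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0
nvme ns-rescan /dev/nvme0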
 
A SLOG device usually does not need to be much bigger than 10-20 GB, as it only needs to hold a few seconds of data before they are written out to the underlying storage devices (disks or SSDs).

The problem is that the ZIL afaik does not support TRIM.



apt-get install nvme-cli
nvme<TAB>

You will find an nvme delete-ns command, which you can use to delete the namespaces you don't need.

As has already been stated, the ZIL does not need much space, as it only holds a few seconds of writes. The rest of your question has been answered, I believe; otherwise hit me up again, I just woke up and haven't had breakfast yet ;)
Appreciate the reply, but that's actually not my question. My question was: should the partition be a full partition of a separate namespace, or a slice of the full namespace that covers the whole drive?
 
One would need to know how the underlying hardware operates.
My expectation is that it shouldn't matter and that both will behave the same.
 
Would they work out of the box?
Has anyone tried those in real life?
I'm quite tempted to get one...
 
Some fun stats I found on Reddit from a user who has been using an RMS-200 for quite a lot longer than I have :p 115 PB written...
When he installed the card it had 49 PB written. One year later... 115!!


=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 55 Celsius
Available Spare: 0%
Available Spare Threshold: 0%
Percentage Used: 0%
Data Units Read: 95,514,382,172 [48.9 PB]
Data Units Written: 225,318,828,690 [115 PB]
Host Read Commands: 1,148,541,024
Host Write Commands: 2,657,674,621
Controller Busy Time: 462,032
Power Cycles: 30
Power On Hours: 13,840
Unsafe Shutdowns: 24
Media and Data Integrity Errors: 0
Error Information Log Entries: 1

Error Information (NVMe Log 0x01, max 63 entries)
No Errors Logged
 
