Opinions | ZFS On Proxmox

Hi,



No. The SLOG is for any SYNC write operation on the ZFS pool.



ZFS is a volume manager that can expose its own filesystem (comparable to ext4 or NTFS) as well as block devices/zvols (which behave like a disk, e.g. /dev/sdX). So on the same pool
you can have many datasets (usable like any other FS, for creating files and folders) and also many zvols (each appearing like /dev/sdX).
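
A minimal illustration (the pool name "tank" and the dataset/zvol names are just placeholders):

Code:
# a dataset, mounted like a normal filesystem
zfs create tank/backups
# a 32G zvol, exposed as a block device
zfs create -V 32G tank/vm-100-disk-0
ls -l /dev/zvol/tank/vm-100-disk-0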

Good luck / Bafta!

Thanks for the additional clarification :)

Are you aware of any reference article for calculating special device capacity?
I'm not sure if we want to go down the path of dedupe, just metadata and small blocks.
From a VM perspective, would there even be a use case for small blocks, or am I looking at it the wrong way?

""Cheers
G
 
Hi,

I would go only with a SLOG and maybe an L2ARC cache (if your NVMe is really fast), and then observe how the performance looks. After this period you will see whether you really need special devices. Dedupe can be used with decent performance if you have a lot of RAM (the dedupe DB must fit entirely in your RAM), your data does not change very often... and, most important, you must be 100% sure that your data is deduplicable.
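
For example (device paths are placeholders, adjust to your NVMe partitions), both can be added to an existing pool and removed again later if they don't help:

Code:
# mirrored SLOG for sync writes
zpool add tank log mirror /dev/nvme0n1p1 /dev/nvme1n1p1
# L2ARC read cache (a single device is fine here)
zpool add tank cache /dev/nvme0n1p2
zpool status tank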

Good luck / Bafta
 

Good advice, I'll report back once the new server has landed :)

""Cheers
G
 
First, I agree with everything @guletz said, and in addition:

The charm of the special allocation class is that it has different classes; the main booster is the metadata class, which is covered on the wiki. It will drastically improve your performance with respect to all metadata activities. The amount you need is relative to your data, so having a lot of files will need more metadata. I have no "rule of thumb" for you, but I use 128 GB devices for my 16 TB pool and still have plenty of space left. If you have datasets that need the extra performance, you can also put them in the special allocation class (also described in the aforementioned article).
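
Roughly like this (device names and the 64K threshold are just examples; mirror the special vdev, because losing it means losing the pool):

Code:
# mirrored special vdev for metadata (and optionally small blocks)
zpool add tank special mirror /dev/nvme0n1p3 /dev/nvme1n1p3
# also send blocks <= 64K of this dataset to the special vdev
zfs set special_small_blocks=64K tank/somedataset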

Another class would be dedup, so that the dedup table is stored on the faster special-class devices. It needs the same amount of space as it would in your pool, but it is normally on much faster devices, so the penalty for not having it in RAM is less severe than it would be with the normal pool vdevs.
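
If you do go for dedup, the dedup table gets its own class in the same way (again, device names are only placeholders):

Code:
# dedicated vdev class for the dedup table (DDT)
zpool add tank dedup mirror /dev/nvme0n1p4 /dev/nvme1n1p4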

If you want to test how well your data is deduplicable, you can check it by simulating a deduplication table: zdb -S <poolname>.
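
For example, on a pool named "tank" (name is only a placeholder):

Code:
# simulate the dedup table without enabling dedup (reads all data, can take a while)
zdb -S tank
# the summary at the end prints an estimated ratio, something like "dedup = 1.06, ..."
# if that ratio stays close to 1, real dedup is probably not worth the RAM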
 

Awesome information, thanks @LnxBil, really appreciate you taking the time to go into more detail.

Are you currently using Dedupe?

We are torn between having a few standalone ZFS hosts and deploying a Ceph cluster. ZFS looks like it will be easier for our team to manage than Ceph; obviously both platforms have their pluses and minuses with scale, HA, and simplicity.

Do you know of any specific formula for calculating cache or special-device vdev drive capacity requirements?

Looking at Optane, the most affordable is the 375 GB at approximately $2,000 AUD.

This seems to be the only type that is going to provide better latency than our capacity drives.

Thoughts?

""Cheers
G
 
Do you know of any specific formula for calculating cache or special-device vdev drive capacity requirements?


Hi,

For cache, a simple "how do I do it" is to start with a small amount like 0.5-1 GB of L2ARC per 1 TB of zpool data, run your VMs, and see the result (arc_summary is a good tool for this; look at the cache hit ratio and how much space is used). Then adjust your L2ARC size accordingly (+/-). For each test value, use at least a week, and do not reboot the server during this period.
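
A minimal way to watch this (pool name is just an example):

Code:
# ARC/L2ARC sizes and hit ratios
arc_summary | less
# how full the cache device actually is, refreshed every 5 seconds
zpool iostat -v tank 5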

Also take into account that whatever is in L2ARC will also consume some of your RAM, so keep an open eye on this.


ZFS looks like it will be easier

Do not count on this. ZFS is a "huge animal", and you need some time to understand its special needs ;).


Good luck / Bafta !
 

Thank you @guletz, appreciate you taking the time to reply.

I think I should be clearer with my question re the cache drive.

I'm looking at the 375 GB Optane as a caching drive for SLOG/ LARC, from what i've been reading the SLOG can be approximately 1/4 the capacity of RAM even less.

The question really pertains to what Optane capacity will be suitable; if we go for 375 GB, will this be too small?

It's fine to do testing after the purchase, but by then the money has already been spent, and when the smallest-capacity Optane is $2k AUD that's not a small investment.

Hope that makes more sense.

""Cheers
G
 
from what I've been reading, the SLOG can be approximately 1/4 the capacity of RAM, or even less

The size of the SLOG is at most what your pool can write in 5 seconds (the default interval at which dirty data is flushed from the ARC to the disks).
Note that not all write operations are done in sync mode (only sync write ops will go to the SLOG).
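
As a back-of-the-envelope example (the 10 GbE figure is only an assumed ingest rate, not something from this thread):

Code:
# rough SLOG sizing: max sync write rate x zfs_txg_timeout (5 s by default)
# 10 GbE is roughly 1.25 GB/s, so:
echo "$((1250 * 5)) MB"   # ~6 GB, plus some headroom for in-flight transaction groups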


Good luck /Bafta
 
It's fine to do testing after the purchase, but by then the money has already been spent, and when the smallest-capacity Optane is $2k AUD that's not a small investment.


Hi again,

Why not test with a "consumer" SSD first, so you can find the size that fits your load? Then you will know how much space you need and can buy the desired size of Optane/whatever.
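
That works well because log and cache vdevs can be added and removed again without touching the data vdevs; roughly like this (device name is a placeholder):

Code:
zpool add tank log /dev/sdX1     # temporary SLOG on the test SSD
zpool iostat -v tank 5           # watch how much of it actually gets used
zpool remove tank /dev/sdX1      # take it out again once you know what you need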

Good luck /Bafta
 
  • Like
Reactions: velocity08
Are you currently using Dedupe?

No, not worth the overhead. I tried it a few years back, but the benefits did not match the RAM requirements.

Do you know of any specific formula for calculating cache or special-device vdev drive capacity requirements?

Looking at Optane, the most affordable is the 375 GB at approximately $2,000 AUD.

Yes, Optane is great for a SLOG device, but you normally only buffer 5 seconds of writes (or whatever your zfs_txg_timeout is set to), so only a very, very small part of your 375 GB Optane will be used for caching. Only sync writes are cached, so you really don't need that much. Even if you were writing at a gigabyte per second, you would only need 5 GB of SLOG.
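
One way to use the rest of such a device (partition sizes below are only a sketch, and keep in mind that a single, unmirrored SLOG is a small risk window for in-flight sync writes):

Code:
# small SLOG partition plus the remainder as L2ARC on one Optane
parted -s /dev/nvme0n1 mklabel gpt \
  mkpart slog 1MiB 16GiB \
  mkpart cache 16GiB 100%
zpool add tank log /dev/nvme0n1p1
zpool add tank cache /dev/nvme0n1p2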
 
Do you mind sharing how you do that?
Hi, sorry for the delay in replying.

I googled how to use folder2ram and zram from the Armbian setups, plus looked into the Debian zram scripts.

I downloaded zram-config_0.5_all.deb and installed it, then tweaked it to run more zram devices as compressed swap devices, plus /tmp and /var/log as compressed zram devices. Then I used scripts from Armbian - https://github.com/armbian/build/blob/master/packages/bsp/common/usr/lib/armbian/armbian-ramlog and https://github.com/armbian/build/bl.../common/usr/lib/armbian/armbian-truncate-logs - to set up the ramlog and keep the logs in check if something runs away spamming them.

As long as you're careful and identify the correct HDD backing folder for the ramlog, this works well, copying the logs in from the HDD on startup and writing them out on shutdown, plus writing out copies at a configurable interval.
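
If you only want the compressed zram swap part and not that specific package, a generic hand-rolled sketch looks roughly like this (size and algorithm are arbitrary examples, not the exact config described above):

Code:
modprobe zram
DEV=$(zramctl --find --size 1G --algorithm lz4)   # e.g. /dev/zram0
mkswap "$DEV"
swapon -p 100 "$DEV"   # prefer zram over disk swap
swapon --show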
 
