SSD Cache

ZeusyBoy

New Member
Jul 28, 2024
I've heard a bit on YouTube about SSD caching, and I'm wondering if it's possible in Proxmox. In my case, I have 7 x 600 GB HDDs set up in a RAID 6 array, and would like to set up a 1 TB SSD for cache.
 
It's Linux, so any guide that describes it would work to some extent. It is completely not supported, so you're on your own.
 
Proxmox supports ZFS. ZFS supports a "caching" drive.

Code:
zpool add $ZC_NAME cache nvme-...
zpool add $ZC_NAME log nvme-...
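A quick way to confirm the devices were picked up (using the same $ZC_NAME pool-name placeholder as above; the cache and log vdevs show up in their own sections of the output):

Code:
zpool status $ZC_NAME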
 
It is completely not supported, so you're on your own.

What does that even mean for a non-subscription user, the "not supported"? Of course he cannot get support that he does not pay for. :) Just saying, because to a lot of people something not supported means "it would not work", which is not the case.
 
What does that even mean for a non-subscription user, the "not supported"? Of course he cannot get support that he does not pay for.
He will most certainly not get support for bcachefs even if he pays for it.

:) Just saying, because to a lot of people something not supported means "it would not work", which is not the case.
I wasn't aware of that. I meant 'support' as in 'help'. What word would describe it better? I just used the word like it is used on the English Proxmox VE homepage.
 
I've heard a bit on YouTube about SSD caching, and I'm wondering if it's possible in Proxmox. In my case, I have 7 x 600 GB HDDs set up in a RAID 6 array, and would like to set up a 1 TB SSD for cache.
If your RAID is on a ZFS zpool, you can add a cache disk to your zpool
Code:
zpool add yourpool cache /dev/sdX
Adding/removing a cache vdev is non-destructive.
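Removing it again later is equally harmless, e.g. something like this (pool name and device are the same placeholders as above):

Code:
zpool remove yourpool /dev/sdX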
 
He will most certainly not get support for bcachefs even if he pays for it.

Well, strictly speaking that is also not true (in the "help" sense) as e.g. I would be happy to help here with bcache (not fs).

I wasn't aware of that. I meant 'support' as in 'help'. What word would describe it better?

I am not saying I have a better vocabulary, but I do not like certain (ambiguous) terms. The most neutral term would probably be non-standard - if one wants to emphasise the precariousness of running a setup like that, sure, one can say untested.

When some features are marked as "preview" or "experimental" (lots of words used synonymously, even though to me they mean more "subject to change" than unreliable), there is nothing wrong with using them. If docs explicitly mention something is discouraged, also no issue (at least it implies it is something they'd rather not see you doing, but it's possible to do). If something does not work, to me that's unsupported - e.g. a CPU only supports up to a certain amount of RAM in certain configurations - then fair enough, that's clear cut. There are also things which are e.g. "undocumented" altogether (but work). But if something is a documented and mature solution and one is combining it with another such, I do not see any problem with that.

I just used the word like it is used on the English Proxmox VE homepage.

Then again, maybe it's just me, right? I don't like other words used there either. E.g. the "no-subscription" repo, which does not communicate well that it's actually "testing", while the so-called testing repo should have been called unstable instead. :rolleyes:
 
Well, strictly speaking that is also not true (in the "help" sense) as e.g. I would be happy to help here with bcache (not fs).
Again, I meant the official support. I always look through the enterprise-class glasses.
Sure there are people on the interwebs helping. I ran flashcache years ago, also very successfully with Proxmox VE, without any (software) problems.


Then again, maybe it's just me, right? I don't like other words used there either. E.g. the "no-subscription" repo, which does not communicate well that it's actually "testing", while the so-called testing repo should have been called unstable instead. :rolleyes:
Yes, nomenclature again. I can see your point, yet I would argue that the no-subscription repo is more stable than testing in the Debian sense of the difference. I think they internally have another repository that is actually the "real" unstable.


If your RAID is on a ZFS zpool, you can add a cache disk to your zpool
ZFS L2ARC is not going to be a huge help. That's my experience, and others have reported the same.

The best performance gain with a mix of HDDs and SSDs is to use the SSDs as a special device: put the metadata on there and control, via the dataset property special_small_blocks, which blocks you would also like to be on the SSDs. Then use another device (very fast IOPS) as a SLOG device, e.g. a 16 GB Intel Optane. Use the same redundancy for the special devices as for the data devices; technically this is a RAID0-like setup, and if you lose the special device, everything will be gone.
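A minimal sketch of that layout (pool/dataset names, device paths, and the 64K threshold are placeholders/example values, not a recommendation for your hardware):

Code:
# mirrored special vdev for metadata (and small blocks)
zpool add yourpool special mirror /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B
# blocks up to this size on the dataset also land on the special vdev
zfs set special_small_blocks=64K yourpool/yourdataset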
 
ZFS L2ARC is not going to be a huge help. That's my experience, and others have reported the same.

I am a bit surprised by this one. I can only imagine this would be because you have lots of random writes going on at all times.

The best performance gain with a mix of HDDs and SSDs is to use the SSDs as a special device: put the metadata on there and control, via the dataset property special_small_blocks, which blocks you would also like to be on the SSDs.

But then again, this needs a mirror of the SSDs; adding a sole one (unlike an L2ARC cache) would be madness.

Then use another device (very fast IOPS) as a SLOG device, e.g. a 16 GB Intel Optane.

Optanes are EOL, and the cost was always such that I wondered if I may as well have had the pool be SSD-only instead. Again, I would consider this only in a mirror.

Use the same redundancy for the special devices as for the data devices; technically this is a RAID0-like setup, and if you lose the special device, everything will be gone.

I might be wrong, but cost-wise nowadays it would still eat away the savings on the 6 spinning drives.
 
ZFS L2ARC is not going to be a huge help. That's my experience, and others have reported the same.
If I had the option of putting in just one SSD, I would do it as a cache and not think too much about it. I don't see any particular disadvantages of this solution.
 
If I had the option of putting in just one SSD, I would do it as a cache and not think too much about it. I don't see any particular disadvantages of this solution.

The other thing is, this is often zero additional cost, as you can e.g. have a 256 GB SSD in a machine idling on nothing but a Debian install that requires 10 GB. The ZFS cache device could be a separate partition of that same SSD, and it can literally fail at any time - it's just a read cache.
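If you do that, it's easy to check whether the read cache is actually earning its keep. The pool name is a placeholder, and the arcstats path assumes OpenZFS on Linux:

Code:
# per-vdev view, the cache partition shows its own alloc and read ops
zpool iostat -v yourpool
# L2ARC hit/miss counters
grep ^l2_ /proc/spl/kstat/zfs/arcstats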
 
Optanes are EOL, and the cost was always such that I wondered if I may as well have had the pool be SSD-only instead. Again, I would consider this only in a mirror.
My NVMe 16 GB Intel Optane costs only 30 euros and is perfectly fast:

Code:
min/avg/max/mdev = 59.5 us / 117.0 us / 226.7 us / 45.6 us

The slowest time on the Optane is on par with the fastest times of my enterprise SSD. This is a huge improvement, which can be seen e.g. when applying Debian updates: each transaction therein is usually synced, so you see an improvement and the update is much faster, albeit there is not much written.
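For reference, numbers in that min/avg/max/mdev format can be produced with ioping; a sync-write latency test against a directory on the device would look roughly like this (the mount point is a hypothetical placeholder):

Code:
# 10 synchronous 4k write requests; against a directory, ioping uses a temporary file
ioping -c 10 -s 4k -W -Y /mnt/optane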

I am a bit surprised on this one. I can only imagine this would be because you have lots of random writes going on at all times.
It wasn't worth it. Compared to bcachefs it was barely noticeable.
 
My NVMe 16 GB Intel Optane costs only 30 euros and is perfectly fast:

Code:
min/avg/max/mdev = 59.5 us / 117.0 us / 226.7 us / 45.6 us

Optane was of course very fast, but it's not made anymore ...

EDIT: Wait a minute, how is that 16G helpful for the OP?
 
Optane was of course very fast, but it's not made anymore ...

EDIT: Wait a minute, how is that 16G helpful for the OP?
SLOG will improve the (sync write) performance of a disk pool significantly, and a small 30-euro Optane is a no-brainer.
 
SLOG will improve the (sync write) performance of a disk pool significantly, and a small 30-euro Optane is a no-brainer.
Ok, I will just break it down - and see where we disagree:

1. I cannot find anything in stock now, but one item that was last selling off in that price range at 16G capacity was:
https://ark.intel.com/content/www/u...es-16gb-m-2-80mm-pcie-3-0-20nm-3d-xpoint.html

2. PCIe 3.0 x2, but the max seq r/w is 900 / 145 MB/s; the IOPS look nice, but nothing special nowadays.

3. The OP mentioned "1TB SSD for cache", so I assumed L2ARC, not really SLOG.

4. For L2ARC, 16G is tiny and even cheaper SSDs will do a much better job in terms of bandwidth AND capacity. It will also increase RAM usage, but that's another story.

Now for the last part, even though I do not think the OP was after this:

5. For SLOG, I do not think I would recommend running it other than in a mirror (feel free to tell me I should not care - I understand power loss is not a problem, but a corrupt one is, as rubbish would get flushed undetected over time); see the sketch below. That means it would occupy 2 M.2 slots and cost 60. But the product has a lower sequential write speed than a modern 7K HDD, so it would be just for the IOPS. But are there really that many random writes?
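Sketch of adding such a mirrored log vdev (pool name and device paths are placeholders):

Code:
zpool add yourpool log mirror /dev/disk/by-id/nvme-OPTANE_A /dev/disk/by-id/nvme-OPTANE_B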
 
But are there really that many random writes?
That depends on your workload, yet I can say that working with a hard disk pool feels much, much faster with a SLOG and a special metadata device. L2ARC was not noticeable in our 100 TB pool, and we decommissioned it in favor of our SLOG/special device, which wasn't available at the time we built this array.

PCIe 3.0 x2, but the max seq r/w is 900 / 145 MB/s; the IOPS look nice, but nothing special nowadays.
Of course it's nothing special nowadays compared to enterprise U.2 drives, yet it is for the price point. If you're running a hard disk pool, you may not have the bucks for fast enterprise NVMe. If you have ... go with it and use two of them for SLOG and special device (partitioned). Sizing the SLOG depends on the sequential write performance * 5 seconds (default flush time); more is never used unless you change settings.
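To put rough, made-up numbers on that formula: if the pool can ingest sync writes at about 200 MB/s, only around 200 MB/s * 5 s = 1 GB can ever sit in the SLOG with the default flush interval; even a 10 GbE client (~1.25 GB/s) caps out near 6-7 GB, so a 16 GB device already has headroom.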

But the product has a lower sequential write speed than a modern 7K HDD, so it would be just for the IOPS. But are there really that many random writes?
It's the IOPS you're after ... and in ZFS, sequential is out of the picture in a fragmented disk pool.
 
I know you have much more experience with ZFS than I do, but let me be a bit picky (or wrong?):
Sizing the SLOG depends on the sequential write performance * 5 seconds (default flush time); more is never used
In my understanding there might be up to three TXGs active - and "active" for me means they occupy the storage and/or(?) RAM for the data they are handling in that moment.

Cited from delphix.com/blog/zfs-fundamentals-transaction-groups :

"... There are three active transaction group states: open, quiescing, or syncing. At any given time, there may be an active txg associated with each state; each active txg may either be processing, or blocked waiting to enter the next state. There may be up to three active txgs, ..."

This looks like it could possibly need space for up to 3 * 5 seconds' worth of writes...
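Either way, the 5-second figure itself is just the zfs_txg_timeout module parameter (assuming OpenZFS on Linux), so it is easy to check what a given box is actually using:

Code:
cat /sys/module/zfs/parameters/zfs_txg_timeout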
 
It's the IOPS you're after ... and in ZFS, sequential is out of the picture in a fragmented disk pool.

Hang on - the random writes into the pool are what the SLOG will be useful for, but the SLOG itself is written as a sequential file object ... so I do not think I am wrong looking at seq numbers on any ZIL candidate device.
 
I know you have much more experience with ZFS than I do, but let me be a bit picky (or wrong?):

In my understanding there might be up to three TXGs active - and "active" for me means they occupy the storage and/or(?) RAM for the data they are handling in that moment.

I have no idea how the implementation looks TODAY, but I remember - and I found it again! [1] - that there's way more going on when it comes to how it's all flushed (within the given timeframe).

[1] https://blogs.oracle.com/solaris/post/the-new-zfs-write-throttle
 
