VM Virtual HDD Cache Settings for SSD-Backed ZFS?

Sep 1, 2022
Hello,

I'm running Proxmox nodes with storage pools for VMs that are SSD backed, with Discard enabled in Proxmox and TRIM enabled in the VMs where that's an option. There are no spinning rust drives involved anywhere in my VM storage.
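For reference, here's roughly how I've been doing that; the VM ID 100 and disk names below are just placeholders from my test setup, so treat this as a sketch rather than exact commands:

  # On the PVE host: enable Discard (and SSD emulation) on a virtual disk
  qm set 100 --scsi0 local-zfs:vm-100-disk-0,discard=on,ssd=1

  # Inside a Linux guest: periodic TRIM via the systemd timer
  systemctl enable --now fstrim.timer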

I'm still at the stage of creating test VMs from tutorials, so I want to make sure I understand the virtual disk cache settings before I get too deep into this.

  1. By default, there is no cache.
  2. I've seen it suggested that we should enable caching for metadata, but I don't see an option for that.
  3. Options I do have are: direct sync, write through, write back, write back (unsafe), and no cache (the default I'm using now).
I'm fairly certain I don't want to use any of the available options, from what I've read, and should stick with no cache. Correct? My primary concern is VM storage for OS and non-persistent data.

I don't have a "metadata" option. Am I supposed to? What does it do? Do I need to figure out how to make it appear?

I've found 15 posts on this across the internet that give 20 different opinions, so I wanted to come here to hopefully get some clarity. I'd really appreciate any help. Thanks!
 
This is the third post of yours I'm answering, and let's just say: you're asking the right questions and I like it.

As you said, it's a ZFS pool, so these answers are only valid for that!

1. By default, there is no cache.
Yes, you don't want to cache things multiple times. You already have the block cache inside your guest OS, which is the only one your guest OS knows about (read and write, if applicable). Then you have the ZFS ARC, which caches the blocks in addition to that (read cache only; see the answer to your question 2), and the VM cache is yet another layer that could be enabled on top of both, e.g. to also cache writes, and in some modes in an unsafe fashion, such as caching sync writes. A VM cache also needs additional memory on your PVE host that is counted (but not shown) on top of your VM memory, which makes planning of VM memory usage very unpredictable.
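If you want to see how much host memory the ARC is actually using right now, just as an illustration, these read the standard ZFS kstats on the PVE host:

  # Current ARC size and its configured maximum, in bytes
  awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats

  # Or the human-readable summary
  arc_summary | head -n 40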

The I/O response times can also vary dramatically with multi-tier caching setups, so your VM gets one block in 1 µs and another in 1 ms depending on the cache situation in the ARC or the VM cache.

2. I've seen it suggested that we should enable caching for metadata, but I don't see an option for that.
Without having seen the actual claim, I can only speculate that this refers to metadata caching inside of ZFS. You can set primarycache for each ZFS dataset/zvol separately, or just set it on the parent dataset and let the children pick it up via inheritance. Setting it to metadata only eliminates the additional ARC data-caching step I described in answer 1, reducing the 2-tier caching to 1-tier caching: the one inside of your guest OS. This minimizes the caching and therefore the memory waste, but it will slow down your system, especially for systems you work with often that cannot cache their own data. It's a trade-off between caching data inside of your VM and on your PVE host. I'd always go with more RAM inside the VM so it can cache its own data, because the guest knows what it needs, and "intelligent caching" is always better than caching everything from the outside.
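If the suggestion you saw was indeed about this, a sketch of what it would look like on the PVE host (the dataset name rpool/data is only an example, and this is not a recommendation to actually do it):

  # Cache only ZFS metadata (not data blocks) in the ARC for this dataset
  # and for everything that inherits the setting from it
  zfs set primarycache=metadata rpool/data

  # Verify the current / inherited value
  zfs get primarycache rpool/data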

Just to give a counter example: if you restart your guest OS in a loop, you will benefit from caching on the outside, because the cache inside the guest resets on each reboot. That's the best example I know of where you really see the difference from PVE host caching.

3. Options I do have are: direct sync, write through, write back, write back (unsafe), and no cache (the default I'm using now).
In addition to answer 1: if you have enough RAM (multiple times what you need), you can set additional caching methods in order to supercharge your VMs, but in a densely packed environment I'd just go with the default (and on ZFS ALWAYS with no cache, due to the ARC).
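Just to illustrate how you would switch the cache mode of a single virtual disk for such a non-critical VM (VM ID 100 and the disk name are placeholders again):

  # On the PVE host: write-back caching for one disk of VM 100
  qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback

  # And back to the default
  qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=none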
 
Awesome; I appear to have stumbled into doing the right thing (for once). :)

Thanks again for your help, and for these quite detailed answers. It really sounds like I won't ever need to worry about straying from the defaults unless I'm doing something very unusual.

I don't think I'll mess with the host-level metadata caching, even though your description of what it's doing made sure that I actually understand it now. :) This isn't exactly the fastest system--the mobile Ryzen 5900HX is speedy enough, but the RAM is only 32GB DDR4-2400. I don't want to do anything that's going to slow it down.

Reading this did convince me to go ahead and max out the memory to 64 GB. I think that's going to be the next (and last) upgrade I do to this little thing.
 
It really sounds like I won't ever need to worry about straying from the defaults unless I'm doing something very unusual.
Yes, exactly. I use the other caching methods for things that I explicitly need to be as fast as possible, like testing stuff out that is by no means production-critical. As already said, speeding up reboot cycles is one of the things I use it for. Another is virtualizing a multi-tier ZFS environment, where I simulate devices with different response times.

Reading this did convince me to go ahead and max out the memory to 64 GB. I think that's going to be the next (and last) upgrade I do to this little thing.
Yes, you can NEVER have enough RAM :p
 
Yes, exactly. I use the other caching methods for things that I explicitly need to be as fast as possible, like testing stuff out that is by no means production-critical. As already said, speeding up reboot cycles is one of the things I use it for. Another is virtualizing a multi-tier ZFS environment, where I simulate devices with different response times.


Yes, you can NEVER have enough RAM :p

Any idea how much the timing speed matters on otherwise equivalent RAM for PVE?
I'm running DDR4-3200 SODIMMs (laptop memory), and my choices for an upgrade from my preferred vendor are CL22 and CL16. I mean, lower is better, but lower is considerably more expensive, too.

I'm equipped with a Ryzen 5900HX with integrated GPU, which I plan to pass through to a VM for low-end gaming, so I think the lower timings would be worth it?
 
Any idea how much the timing speed matters on otherwise equivalent RAM for PVE?
For the CL stuff, I really don't know. I've never seen this with enterprise RAM; there you normally only have the rank and the clock speed, and it has to match what you want to do. Higher ranks normally mean you cannot populate all slots, but the higher-rank modules often have more capacity (e.g. 128/256/512/1024 GB). The speed should be maxed out with respect to the CPU's capabilities, and populating more slots often reduces the overall clock speed, so the fastest systems have a small number (e.g. 6) of RAM modules with high clock rates.

With the same module but different clock rates (e.g. 800 vs. 1600 MHz) the difference is noticeable and measurable with memory allocation tests, but it is not that big. The thing is that you can always optimize and tweak further with even more expensive hardware. Nowadays you also have to consider the delivery time of most of the components, so you have to find (or approximate) the optimal point with respect to components, price, and delivery time.
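If you want to measure it on your own box, a simple memory throughput run is enough to see the difference; sysbench is just one example tool here, assuming it is installed:

  # Sequential write throughput of RAM, 1 MiB blocks, 32 GiB total
  sysbench memory --memory-block-size=1M --memory-total-size=32G run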
 
