ZFS ARC vs VM page cache

Bubbagump210

Member
Oct 1, 2020
It occurs to me that I may be wasting RAM due to the fact that there are two caches at play. There is the ARC at the Proxmox layer. Then, my VMs have a kernel buffer/page cache caching reads for whatever file system they use … ext4, XFS, etc. Does anyone disable the page cache within their VMs? I’m having a hard time thinking why you would want to have the same data cached at two different layers.

Edit: some more Googling and duh, this is one thing KSM solves.
 
Can you explain that? I would have thought that only a small part of the cached data in RAM is actually well deduplicatable.

And as far as I know there is no way to disable caching inside a Linux guest, and completely disabling it would also not be a good idea, as the guest OS knows best what to cache and what not. I learned it's best to not give your guests more RAM than they actually need. If the guests are running at their limits, with just a small amount of RAM that isn't used by processes, then there is not that much free RAM that can be wasted on caching.
 
I found an academic paper on this double cache issue. Their argument was KSM partially solves this as if block XYZ is cached in ARC and cached in the VM’s RAM/page cache, it should be swept up as a duplicate by KSM.

I think your approach is right and it's what I’ve always done, but this occurred to me the other day as a “why is this double cache thing not talked about more”. I suspect it’s not talked about with VMware because VMFS caches nothing, to my knowledge, and with Linux KSM is a good enough fix?
 
Not everyone is using KSM, as it can allow side-channel attacks where a guest could read the RAM (and so passwords and so on) of other guests or even the host. So in cases where you can't trust the software running in your guests, KSM should be disabled.

At least here KSM isn't deduplicating that much. A very big part of my 64GB RAM should be used by ARC and Win/Linux caches, but most of the time KSM is only showing 7-9GB of deduplicated RAM.
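
For anyone who wants to check this figure on their own host, here is a rough sketch. It assumes the standard sysfs counter `/sys/kernel/mm/ksm/pages_sharing` described in the kernel's KSM admin guide; the helper name `ksm_saved_mib` is made up for illustration and is not a Proxmox tool.

```shell
#!/bin/sh
# Hypothetical helper: convert a KSM pages_sharing count into MiB saved.
# $1 = value of pages_sharing, $2 = page size in bytes (defaults to 4096).
ksm_saved_mib() {
    pages=$1
    page_size=${2:-4096}
    echo $(( pages * page_size / 1024 / 1024 ))
}

# On a real host, read the counter from sysfs if it is present.
if [ -r /sys/kernel/mm/ksm/pages_sharing ]; then
    pages=$(cat /sys/kernel/mm/ksm/pages_sharing)
    echo "KSM is saving roughly $(ksm_saved_mib "$pages" "$(getconf PAGESIZE)") MiB"
fi
```

This only gives an approximation of the memory saved, but it should be close to what the Proxmox GUI reports as KSM sharing.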
 
I typically get a similar 10% boost too. In any event, I’d be curious if someone intimate with this sort of thing could weigh in as at least on the surface having the same data in ARC and a VM’s page cache, especially at scale, would be a waste.
 
it's an interesting question...

KSM only works with anonymous pages, which isn't the same as ARC, which operates on a file basis - see here https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html

To be honest I don't use ZFS with Proxmox, but I use it with FreeNAS and used it with Nexenta, and I always set primarycache=metadata for virtual machines, because for ZFS a VM disk is just one big raw file which will never fit in ARC anyway.
 
Another thing no one has mentioned is that having KSM enabled when using ZFS destroys performance.

https://github.com/openzfs/zfs/issues/12813

I've experienced this first hand. I have two NVMe drives in a mirror, and a Windows VM could only get 3.5GB/s sequential read in CrystalDiskMark, which is half of the expected performance. After disabling KSM my reads went to 12GB/s. It was being read out of ARC both times.

I wasn't smart enough to capture a screencap, but I can go back and do so.
 
I always set primarycache=metadata for virtual machines 'cause for zfs it just one big raw file which will never fit anyway in ARC.
How do you do that in Proxmox? I am somewhat new to all of this and am currently preparing for my first ever implementation of ZFS on Proxmox.
 
I always set primarycache=metadata for virtual machines 'cause for zfs it just one big raw file which will never fit anyway in ARC
I see that differently. You don't have to fit a whole virtual disk in ARC. ARC caches individual records at the block level, so it will just cache the most important blocks of a virtual disk. ARC caches the data of the most recently used blocks and the most often used blocks, and will even predict what might be requested next (useful for sequential reads, for example) and prefetch that data before it is actually requested.
How do you do that in ProxMox?
zfs set primarycache=metadata rpool

See: https://openzfs.github.io/openzfs-docs/man/8/zfs-set.8.html
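
Note that the command above applies to the whole pool `rpool`, and child datasets inherit the property. If you only want this for a single VM disk, the property can be set on that dataset instead. A minimal sketch follows; the dataset name `rpool/data/vm-100-disk-0` is only an assumed example (check yours with `zfs list`), and the helper `set_primarycache` is hypothetical. It just prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
# Hypothetical helper: print (or run, with APPLY=1) the zfs command that sets
# primarycache on one dataset instead of the whole pool.
set_primarycache() {
    dataset=$1
    mode=$2
    # primarycache accepts exactly these three values.
    case "$mode" in
        all|none|metadata) ;;
        *) echo "primarycache must be all, none or metadata" >&2; return 1 ;;
    esac
    if [ "${APPLY:-0}" = "1" ]; then
        # Set the property, then show the resulting value for verification.
        zfs set "primarycache=$mode" "$dataset" && zfs get primarycache "$dataset"
    else
        echo "zfs set primarycache=$mode $dataset"
    fi
}

# Example (dataset name is an assumption):
# set_primarycache rpool/data/vm-100-disk-0 metadata
```

Setting it back to `all` restores the default behaviour.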
 
I’m having a hard time thinking why you would want to have the same data cached at two different layers?
IMHO the biggest difference here is that the caches in the VM are controlled by the guest OS, so the guest OS is aware of them. ARC is on a different layer and may cache different things which make sense from the ZFS perspective. The two can be the same, but do not necessarily have to be.
The same applies to all storage technologies. The difference here is that in your case it all eats up PVE memory. So you might consider limiting ARC usage (if you have not done this already).
My 20 cents
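
In case it helps, limiting the ARC could look roughly like the sketch below. It assumes the OpenZFS module parameter `zfs_arc_max` (in bytes) and the usual `/etc/modprobe.d/zfs.conf` location; the helper `arc_max_line` and the 8 GiB value are just illustrative assumptions, so size it for your own host.

```shell
#!/bin/sh
# Hypothetical helper: build the modprobe option line that caps ARC at N GiB.
arc_max_line() {
    gib=$1
    echo "options zfs zfs_arc_max=$(( gib * 1024 * 1024 * 1024 ))"
}

# Example: cap ARC at 8 GiB (8 is an arbitrary example value).
# Persistent setting, applied when the zfs module loads
# (note: ">" overwrites any existing zfs.conf):
#   arc_max_line 8 > /etc/modprobe.d/zfs.conf
#   update-initramfs -u    # needed when the root file system is on ZFS
# Runtime change without a reboot:
#   echo $(( 8 * 1024 * 1024 * 1024 )) > /sys/module/zfs/parameters/zfs_arc_max
```

The commands that actually change the system are left commented out on purpose.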
 
Also: Don't enable any KVM/QEMU/VM caching strategies in the GUI; stick with the default "no cache".
On this note: if I pass through an HBA, will Proxmox still use ZFS caching for the devices?
Edit: my intuition says no, but I just want to make sure.
 
If you pass through the HBA, the entire responsibility lies with the guest OS. PVE does not even "see" the devices anymore.
So ZFS caching (assuming you use the attached devices within a zpool) is up to the guest. But it still happens there. It is not disabled, so the guest might/will need more memory.
 
And the guest won't be able to use ballooning because of the PCI(e) passthrough.
 