Options for a system-wide read cache? Is anyone utilizing something for this?

zenowl77

Wondering if anyone out there has a read cache setup that is system-wide and includes VMs, to cache the most frequently read files across Proxmox and the VMs and boost performance?

I'm not talking about the VM cache options such as writethrough, writeback, etc., or L2ARC, which doesn't actually cache the files themselves.

But an actual read cache.
I have a whole 1TB SATA SSD; Proxmox is installed on a 1TB NVMe, and I have a 256GB NVMe used for swap / VM swap / pagefile, etc., along with ZFS L2ARC.

But I have 2x 10TB, 2x 8TB and 1x 12TB drives, and I use a lot of the files on one of the 10TB drives and the 12TB drive inside VMs. Some of those files are AI models, which can take a long time to load from disk. I usually put files on the SSDs for faster access, but I would like to use the SSDs for other things besides AI models, and if I don't put the models on the SSDs it can take 5-10 minutes or more to load one into RAM, which gets annoying when switching around to test different models on a prompt.

So the best solution seems to me to be using one of the SSDs as a read cache. I have considered PrimoCache for my main Windows VM, since a lot of what I do with AI models is in Windows, so it could work, but that wouldn't really work so well for the rest of Proxmox. Plus it's just for one VM, and I'd have to re-register it if I switched my Windows VMs around, which I sometimes do, so that just seems like a pain.

I was reading a few posts online, including this one: Proxmox - LVM SSD-Backed Cache

But I was wondering what everyone else is using and whether anyone has already worked out the best options.
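For reference, the LVM SSD-backed cache approach from that article boils down to roughly the commands below; the device, VG and LV names are placeholders, not my actual layout:

pvcreate /dev/sdX                                  # the spare SATA SSD
vgextend vg_hdd /dev/sdX                           # add it to the VG that holds the slow LV
lvcreate --type cache-pool -L 400G -n cachepool vg_hdd /dev/sdX
lvconvert --type cache --cachepool vg_hdd/cachepool --cachemode writethrough vg_hdd/lv_data
# writethrough keeps it effectively a read cache; ' lvconvert --uncache vg_hdd/lv_data ' removes it later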
 
If you're stuck with spinning disks, make sure you're using fast ones. The Toshiba N300 is really good and NAS-rated; just don't go with 16TB or above if you want things to finish in less than 24 hours.

Otherwise, if you have spinning-disk ZFS, then invest in a couple of 512GB-1TB used enterprise 2.5-inch SATA drives (per pool) and put them on an external disk rack. (This is fairly easy if you have USB-C and a UPS.) You can mirror them as a special vdev for your spinner pool and allocate metadata + small files to them. This will also speed up scrubs.

https://search.brave.com/search?q=zfs+special+vdev

https://klarasystems.com/articles/openzfs-understanding-zfs-vdev-types/


NOTE: It is essential to mirror the special vdev (AND use two different makes/models of SSD so they don't wear out around the same time with identical wear patterns), because if that whole special vdev dies, so does the pool.
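If it helps, adding the mirrored special vdev is roughly this; the pool name, dataset and device IDs below are placeholders:

# add both SSDs as a mirrored special vdev in one step
zpool add tank special mirror /dev/disk/by-id/ata-SSD_A /dev/disk/by-id/ata-SSD_B
# optionally let small file blocks (not just metadata) land on the SSDs
zfs set special_small_blocks=64K tank/vmdata
zpool status tank                                  # confirm the layout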

https://www.amazon.com/dp/B0CX14DR9T

https://www.amazon.com/dp/B0D28Q187R

The internals are pretty much identical; just the external case is different, and availability may differ, so I gave you a couple of choices. Be aware it is not a hot-swap case; you have to power down and take it apart to place/replace disks.

There is another item on eBay, " 5 bays SATA SAS D Drive Bay Hard Drive Disk Case Enclosure 2.5"/3.5" USA ", that can do SAS/SATA and is hot-swap, but it needs either 4-5 SATA ports or a SAS HBA card with 4-cable breakouts to hook up, plus 2x Molex power. You can hotwire a standard PC power supply to work with it by jumping two holes with a paper clip or similar; google it. Get two of them and 4x Molex power and you can do 8 disks and still have 2 slots left over for spares.

Another option, if you want inexpensive lots-of-disk-slots (15x 3.5-inch) on eBay, is " EMC Expansion Array Jbod Disk Array Shelf W/ 15x SAS SATA Trays Dell HP 6GB CHIA " for under $300, but again you would need an actively cooled SAS HBA in IT mode to utilize it. That one includes the cable, though. Fully populated it draws ~160 watts IIRC.

The thing is, though, the Linux OS takes care of caching for you for the most part; you can see it in ' top ', so the more RAM you have, the more gets cached. Special vdev and L2ARC are the caching speedups I know of for ZFS specifically, but there may be some sysctl tuning you can do for caching as well. Maybe doing a find / ls -lR before loading your models would help; I do this with cron every other weekday to keep L2ARC populated (rough sketch below).
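A minimal sketch of that cron warm-up, assuming a cron.d entry and a placeholder path for the model directory (ls -lR mostly touches metadata, so it mainly keeps metadata hot):

# /etc/cron.d/prewarm-models -- path and schedule are placeholders
# walk the model directory every other weekday so its metadata stays in ARC/L2ARC
0 6 * * 1,3,5  root  ls -lR /tank/models > /dev/null 2>&1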
 
Thank you for the suggestions, I appreciate it. I have considered going with a special vdev, but I did not really want to risk it, since I cannot afford to grab extra disks or really set it up properly in my current setup.

Currently all my disks are HGST enterprise, except one 8TB which is a WD Red. The HGST enterprise disks average around 250 MB/s and work very well, although read rates tend to be slower inside VMs than on the NVMe drives; within the VMs it seems I'm not getting even 100 MB/s half the time, with really high latency (sometimes it's great, other times I have seen it spike to 50,000 ms...).

For spinning disks, the 2x 10TB + 1x 12TB HGST drives are in the system, and I have the 2x 8TB drives in an Acasis dual-bay 3.5-inch external enclosure.


Not quite as nice as those in some ways, but it is good and I like the design of it. The 2x 8TB are mostly my file-backup drives, holding ISOs and original/important files, installers, drivers, etc., things to be saved but not accessed often. I usually get around 150-180 MB/s off of those, which is good over USB.

But the caching I am looking to do is caching the frequently accessed files, specifically those on the one 10TB and the 12TB drives used within the VMs, onto the NVMe/SSD drives, so I get a speed boost when reading the most commonly accessed files. I do have 96GB of RAM, but that is nothing when most of it is already in use and there are multiple terabytes of files, probably at least 400GB of which are accessed frequently, which really is perfect for an SSD cache so they are read from SSD rather than directly from disk.
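For anyone in a similar spot, a quick way to check how well the existing ARC/L2ARC already covers the hot files before dedicating an SSD; the pool name is a placeholder and the tools ship with zfsutils-linux on Proxmox:

arc_summary | head -n 60        # overall ARC and L2ARC statistics
arcstat 5                       # live hit/miss rates, refreshed every 5 seconds
zfs get secondarycache -r tank  # per-dataset setting (all | metadata | none) controls what L2ARC may hold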
 
I tried building a system-wide read cache setup on Proxmox to speed up access to large AI models, and honestly, it's been a mixed bag. ZFS L2ARC doesn't cut it for big, frequently-accessed files, and SSD-backed LVM caching adds complexity without always delivering consistent speed boosts across VMs. The most practical gain came from targeted caching inside key VMs (like using PrimoCache in Windows), but system-wide magic still feels like a work-in-progress hack.
 
Hmm, a 50,000 ms response time could be a drive spinning back up from sleep; I don't put my spinners to sleep at all with hdparm unless I'm about to remove a disk.

For spinners that host VM vdisks, you can try ' blockdev --setra 2048 /dev/sdX ' per disk to increase the readahead (it won't survive a reboot unless you put it in rc.local). The default is only 256.
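A minimal way to make that persist, assuming a classic executable /etc/rc.local and placeholder device names (by-id paths are safer if the sdX letters move around):

#!/bin/sh
# /etc/rc.local -- re-apply readahead for the spinners at boot
blockdev --setra 2048 /dev/sdb
blockdev --setra 2048 /dev/sdc
exit 0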

Worst case, you might get a couple of USB3 Samsung T7s and set them up as a special vdev; just wait a week or so before you attach (not add!) the 2nd drive to make it a mirror, so the wear patterns aren't identical. That should give you a few days to RMA a replacement if one dies, although having a spare on hand is also better to minimize exposure.
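The add-then-attach sequence would look roughly like this; pool name and device paths are placeholders, and zpool may ask for -f when adding a single-disk special vdev to an otherwise redundant pool:

# week 1: add the first T7 as a single-disk special vdev
zpool add tank special /dev/disk/by-id/usb-Samsung_T7_AAAA
# later: attach (not add) the second T7 to turn that vdev into a mirror
zpool attach tank /dev/disk/by-id/usb-Samsung_T7_AAAA /dev/disk/by-id/usb-Samsung_T7_BBBB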