Is an SSD cache worthwhile for VM disk performance?

I gave it a spin today.

Installation - trivially easy. Install build-essential, dkms, git and the kernel headers. Instructions from source:

https://github.com/stec-inc/EnhanceIO/blob/master/Install.txt
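For anyone following along, on a Debian/Ubuntu style host that boils down to roughly this (adjust the headers package name to match your kernel):

# prerequisites for building the DKMS module
apt-get install build-essential dkms git linux-headers-$(uname -r)
# grab the source and follow Install.txt from there
git clone https://github.com/stec-inc/EnhanceIO.git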

Setup is by far easier and less error-prone than dm-cache. I love that it sets up udev rules to re-attach the cache automatically on reboot - with dm-cache and flashcache you have to write your own init scripts. Being able to cache existing partitions on the fly with no prep is *extremely* useful. Overall it feels much safer than fiddling with dm-cache, and status and stats are easy to extract.
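For reference, creating a cache over an existing partition is basically a one-liner with eio_cli - device and cache names below are hypothetical, and it's worth checking the bundled docs for the exact flags on your version:

# cache the slow data partition /dev/sdb1 on the SSD partition /dev/sdc1,
# writeback mode, LRU replacement
eio_cli create -d /dev/sdb1 -s /dev/sdc1 -m wb -p lru -c vmcache
# list caches and their stats (also exposed under /proc/enhanceio/)
eio_cli info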

Performance - tricky to measure, and I'm using it to speed up Gluster disks, which complicates things. If you have suggestions for benchmarks I'd appreciate it.

I ran tests several times so as to populate the cache. Initial reads were limited by SATA disk I/O (150 MB/s), but subsequent reads would reach up to 400 MB/s (measured with dd).
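The dd runs were along these lines (device name is just an example); dropping the page cache first makes sure the first pass really hits the SATA disks rather than RAM:

# flush the Linux page cache, then read 4 GB sequentially
echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/sdb1 of=/dev/null bs=1M count=4096
# repeat a few times so EnhanceIO can promote the blocks to the SSD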

I tried CrystalDiskMark inside a VM. Read performance increased by about 300%, raw write performance actually dropped, even with writeback enabled. But random read/write was greatly improved.

OTOH actual application usage varied. A long build process still took around 10 minutes, and Eclipse (Java IDE) startup time stayed the same.


Interesting article here:

http://www.sebastien-han.fr/blog/2014/10/06/ceph-and-enhanceio/

It suggests using it in writethrough mode in combination with an external SSD journal for improved write performance.

Also, I had never thought of using the /dev/disk/by-id links - much safer than /dev/sda etc.
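e.g.

ls -l /dev/disk/by-id/
# each symlink is named after the drive model/serial and points at the current
# /dev/sdX node, so it stays valid even if the kernel reorders the disks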

Thanks for the detailed report, we will try EnhanceIO in the near future and post our experience here.

I have a couple of questions that came to mind reading your writeup:

1. Did you use LRU or FIFO?

2. If you were to backup your entire server every night, I assume FIFO would have to rebuild the entire cache next morning, but an LRU cache would still contain useful data? Do we know anything about the "recentness" in LRU, do you think it spans a few days?

3. If you were to combine EnhanceIO-writethrough with an external SSD-based journal for your ext4 filesystem, could you do it on the same SSD (different partitions for journal and cache)? How would you partition this SSD (read cache relative to write journal)?

4. Based on your recent experience with ZFS, which way would you go? EnhanceIO writethrough + external SSD journal for ext4 or ZFS with L2ARC/ZIL on SSD?
 
1. Did you use LRU or FIFO?

FIFO initially, then LRU - LRU felt like it gave better results.

2. If you were to backup your entire server every night, I assume FIFO would have to rebuild the entire cache next morning, but an LRU cache would still contain useful data? Do we know anything about the "recentness" in LRU, do you think it spans a few days?

Not sure unfortunately, it was a concern. Also whether the backups would "wear out" the SSD.

3. If you were to combine EnhanceIO-writethrough with an external SSD-based journal for your ext4 filesystem, could you do it on the same SSD (different partitions for journal and cache)? How would you partition this SSD (read cache relative to write journal)?

I did try this with ext4, using the one SSD: 10 GB for the journal, 50 GB for EnhanceIO (read/write).

Seemed quite fast, but generated data corruption. I suspect the combination of an external journal and a block cache is not good. Quite scary to contemplate.
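For completeness, the external journal half of that experiment was set up roughly like this (partitions hypothetical) - standard ext4 stuff, nothing exotic:

# turn the 10 GB SSD partition into a dedicated ext4 journal device
mke2fs -O journal_dev /dev/sdc1
# create the data filesystem on the spinning disk, pointing at that journal
mkfs.ext4 -J device=/dev/sdc1 /dev/sdb1

The remaining 50 GB SSD partition was then handed to EnhanceIO as the cache device.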

4. Based on your recent experience with ZFS, which way would you go? EnhanceIO writethrough + external SSD journal for ext4 or ZFS with L2ARC/ZIL on SSD?

ZFS + L2ARC and ZIL on SSD.
- Performance was good. The L2ARC algorithms are far more sophisticated than the ones in bcache or EnhanceIO, so long-term usage should benefit.
- A delight to work with, so flexible and clear. Great command-line tool set, very easy to see what is going on. The ability to add multiple SSDs for cache is amazing (rough commands below).
- ZFS: an amazingly powerful soft RAID setup. Snapshots, backups, send/recv, compression, deduplication.
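Rough shape of the commands, with hypothetical pool and device names:

# mirrored pool on the spinning disks
zpool create tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
# SSD partitions for the ZIL (SLOG) and L2ARC
zpool add tank log /dev/disk/by-id/ata-SSD1-part1
zpool add tank cache /dev/disk/by-id/ata-SSD1-part2
# more cache devices can be added later
zpool add tank cache /dev/disk/by-id/ata-SSD2
# per-device stats, including cache traffic
zpool iostat -v tank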
 
Just spamming my summary post from the list:

Thought I'd do a quick summary of my results - very subjective really, so take
with a pinch of salt.

As I suspected, raw disk benchmarks, either from within the VM (CrystalDiskMark)
or on the host (dd, bonnie++), while interesting, aren't a very good guide to
actual application performance improvements. In the end I ran a number of
standard application usage timings - a long build process, Eclipse startup and
builds, UI responsiveness.

With all the different cache strategies I tried, the disk benchmarks gave
widely varying results, while the application timings were all very similar.

dm-cache:
- Complicated to set up
- Fiddly and error-prone to manage
- Needs custom init scripts
- No auto ignoring of bulk reads/writes (disk cloning, backups)
- I managed to destroy the underlying file system while attempting to flush
and dismount the write cache.
- Did give good results for reads/writes

bcache
- User tools have to be compiled and installed
- can't be used with existing file systems
- No auto ignoring of bulk reads/writes (disk cloning, backups)
- Needs custom init scripts
- 3.10 kernel version is hopelessly buggy. Trashed the file system
- No tools for uninstalling. Required a hard reset, and then I had to use dd
to overwrite the partition table to remove the cache store. I then blacklisted
the module (details below)
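For anyone who ends up in the same hole, the cleanup was essentially this (destructive, device name hypothetical):

# zero the start of the former cache device so bcache stops recognising it
dd if=/dev/zero of=/dev/sdc bs=1M count=4
# and stop the module from loading again
echo "blacklist bcache" >> /etc/modprobe.d/blacklist-bcache.conf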

EnhanceIO
- Has to be compiled and installed
- *can* be used with existing file systems
- Can be created/edited/destroyed on the fly
- No auto ignoring of bulk reads/writes (disk cloning, backups)
- persistent between reboots (udev rules)
- Good results on reads/writes
- Unfortunately when I combined it with an external ext4 journal I got data
corruption

ZFS (zfsonlinux.org)
- Used the kernel module
- Has a repo, but requires the kernel headers, build-essential and dkms.
- Built-in support for journal and read caching using multiple SSDs
- ZFS! with all the ZFS goodies: RAID, striping, snapshots, backups, pool
management.
- Auto ignoring of bulk reads/writes (disk cloning, backups)
- Good tools for managing and reporting disk/cache stats and errors
- I restricted it to 1 GB RAM on the hosts (see the snippet after this list)
- Good results on reads/writes
- No support for O_DIRECT, so I had to disable Gluster's io-cache, which is
recommended anyway for virtual stores.
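The RAM cap mentioned above is just the usual zfs_arc_max module option, e.g.:

# limit the ARC to 1 GiB on the hosts
echo "options zfs zfs_arc_max=1073741824" > /etc/modprobe.d/zfs.conf
# current value can be checked in /sys/module/zfs/parameters/zfs_arc_max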


ZFS was the clear winner - ease of management, reliability, flexibility. It's
going to make expanding the stores so much easier in the future.

ZFS + Gluster + SSD caches looks like a winner for shared HA storage to me.
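If it helps anyone, the Gluster side ends up looking something like this (volume, node and brick names hypothetical), with the io-cache translator switched off as mentioned above:

gluster volume create vmstore replica 2 node1:/tank/gluster node2:/tank/gluster
gluster volume set vmstore performance.io-cache off
gluster volume start vmstore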