LVM cache / dm-cache
My HowTo
With the hardware described in this post:
https://forum.proxmox.com/threads/zfs-or-hardware-raid-new-hardware-setup.26586/
I decided to use the following setup.
First of all, I used the onboard LSI hardware RAID controller, an AVAGO 3108 MegaRAID.
Some tools for Debian Jessie:
1. sources.list:
Code:
cat /etc/apt/sources.list
...
deb http://hwraid.le-vert.net/debian jessie main
2. install:
megacli and megactl (which contains megasasctl)
Maybe have a look at the Thomas Krenn wiki or another reference you like:
https://www.thomas-krenn.com/de/wiki/MegaCLI
3. use the tools for a short overview of the controller and RAID status, for example as sketched below:
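A sketch of how installing the packages and a first status check might look (hwraid.le-vert.net provides the binaries as megacli and megasasctl; the exact flags are documented in the Thomas Krenn wiki above):
Code:
# apt-get update && apt-get install megacli megactl
# megasasctl                    # compact health overview of adapter, arrays and disks
# megacli -LDInfo -Lall -aALL   # details of all logical drives (the RAID volumes)
# megacli -PDList -aALL         # details of all physical disks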
I built a RAID10 from the four 2 TB SAS HDDs.
Then I installed Proxmox as usual without ZFS (maxroot = 23 GB; maxswap = 4 GB, ext4).
Code:
=> lsblk
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda              8:0    0  3.7T  0 disk
├─sda1           8:1    0 1007K  0 part
├─sda2           8:2    0  127M  0 part
└─sda3           8:3    0  3.7T  0 part
  ├─pve-root   252:0    0   27G  0 lvm  /
  ├─pve-swap   252:1    0    8G  0 lvm  [SWAP]
  └─pve-data   252:2    0  3.6T  0 lvm  /var/lib/vz
After that I installed the two Intel enterprise SSDs
and built a RAID0 on them with the same RAID controller.
Code:
=> parted -l
Model: AVAGO SMC3108 (scsi)
Disk /dev/sda: 4000GB
(hardware RAID10, 4 x 2 TB SAS)
.....
Disk /dev/sdb: 199GB
(hardware RAID0, 2 x 100 GB SSD)
According to the Red Hat LVM administration documentation, you can use a whole disk device
(such as my RAID0) as a physical volume for LVM2, without a partition table and without any partition:
"If you are using a whole disk device for your physical volume, the disk must have no partition table."
"You can remove an existing partition table by zeroing the first sector with the following command:"
Code:
# dd if=/dev/zero of=PhysicalVolume bs=512 count=1
Set up /dev/sdb as a physical volume:
Code:
# pvcreate /dev/sdb
# lvmdiskscan
...
/dev/sdb [ 185.31 GiB] LVM physical volume
...
1 LVM physical volume whole disk
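If you want to cross-check the new PV before extending the volume group, the standard LVM reporting tools show it as a physical volume that does not yet belong to any VG:
Code:
# pvs
# pvdisplay /dev/sdb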
As you know, the Proxmox VG (volume group) is named "pve".
Very important for using dm-cache: both the logical volume for the data
and the logical volumes for the cache have to be in the same volume group ("pve").
For that reason the existing volume group has to be extended
with the new cache device.
Code:
# vgscan
"Found volume group "pve" using metadata type lvm2"
# vgextend pve /dev/sdb
" Volume group "pve" successfully extended"
You can verify it with "vgdisplay":
Code:
before:
# vgdisplay
  VG Name               pve
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               3.64 TiB
  PE Size               4.00 MiB
  Total PE              953567
  Alloc PE / Size       949472 / 3.62 TiB
  Free  PE / Size       4095 / 16.00 GiB
  VG UUID               QIzoZv-EoMX-ZWvR-LRj0-Eofo-o68H-i0vjMz
afterwards:
# vgdisplay
  VG Name               pve
  ...
  Metadata Areas        2
  Metadata Sequence No  5
  ...
  Cur PV                2
  Act PV                2
  VG Size               3.82 TiB
  PE Size               4.00 MiB
  Total PE              1001006
  Alloc PE / Size       949472 / 3.62 TiB
  Free  PE / Size       51534 / 201.30 GiB
  VG UUID               QIzoZv-EoMX-ZWvR-LRj0-Eofo-o68H-i0vjMz
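Instead of comparing the full "vgdisplay" output, the compact "vgs" report shows the PV count and the newly added free space in a single line:
Code:
# vgs pve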
Now we create the LVs the cache is built from. There are two different cache LVs:
A - the cache data LV, named CacheDataLV in my setup
B - the cache metadata LV, named CacheMetaLV in my setup
Have a look at "man lvmcache".
My PV (2 x 100 GB SSDs) has a size of about 185 GiB. I will use about 0.5 GiB (512 MiB) for the
CacheMetaLV and 160 GiB for the CacheDataLV. I found no requirement anywhere to calculate
exact values, so I used estimates. As a rule of thumb, "man lvmcache" recommends a metadata LV
roughly 1000 times smaller than the cache data LV (minimum 8 MiB); 160 GiB / 1000 ≈ 164 MiB,
so 512 MiB leaves comfortable headroom.
Code:
# lvcreate -n CacheDataLV -L CacheSize VG FastPVs
and
# lvcreate -n CacheMetaLV -L MetaSize VG FastPVs
For me:
Code:
# lvcreate -n CacheDataLV -L 160G pve /dev/sdb
" Logical volume "CacheDataLV" created."
# lvcreate -n CacheMetaLV -L 0.5G pve /dev/sdb
"Logical volume "CacheMetaLV" created."
The next important step is to combine the cache data LV
and the cache metadata LV into a single LV called a "cache pool",
a logical volume of type cache-pool.
Code:
# lvconvert --type cache-pool --cachemode writethrough --poolmetadata VG/lv_cache_meta VG/lv_cache
For me:
Code:
# lvconvert --type cache-pool --cachemode writethrough --poolmetadata pve/CacheMetaLV pve/CacheDataLV
" WARNING: Converting logical volume pve/CacheDataLV and pve/CacheMetaLV to pool's data and metadata volumes.
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert pve/CacheDataLV and pve/CacheMetaLV? [y/n]: y
Converted pve/CacheDataLV to cache pool."
With the following command you can see the result:
Code:
# lvs -a -o +devices
  LV                  VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  CacheDataLV         pve Cwi---C--- 160.00g                                                      CacheDataLV_cdata(0)
  [CacheDataLV_cdata] pve Cwi------- 160.00g                                                      /dev/sdb(0)
  [CacheDataLV_cmeta] pve ewi------- 512.00m                                                      /dev/sdb(40960)
  data                pve -wi-ao----   3.59t                                                      /dev/sda3(8960)
  [lvol0_pmspare]     pve ewi------- 512.00m                                                      /dev/sda3(949472)
  root                pve -wi-ao----  27.00g                                                      /dev/sda3(2048)
  swap                pve -wi-ao----   8.00g                                                      /dev/sda3(0)
As you can see, the two LVs were renamed (_cdata, _cmeta), as described in the Red Hat documentation.
Before the conversion, the output of the same command was:
Code:
...
CacheDataLV pve -wi-a----- 160.00g
CacheMetaLV pve -wi-a----- 512.00m
data        pve -wi-ao----   3.59t
...
Also have a look at the Attr column (yes, the "C" indicates a cached volume ;-)
The last step is attaching the cache pool to the actual data LV
(named "data" in Proxmox):
create the cached logical volume by combining the cache pool LV with the origin "data" LV.
Code:
# lvconvert --type cache --cachepool VG/lv_cache VG/lv
For me:
Code:
# lvconvert --type cache --cachepool pve/CacheDataLV pve/data
" Logical volume pve/data is now cached."
And with that, we are done. We can continue using the pve/data logical
volume as before, but from now on it is a cached volume that uses the cache
space on the SSDs.
Now you can see the successfully cached Proxmox LV "data":
Code:
# lvs -a -o +devices
  LV                  VG  Attr       LSize   Pool          Origin        Data%  Meta%  Move Log Cpy%Sync Convert Devices
  [CacheDataLV]       pve Cwi---C--- 160.00g                             0.00   3.97            100.00           CacheDataLV_cdata(0)
  [CacheDataLV_cdata] pve Cwi-ao---- 160.00g                                                                     /dev/sdb(0)
  [CacheDataLV_cmeta] pve ewi-ao---- 512.00m                                                                     /dev/sdb(40960)
  data                pve Cwi-aoC---   3.59t [CacheDataLV] [data_corig]  0.00   3.97            100.00           data_corig(0)
  [data_corig]        pve owi-aoC---   3.59t                                                                     /dev/sda3(8960)
  [lvol0_pmspare]     pve ewi------- 512.00m                                                                     /dev/sda3(949472)
  root                pve -wi-ao----  27.00g                                                                     /dev/sda3(2048)
  swap                pve -wi-ao----   8.00g                                                                     /dev/sda3(0)
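To verify the cache is actually being used, you can watch the Data% column of "lvs" fill up over time, or query the device-mapper cache target directly (a sketch; the dm name pve-data follows the usual VG-LV naming scheme):
Code:
# lvs -a pve               # Data% of the cache pool grows as blocks get promoted
# dmsetup status pve-data  # raw dm-cache statistics: block usage, read/write hits and misses
The fields of the dmsetup status line are explained in the kernel documentation (Documentation/device-mapper/cache.txt).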
Main sources:
https://access.redhat.com/documenta...Administration/lvm_cache_volume_creation.html
and:
http://blog-vpodzime.rhcloud.com/?p=45
and the man pages, primarily "man lvmcache".
Now I have to test it.
I am always grateful for objections and suggestions.
Best regards,
maxprox