Why a Shared or Cluster Filesystem is better than LVM for DRBD

tcit — New Member, Jul 28, 2012 (www.totalcareit.net)
I would like to list some cases where I feel files are better than LVM volumes for virtual machines, and why.

I have seen numerous posts on this forum about the use of clustered filesystems (GFS, GFS2, OCFS2, etc.), either standalone or on top of DRBD. For all of them the answer is basically that LVM + DRBD is the "best" option. I fully understand the reasons (some of which are not technical) for this approach by the Proxmox team and do not disagree with many of them.

However, for many uses, LVM is vastly inferior to files for VMs. Here are some significant reasons why:


  • No sparse files / holes! This is the single biggest issue with LVM, and it underlies most of the other issues below: virtual disks are often 50% or less in use, but LVM requires full allocation of the space up front. That doesn't seem too bad until you consider the items that follow.
  • Every backup operation will spend most of its time on empty space. On larger KVMs, this can be HOURS multiplied by the number of machines. In some cases, it is the difference of being able to finish all backups in one night or not.
  • Every restore operation will spend most of its time on wasted space. For disaster recovery, hours can be critical. Imagine a machine with only 8GB in use but you have to wait for 80GB to transfer. That's 10x too long!
  • All this extra time copying empty disk space means roughly twice the wear and tear on your disks. This means more cost replacing hardware, more time wasted maintaining it, more heat, more electricity, etc.
  • Flexibility to use other VM types without conversion, and with only 1 operation instead of 2. An example would be using the VMware conversion tool in one step to deposit a VMDK straight onto the Proxmox host, where it can be mounted immediately, instead of going through a second lengthy operation. There are many cases where qemu-img or dd + registry patching simply do not succeed but third-party tools that produce VMDK or other files do. Again, this flexibility can be a life-saver and a time-saver (which is also a cost saver).
  • Flexible backup options. With files you can simply sparse copy a .raw file (e.g. take an extra DRBD node offline temporarily like a "snapshot") and put it on a volume usable by proxmox (e.g. /var/lib/vz/dump). This has 2 huge benefits. 1) In a disaster, you can simply move the file to the proper "images" folder (which takes less than 1 second) and fire it up! 2) The backup can also be used to restore individual files by loop mounting the raw file directly. This is not possible with the Proxmox backups because they are in a .tar and can require hours to restore for larger disks, not seconds. Having to untar also means you need ~2x the space.
  • Complexity. Let's be honest....LVM groups, volumes, extents, etc. can be every bit as complicated to deal with as a filesystem.
  • LVM snapshots are known to be slow. This only adds to all the problems relating to sparsity.
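The sparse-file behavior described above is easy to demonstrate on any filesystem that supports holes (the filenames here are made up for illustration):

```shell
# Create a 1 GiB "virtual disk" that is almost entirely a hole
truncate -s 1G disk.raw

# Write 4 MiB of real data into it (simulating actual VM usage)
dd if=/dev/urandom of=disk.raw bs=1M count=4 conv=notrunc status=none

# Copy it while preserving holes; only the allocated blocks get written
cp --sparse=always disk.raw backup.raw

# Apparent size is 1 GiB, but only a few MiB are actually allocated
ls -lh backup.raw
du -h backup.raw

# Individual-file restore works by loop mounting the image (needs root
# and a filesystem inside the image), e.g.:
#   mount -o loop backup.raw /mnt/restore
```

An LV has no such concept: every dump or copy of it must move the full allocated size.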

In summary, most of these issues come down to efficiency and time. LVM is very inefficient and slow in many of these regards.

I hope the Proxmox team will reconsider the idea of having at least one possible filesystem that you could put on top of DRBD instead of LVM. It would really open up some options with huge implications. I realize there are pros/cons to anything, but for many, a clustered (or at least a concurrent read-write) filesystem on top of DRBD would be vastly superior for speed of maintenance and flexible options. If there is already a filesystem that you can easily put on top of DRBD for shared node access on Proxmox 2, please let me know. Cheers!
 
I do not agree with a lot of your statements; some are simply not right. Have you ever tested a clustered filesystem on DRBD/Proxmox VE? If yes, provide benchmarks, and do not forget to look at the features of the storage stack regarding HA, live backups, and migrations.

The statement that it is "superior for speed of maintenance and flexible" looks like a prejudice, without real facts and testing. But if you provide examples, I will run tests here.
 
Tom, I am not referring to performance when it comes to VM execution. The conditions I refer to are:

  • Majority of VMs use KVM
  • KVM VMs use an average of 32-400GB
  • Filesystems inside the VMs are maintained with around 50% free space on average, sometimes even more
  • Business has need of flexibility to transfer VMs in and out of the cluster somewhat frequently, often from other virtualization technologies
  • Business desires fast disaster recovery that does not require a 2nd restore operation that requires full transfer of all the data again
I hope that qualifies the conditions I'm referring to where my ideas would be an advantage. Please keep LVM around (there's no bias here), I'm simply asking for additional options that are optimal under different conditions.

Allow me to give a little background. I have been using Proxmox on over 50 different servers since version 1.7 first came out. Some have used filesystems, some LVMs. In my own experience this has been the case, I have tested the speed of backups, disk transfers, and disaster recovery using both filesystems and LVM.

The results are what you would expect. Here's an example:
  • copying a sparse file with 80GB in use and a 250GB virtual disk took about 28 min. at about 49 MB/sec (only copied 80GB used space)
  • dumping an LVM with 80GB in use and a 250GB virtual disk took about 85 min. at about 50 MB/sec (copied the full 250GB)
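Those timings are consistent with simple arithmetic; as a rough sanity check using the figures quoted above:

```shell
# Time to move data at ~50 MB/s: used-space-only copy vs. full-volume dump
used_gb=80; full_gb=250; rate_mb=50

sparse_min=$(( used_gb * 1024 / rate_mb / 60 ))   # copy only the 80GB in use
full_min=$((  full_gb * 1024 / rate_mb / 60 ))    # dump the whole 250GB volume

echo "sparse copy: ~${sparse_min} min"   # ~27 min
echo "full dump:   ~${full_min} min"     # ~85 min
```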

I am simply saying that for some uses a shared filesystem would provide advantages that LVM would not. In other cases LVM is the better choice. I would like to see both as options if possible. Here is a very simple test case to illustrate the aspects of speed I am referring to:

1) Set up a Proxmox host with 1TB of LVM storage. Make 10 KVM virtual machines, each with about a 100GB virtual drive. Inside each machine, create about 30-50GB of data. This would be an accurate average for the disk usage I have seen across our 50 Proxmox servers.
2) Set up a backup of all machines to occur overnight to an NFS share or an iSCSI target (it doesn't matter too much, as long as you're consistent for both tests)
3) Record how long the backup took.
4) Repeat steps 1-2, but this time use a 1TB folder mount with whichever filesystem is to be tested (any filesystem that supports holes/sparse files; that's the goal here).
5) Record how long the backup took.
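On PVE 2.x, the backup run in step 2 might look roughly like this (the storage name backup-nfs is a made-up example and assumes an NFS storage has already been defined in Proxmox):

```shell
# Back up all guests to a previously defined NFS storage and time the run.
# "backup-nfs" is a hypothetical storage name; adjust to your setup.
time vzdump --all --storage backup-nfs
```

Running the same command against both storage layouts gives the comparison directly.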

I seriously doubt any of the normal filesystems would cause the backups to be 2-4x slower. It's simple logic. The LVM will blindly copy the 50-70% free space we typically find on the majority of virtual machines. A sparse copy operation does not.

That being said, I think getting a benchmark would be great to see just how big the gap is under these conditions. Again, I realize if people are using mostly OpenVZ and not KVM, or if their virtual disks tend to be small and mostly full, there would be less of an issue. But, why not allow both, and have our cake and eat it too.
 
Your assumption that a clustered file system performs similarly to a local file system is probably not true.
But try it; you can use any Debian-supported file system, and report back.
 
It is wrong to assume that performance you see on a local filesystem will be the same on a cluster filesystem.

You are right, sparse files are great, and not doing IO to unused blocks saves time when backing up and restoring.

But that is only a small aspect of the overall picture.

The performance of the VM will be much worse with a clustered filesystem than with DRBD and LVM.

Also, it is unlikely that you will be able to perform a snapshot backup with a cluster filesystem.

The DRBD folks have directions on setting up GFS2 in the DRBD manual.
Glusterfs is not hard to install, there are directions in this forum too.
Give GFS and gluster a try and do some benchmarks.
I am confident you will find that the hassle, reduced reliability, and poor performance will outweigh any benefits a clustered filesystem might bring.

That is why most of us respond that "DRBD with LVM is the best option"

Glusterfs, CEPH and Sheepdog look promising as a replacement for DRBD + LVM, but last time I tested them they still had the horrible performance problem.
 
That's too bad; it sounds like you've already tried it with the common options. I had high hopes that there would be at least one filesystem that could fit the bill. Part of what spurred my interest was actually the Lustre filesystem (see http://pve.proxmox.com/pipermail/pve-user/2011-June/002176.html). It shows that Lustre was at least used at some point with Proxmox, and a page on the Lustre wiki shows it was also used with DRBD (http://wiki.lustre.org/index.php/DRBD_and_Lustre). There was definitely some performance degradation, but it would be nice to just try it for myself and see if it's acceptable. It sounds like my concept is all fine and great, but a filesystem that can pull it off may be lacking from the Linux world right now.

I wish they had posted exactly how they compiled Lustre into the Proxmox kernel. The article at http://wiki.debian.org/Lustre references using a "make-kpkg --added-patches=lustre --initrd --added-modules=lustre binary-arch modules" command, but I'm not sure how/where to translate that into the Proxmox kernel Makefile. Any tips would be appreciated.

Also, thanks to both Tom and e100 for providing feedback.
 
Gluster is most likely going to be the easiest to set up and the best-performing cluster filesystem solution at this time.
The last time I tested glusterfs was about 8 months ago; I remember that 3.3 was supposed to have some performance improvements, so it might be better now.

I think it would be worthwhile to test it, if you are interested:

Look at this thread for directions on how to install gluster:
http://forum.proxmox.com/archive/index.php/t-7355.html

Also, use the latest 3.3 version which can be found here:
http://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/5.0.3/
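For reference, a minimal two-node gluster setup looks roughly like this (the hostnames node1/node2 and the brick path /data/brick are made-up examples; see the thread linked above for full directions):

```shell
# Run on both nodes: install the server from the gluster.org packages above
apt-get install glusterfs-server

# Run on node1 only: join the peers, then create and start a
# 2-way replicated volume across the two nodes
gluster peer probe node2
gluster volume create pvevol replica 2 node1:/data/brick node2:/data/brick
gluster volume start pvevol

# Run on each Proxmox node: mount the volume, then add the mount point
# in the GUI as a "Directory" storage
mkdir -p /mnt/gluster
mount -t glusterfs localhost:/pvevol /mnt/gluster
```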
 
