FS over LVM over DRBD?

achekalin

Member
Jun 17, 2011
Hello,

I use Proxmox in a cluster configuration with DRBD (a poor man's shared storage) so I can use the live migration feature (I use KVM).

In fact, the VM images are stored on an LVM VG, not on a "real" filesystem, so I simply cannot create a disk image of any type other than "raw". The lack of a filesystem is something I don't like: these raw VM images are big and hard to back up from the CLI. So I decided to put a filesystem over a VG over DRBD.

I've just set up that scheme, and it works fine (why shouldn't it?), but my doubt is that the plain VG-over-DRBD scheme is what the Proxmox wiki recommends, so could it be that I've missed something about using a filesystem on top of the VG?
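For reference, the stack I set up looks roughly like this (device, VG, and mount-point names are just examples):

    # the DRBD device becomes an LVM physical volume
    pvcreate /dev/drbd0
    vgcreate drbdvg /dev/drbd0
    # one big LV carrying a normal filesystem for the images
    lvcreate -L 500G -n images drbdvg
    mkfs.ext3 /dev/drbdvg/images
    mount /dev/drbdvg/images /var/lib/vz/images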

By the way, I need to deploy a VM whose image is a .vmdk file (VMware). I believe the .qcow2 format is better and also more native for KVM. Should I convert the vmdk to qcow2, or use it as is?
 
Hello,

I use Proxmox in a cluster configuration with DRBD (a poor man's shared storage) so I can use the live migration feature (I use KVM).
Hi,
if you run DRBD over a link with the right speed (10Gb Ethernet, InfiniBand, Dolphin NICs), it has nothing to do with poor man's shared storage... (that description fits iSCSI much more).
In fact, the VM images are stored on an LVM VG, not on a "real" filesystem, so I simply cannot create a disk image of any type other than "raw". The lack of a filesystem is something I don't like: these raw VM images are big and hard to back up from the CLI. So I decided to put a filesystem over a VG over DRBD.
Oops,
with this configuration you lose most of the advantages of this solution. It is only safe if you use a cluster filesystem; with a normal filesystem it's extremely dangerous for all of your data!
I've just set up that scheme, and it works fine (why shouldn't it?), but my doubt is that the plain VG-over-DRBD scheme is what the Proxmox wiki recommends, so could it be that I've missed something about using a filesystem on top of the VG?
See above. If you only need a filesystem to see the disk images easily, LVM has nice tools for that (lvdisplay and so on).
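For example (the VG and LV names are only an illustration):

    # list all logical volumes with their sizes
    lvs
    # full details for a single VM disk
    lvdisplay /dev/drbdvg/vm-101-disk-1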
By the way, I need to deploy a VM whose image is a .vmdk file (VMware). I believe the .qcow2 format is better and also more native for KVM. Should I convert the vmdk to qcow2, or use it as is?
The fastest (and best) option is to use logical volumes on LVM storage - the VM disk is used as a raw device on the logical volume. No overhead, pure data (you can handle it with dd).
The next best is raw on a filesystem - you have one more layer between the real disk and the VM.
qcow2 adds yet another layer - but it also brings some features (snapshots).
vmdk files only make sense if you want to be able to move back to VMware quickly - but that can also be done from any other format (one conversion).
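For the conversion itself, something along these lines should do (paths, VM ID, and VG name are only examples):

    # inspect the source image first
    qemu-img info disk.vmdk
    # convert to qcow2 on a filesystem
    qemu-img convert -O qcow2 disk.vmdk vm-101-disk-1.qcow2
    # or write it straight into a logical volume as raw
    qemu-img convert -O raw disk.vmdk /dev/drbdvg/vm-101-disk-1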

Udo
 
Hi,
Oops,
with this configuration you lose most of the advantages of this solution. It is only safe if you use a cluster filesystem; with a normal filesystem it's extremely dangerous for all of your data!
Udo

...I'd say it's also safe if it is only mounted on one node at a time... ;-)

In fact, we're running something like that, but not for KVM (we like the LV devices): each node gets an XFS filesystem on an LV (on DRBD) so we can also run OpenVZ containers with an easy fail-over mechanism. It runs quite well so far (OK, for XFS some patches were needed to switch to XFS quotas)...
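As an illustration, the per-node part boils down to something like this (VG/LV names and the size are made up):

    # one LV per node on top of the DRBD-backed VG
    lvcreate -L 200G -n vz drbdvg
    mkfs.xfs /dev/drbdvg/vz
    # mounted on one node at a time only!
    mount /dev/drbdvg/vz /var/lib/vz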
 
Thank you guys for your replies to my question; now I have a lot of things to think about.

First, I should say I use really 'poor man' PCs as cluster hosts. The reason is that before investing in a cluster scheme, we at the company where I work decided to try building a cluster out of very simple PCs (as a test). So I got two systems, each with an AMD Phenom II X4, 8 GB of RAM, a 500 GB SATA drive for the OS, and a 1 TB SATA drive for VM images. Each PC has one 1 Gb NIC on the motherboard and another cheap D-Link 1 Gb NIC in a PCI slot.

I put the 1 TB disks in DRBD, using the link between the D-Link NICs as the sync link.
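The resource definition is essentially the stock two-node example (hostnames, disk, and IPs are placeholders):

    resource r0 {
      protocol C;
      on nodeA {
        device    /dev/drbd0;
        disk      /dev/sdb;       # the 1 TB data disk
        address   10.0.0.1:7788;  # IP on the D-Link sync link
        meta-disk internal;
      }
      on nodeB {
        device    /dev/drbd0;
        disk      /dev/sdb;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }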

CPU is not a problem at all (Phenoms are very good in terms of speed/cost ratio), and RAM is not a problem either (I basically run 3-4 FreeBSD- and Linux-based VMs, all on KVM). But what makes me sad is the DRBD speed. I understand I don't use any BBU-backed RAID arrays, but even on my test system I expected a much higher sync speed (currently, when I copy anything to a VM disk from outside, say via scp to the VM, I see a write speed of 4-5 MB/s, which is too slow for any serious usage).

As for now, it looks pretty unreasonable to buy at least two branded servers (say, HP DL180) equipped with good RAID controllers (HP P410i with BBU + 512 MB of cache) just to check whether DRBD will run any faster ($4k per server x 2 = $8k is too much for a simple "let's test it"). I also doubt a 10G NIC would help me much, as 5 MB/s doesn't look anything like a 1 Gb NIC link speed.

This was the reason I was wondering whether I need a filesystem over the LVM layer (there might be better results with an FS, as the FS would cache something at its level).

So what did I miss? I like Proxmox, but the cluster sync speed is still too low...
 
DRBD can run much faster; see my setup below:

I have quite a few two-node Proxmox clusters using DRBD.
We used a good ASUS desktop board, a Phenom X6 CPU, and ECC RAM (16 GB). I highly recommend ECC; it has saved me from a few crashes already!

For storage we put in an Areca 1880ix-12 with 4 GB of cache RAM and a BBU, with a total of twelve 250 GB RAID Edition WD disks.
All of this is stuffed into a generic rackmount case with generic hot-swap bays; I hate proprietary HP/Dell/Supermicro stuff, as it makes it hard to incrementally upgrade things in the future.
We used a dual-port Intel card for DRBD replication and bonded those two ports.
The onboard LAN connects us to the network; on a few servers I installed a single-port Intel card and disabled the onboard one, as the Intel card performed much better.

From within a single VM writing to a single disk I have seen up to 80 MB/s.
If a VM has multiple disks, I can write to each disk at the same time at about 60 MB/s for each additional disk.
I'm not sure why I can't get the full speed writing to a single disk, but I think it has to do with something in KVM.
It maxes out at about 210 MB/s and can sustain that speed for as long as you keep writing.
I am quite confident that with a 10Gb interconnect it would go much faster.

I use virtio with cache=none for all my guests, Windows or Linux; those two changes helped drastically.
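In the VM's config that comes down to a single line per disk, something like this (the storage name and VM ID are made up):

    virtio0: drbd-storage:vm-101-disk-1,cache=none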

You can get away with a good desktop board and CPU, but fast disk I/O costs money.

I see that one of your other concerns is making images of the large LVM volumes.
In Windows guests I use SDelete to zero free space; in Linux I use dd to write zeros into a file to do the same.
When you back up using compression, all that free space compresses very well.
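The Linux variant is just this, run inside the guest (the file name is arbitrary):

    dd if=/dev/zero of=/zerofile bs=1M   # runs until the disk is full
    rm /zerofile
    sync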
 
Oh, that's impressive! But wait, there's one thing I can't get: my disks are mostly idle and my CPU load is low too, yet DRBD won't use any of those resources. So why does adding a few gigabytes of RAID cache and a dozen disks give you that much of a difference?

I mean, my host machines look capable of doing a lot more than they do now, and I see no reason why DRBD couldn't use those free resources. Of course a good RAID controller with a BBU is nice, but SATA itself isn't that slow an interface, and I don't see anything even close to the speed SATA can deliver.
 
Oh, that's impressive! But wait, there's one thing I can't get: my disks are mostly idle and my CPU load is low too, yet DRBD won't use any of those resources. So why does adding a few gigabytes of RAID cache and a dozen disks give you that much of a difference?

I mean, my host machines look capable of doing a lot more than they do now, and I see no reason why DRBD couldn't use those free resources. Of course a good RAID controller with a BBU is nice, but SATA itself isn't that slow an interface, and I don't see anything even close to the speed SATA can deliver.

Obviously there are two possible sources for your problem: either DRBD is not capable, or your setup has some issue. Since there are quite a few people out there who are really happy with DRBD, I'd start with the second option... ;-)
Perhaps you should start with some basic checks like pveperf (basic I/O capacity of each node) or iperf (basic throughput of the replication network between the nodes)...
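Roughly like this (the IP is whatever your sync link uses):

    # disk and fsync performance, run on each node
    pveperf /var/lib/vz
    # network throughput across the sync link:
    iperf -s                  # on one node
    iperf -c 10.0.0.1 -t 30   # on the other node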
 
