Proxmox cluster with DRBD

frater

New Member
Mar 16, 2011
My goal is to create a cluster of 2 cheap servers mirrored with DRBD (essentially RAID 1 over the network) to replace a high-end server with SCSI RAID and expensive, reliable components.
I have a feeling you can get more reliability this way for even less money.

I assembled 2 Core i5 machines (8 GB RAM each), each with a 150 GB Raptor and a 1 TB Seagate SATA disk. Creating the cluster was quite easy and I'm currently waiting for 2 extra Ethernet adapters so I can put those 2 disks in RAID 1 with DRBD.

I still don't know what to expect from Proxmox when this cluster of 2 Proxmox nodes is working in combination with DRBD. While waiting for these 2 Ethernet adapters, I have a couple of questions I hope to get answered.

Will it be possible to get full redundancy when 1 of the nodes fails? Is this redundancy seamless and transparent? If not, how much downtime will I get and do I need to intervene manually?

I'm a bit overwhelmed by these different techniques (Proxmox, heartbeat, iSCSI and DRBD) and still have to find out how to combine them and how they interact.
Should I create 1 big LVM partition on each system and then make 1 DRBD block device out of it, or is it wiser to create separate block devices?
 
Will it be possible to get full redundancy when 1 of the nodes fails? Is this redundancy seamless and transparent? If not, how much downtime will I get and do I need to intervene manually?
Hi,
with PVE 1.x you have to do this by hand: e.g. for KVM guests, keep a copy of the config on the other node and, if one node is lost, restore it to /etc/qemu-server and start the VM again. PVE 2.x will bring HA.
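For illustration, a minimal sketch of that handwork, assuming the failed node was pve1, the guest has VMID 101 and a copy of its config was kept somewhere reachable from the surviving node (the names and paths here are made up, not from this thread):

    # on the surviving node, assuming /root/qemu-server-backup-pve1/ holds copies of pve1's configs
    cp /root/qemu-server-backup-pve1/101.conf /etc/qemu-server/101.conf
    qm start 101    # the disk data is already on this node thanks to DRBD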
I'm a bit overwhelmed by these different techniques (Proxmox, heartbeat, iSCSI and DRBD) and still have to find out how to combine them and how they interact.
Should I create 1 big LVM partition on each system and then make 1 DRBD block device out of it, or is it wiser to create separate block devices?
I prefer two DRBD devices, one for each node. If one node has an issue, the other node can use both devices. You can easily get a split-brain situation, and if you have VMs from both nodes open on one device, you can't simply resync. Then you must first save all VMs from one side, resync and restore... nothing for production.
Also take a look at the replication link speed (with 10 Gb it's OK).
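As an illustration of the two-device layout, here is a rough sketch of what one of the two resources could look like in DRBD 8.3 syntax (hostnames, IPs, disks and the resource names r0/r1 are assumptions; the dual-primary settings follow the Proxmox DRBD wiki). The second resource, r1, would be the same idea on /dev/drbd1 with the other backing disk and another port, and each node would normally keep its VMs on "its own" resource:

    # sketch of /etc/drbd.d/r0.res -- the hostnames after "on" must match `uname -n`
    resource r0 {
        protocol C;
        startup { become-primary-on both; }
        net {
            allow-two-primaries;                  # both nodes primary, so PVE can activate LVs on either side
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
        }
        on pve1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.7.1:7788;
            meta-disk internal;
        }
        on pve2 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.7.2:7788;
            meta-disk internal;
        }
    }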

Udo
 
If I have these 2 disks in sync I assume I would need to run NFS on top of that and use the common IP to access it. Will this work?
If I have vm1 on node1 and vm2 on node2 and node1 breaks down, will vm2 continue to run?

I still don't understand what proxmox needs as shared storage to be able to do an online migration. Is it typically an NFS-share?
 
If I have these 2 disks in sync I assume I would need to run NFS on top of that and use the common IP to access it. Will this work?
????
Perhaps, but this is definitely not the recommended way!
Why do you want to use NFS on top of DRBD? If you want to share the files from both nodes, you would also need a cluster filesystem... overkill...
If I have vm1 on node1 and vm2 on node2 and node1 breaks down, will vm2 continue to run?
yes
I still don't understand what proxmox needs as shared storage to be able to do an online migration. Is it typically an NFS-share?
NFS is one possibility. You need storage which is accessible from all nodes. The easy way is NFS (but I don't know how well it performs).
Alternatively you can use LVM storage on a FC SAN, iSCSI SAN or DRBD.

That is, you create a DRBD device (I prefer two, one for each server) and on top of this DRBD device an LVM volume group. Both nodes can see the logical volumes of the VG.
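For example, roughly what that stack could look like on the command line; the device, resource and VG names are only examples, and the VG is afterwards added in the Proxmox storage configuration as LVM storage with the "shared" flag:

    drbdadm create-md r0 && drbdadm up r0             # run on both nodes
    drbdadm -- --overwrite-data-of-peer primary r0    # on one node only, starts the initial sync
    pvcreate /dev/drbd0                               # LVM physical volume on top of the DRBD device
    vgcreate drbd0vg /dev/drbd0                       # the volume group both nodes will see
    # then add 'drbd0vg' as LVM storage in Proxmox and mark it as shared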

Read the wiki entry about DRBD and play around with it a bit, and you will see.

Udo
 
When I have access to that machine again and have experimented a bit and followed some wikis, I will probably understand it better.
I think I'm expecting things that will only be available when 2.0 arrives. Maybe even 2.0 can't do the things I would like it to do. I assume I will get flamed if I ask when the first beta will be made available. I read the roadmap, which says Q2, but that starts next week (with a window of 3 months) ;-)

Does Xen have features comparable to the upcoming 2.0 version, or is 2.0 a quantum leap ahead?

As said, I want to build a cluster solution with cheap components that will keep working if 1 of the nodes fails completely. That machine can then be replaced entirely. RAID has the 'I' in it, which means inexpensive, but most of the time expensive, reliable components are used.
 
Hi,
the 'I' in RAID has stood for 'independent' rather than 'inexpensive' for a long time now (the latter was used in the early days). See http://en.wikipedia.org/wiki/Raid
It's also a question of what you compare against. If you build your own equipment you can get very powerful I/O systems for much less than a system like NetApp, HDS, EMC...
But even that is not very cheap. If you use cheap parts (disks, RAID controller) you will not get a powerful RAID (I'm absolutely sure about this).
In German we say this is the difference between good value and cheap (I don't know if this makes sense in English).

I can't say anything about Xen; I don't really know it. And yes, if you need real HA you must wait for PVE 2.0. Many people here are waiting impatiently for the beta...
But in the meantime you can simply use shared storage and also save the config files to the other node. That way you are able to move the VM configs to /etc/qemu-server and start the VMs. The downtime is not great, but not very long either (if you know about the failure and have access to the server).
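A minimal sketch of that precaution, assuming a nightly cron job on each node that copies its VM configs to the other node (hostname and target path are invented, and it assumes passwordless SSH between the nodes):

    # e.g. on pve1, in /etc/cron.d/qemu-config-backup:
    # copy the KVM guest configs to pve2 so they can be restored by hand after a failure
    0 3 * * * root rsync -a /etc/qemu-server/ pve2:/root/qemu-server-backup-pve1/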

Udo
 
Thanks for all the info.
I will refrain from further questions until I have some more hands-on experience, so I can ask less stupid questions ;)

Hi,
the 'I' in RAID has stood for 'independent' rather than 'inexpensive' for a long time now (the latter was used in the early days). See http://en.wikipedia.org/wiki/Raid
So you follow the marketing people? ;)

From http://en.wikipedia.org/wiki/Raid: "Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks."
'Inexpensive' still comes back in other definitions, like this one: http://en.wikipedia.org/wiki/Redundant_Array_of_Inexpensive_Nodes
It's also a question of what you compare against. If you build your own equipment you can get very powerful I/O systems for much less than a system like NetApp, HDS, EMC...
But even that is not very cheap. If you use cheap parts (disks, RAID controller) you will not get a powerful RAID (I'm absolutely sure about this).
In German we say this is the difference between good value and cheap (I don't know if this makes sense in English).
I just believe more in redundancy than in quality. The magic is in choosing the right balance and efficient monitoring.

I do have a feeling I should separate the DRBD cluster from the Proxmox cluster in a production environment.
Is it currently (1.7) even possible to create a Proxmox cluster and a DRBD cluster with only 2 machines? The DRBD mirror is still only a block device and therefore only available as exclusive storage. How is it typically turned into shared storage so I can do an online migration...?
BTW, if you think I should do more homework first and/or work with it some more, I will understand, and I appreciate all the time you took to answer my questions.
 

So you follow the marketing people? ;)
No! Certainly not.
'Inexpensive' still comes back in other definitions, like this one: http://en.wikipedia.org/wiki/Redundant_Array_of_Inexpensive_Nodes
I just believe more in redundancy than in quality. The magic is in choosing the right balance and efficient monitoring.
Right, with such things you will get "safe" data. But if you want to get more than 300-400 MB/s, I don't think you can reach that with such a solution.
I do have a feeling I should separate the DRBD cluster from the Proxmox cluster in a production environment.
Is it currently (1.7) even possible to create a Proxmox cluster and a DRBD cluster with only 2 machines? The DRBD mirror is still only a block device and therefore only available as exclusive storage. How is it typically turned into shared storage so I can do an online migration...?
This is the advantage of using DRBD + PVE on two hosts which form one cluster.
You have:
block device - DRBD - (LVM) volume group - many logical volumes
on both nodes. The LVM storage is marked as shared, because both nodes can reach the content.
When a VM migrates, the LV is deactivated on one node and activated on the other node. Both sides are in sync through DRBD.
Like I wrote before: I suggest two DRBD volumes, one for each server (primarily), to avoid trouble if the DRBD devices aren't in sync and you have LVs open on both sides (then you can't easily resync).
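For reference, that activation step boils down to roughly the following (the VG and LV names are just examples following the usual vm-<id>-disk naming); Proxmox does this itself during a migration, so you would normally only run it by hand in a recovery situation:

    lvchange -an /dev/drbd0vg/vm-101-disk-1    # on the node giving up the VM: deactivate the LV
    lvchange -ay /dev/drbd0vg/vm-101-disk-1    # on the node taking over: activate it; the data is identical via DRBD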
BTW, if you think I should do more homework first and/or work with it some more, I will understand, and I appreciate all the time you took to answer my questions.
Yes, it will help if you just start playing with two nodes...

Udo
 

After having lived through a split-brain with PVE, I would highly recommend having two DRBD volumes.
I know I am glad I followed that suggestion! :)

When I set up my machines I pretty much followed the how-to in the wiki, including the advice of having two DRBD volumes.
I have a total of five two-node clusters using DRBD for replication in production; the oldest one has been operational for about a year now.

Until Sheepdog or something similar is supported, PVE + DRBD (with two volumes) works great.
 
