Which storage model for 2-Node Cluster?

samwayne

New Member
Jun 21, 2013
Hello,

I have a simple non-production 2-node cluster set up with offline migration working fine at the moment. Each node has two separate 500GB hard disks (no RAID); the first currently has Proxmox installed and the second is still empty. Each node will only run about 3 - 4 VMs at most, so it will be very low traffic.

Public NIC: 100 Mbps
Private NIC: 1 Gbps (the cluster uses this for communication between nodes)

I want to be able to do the following:

- Live migration
- HA for VMs

So I believe I need some sort of distributed storage in order to achieve live migration and perhaps HA for the VMs. Since the cluster is so small, I would prefer to have the distributed storage running on the nodes themselves, but I am seeing a lot of conflicting information about Sheepdog not being ready for production, and some people advise against DRBD.

So what are my options?

Again, I am aware that I need 3 servers for HA, but at the moment I am just trying to get a feel for Proxmox. Once I am comfortable I will upgrade to 4 nodes, each with 4 disks in hardware RAID 10, for production in the very near future. I will also need to get a Proxmox support contract, so for the time being please bear with me.

Thanks in advance

Samwayne
 

Hi, the main problem with Sheepdog is that there are currently a lot of changes between releases, new internal formats, etc. I think a 1.0 release is targeted for the end of the year. But it's already stable; you just need to be careful with updates.

I think you can also try Ceph; it should work without problems on the Proxmox kernel.

(Ceph/Sheepdog: you need 3 hosts minimum)
 

Hi Spirit, I was aware that Ceph needed a minimum of 3 hosts, but I wasn't aware the same applied to Sheepdog. So would you agree that the most feasible option looks like NFS storage with DRBD?
 
OK... I followed http://pve.proxmox.com/wiki/DRBD, got DRBD running, and was able to create a VM and do an offline migration in about 1 second. However, the first time I attempted an online migration I got some errors.

Code:
Jun 28 12:26:35 starting migration of VM 100 to node 'node1' (10.10.4.100)
Jun 28 12:26:35 copying disk images
Jun 28 12:26:35 starting VM 100 on remote node 'node1'
Jun 28 12:26:37 starting migration tunnel
Jun 28 12:26:37 starting online/live migration on port 60000
Jun 28 12:26:37 migrate_set_speed: 8589934592
Jun 28 12:26:37 migrate_set_downtime: 0.1
Jun 28 12:26:39 migration status: active (transferred 164932913, remaining 50704384), total 1082523648)
Jun 28 12:26:41 migration speed: 256.00 MB/s - downtime 69 ms
Jun 28 12:26:41 migration status: completed
Jun 28 12:26:41 ERROR: VM 100 not running
Jun 28 12:26:41 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.10.4.100 qm resume 100 --skiplock' failed: exit code 2
Jun 28 12:26:44 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems

However, the second time it was fine:

Code:
Jun 28 12:35:19 starting migration of VM 100 to node 'node1' (10.10.4.100)
Jun 28 12:35:19 copying disk images
Jun 28 12:35:19 starting VM 100 on remote node 'node1'
Jun 28 12:35:20 starting migration tunnel
Jun 28 12:35:20 starting online/live migration on port 60000
Jun 28 12:35:20 migrate_set_speed: 8589934592
Jun 28 12:35:20 migrate_set_downtime: 0.1
Jun 28 12:35:22 migration status: active (transferred 290445585, remaining 0), total 1082523648)
Jun 28 12:35:24 migration speed: 256.00 MB/s - downtime 51 ms
Jun 28 12:35:24 migration status: completed
Jun 28 12:35:27 migration finished successfuly (duration 00:00:08)
TASK OK

Any idea why I got the errors initially? Could it be that I tried to do a live migration too soon after creating the VM?
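
Next time I'll check whether DRBD had actually finished syncing and whether the VM really came up on the target node before blaming the migration itself. A rough sketch of the checks I have in mind (the resource name "r0" and the VM ID are just examples from my setup):

Code:
# is DRBD connected and fully in sync? look for "cs:Connected" and "ds:UpToDate/UpToDate"
cat /proc/drbd

# the same information per resource (resource name "r0" is just an example)
drbdadm cstate r0
drbdadm dstate r0

# did the VM actually end up running on the target node?
qm status 100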
 

Hello,

I have exactly the same error/problem. Did you find a solution?

Martin
 
(Ceph/Sheepdog: you need 3 hosts minimum)

Ceph does not have to have a minimum of 3 nodes. By default Ceph keeps 2 replicas, which suits 2 nodes. If you want 3 replicas distributed among 3 nodes, you will have to manually tell Ceph to keep 3 replicas. Ceph actually works just fine with 2 nodes. I am sure you are aware of the new Proxmox VE 3.2, which supports Proxmox and Ceph on the same hardware, so your existing 2 nodes could potentially do double duty with ease, assuming you have good enough resources such as memory and CPU. It sounds like you have a very small cluster, so there is no reason you could not run both Proxmox and Ceph on your existing hardware without spending money.
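
If you later want 3 copies, the replica count is a per-pool setting and can be changed on the fly; something like this (the pool name "rbd" is only an example):

Code:
# show the current replica count of a pool (pool name "rbd" is only an example)
ceph osd pool get rbd size

# keep 3 copies, and keep serving I/O as long as at least 2 of them are available
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2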

Sheepdog, as spirit already pointed out, goes through so many changes that I myself question its dependability. I would rather install something with almost full assurance that it will keep working for a very long time. :)
 
Hi samwayne

I have had DRBD working across two PVE nodes for several years without problems (every Sunday a script runs automatically to verify the replicated DRBD volumes and emails me a report confirming that they are perfectly synchronized). In broad strokes, this is my configuration:

About DRBD:
1- Each PVE host has a dedicated DRBD volume, so with two PVE nodes I have 2 DRBD volumes, and each DRBD volume sits on a different HDD for more access speed.
2- LVM is on top of DRBD (as in the PVE wiki).
3- DRBD is tuned (this isn't in the PVE wiki, but the LINBIT DRBD documentation covers it).
4- DRBD is configured to report immediately by mail if a volume becomes desynchronized (in the DRBD global configuration file).
5- Two 1 Gb/s NICs are bonded in balance-rr mode (NIC-to-NIC) for each DRBD volume (this doubles the network speed, and DRBD keeps working if one network interface breaks).
6- A script runs automatically every week to verify the replicated DRBD volumes (a sketch of such a script follows this list).
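
A rough sketch of what such a weekly verification could look like (the resource name, the wait time and the mail address are only examples; DRBD's online verify also needs a verify-alg set in the resource configuration):

Code:
#!/bin/sh
# weekly DRBD online verification, run from cron on one node
# resource name "r0", the wait time and the mail address are only examples

drbdadm verify r0                 # start the online verification of resource r0
sleep 7200                        # crude wait; the real duration depends on the volume size

# /proc/drbd reports out-of-sync sectors as "oos:<n>"; anything non-zero means the copies differ
OOS=$(grep -o 'oos:[0-9]*' /proc/drbd | cut -d: -f2 | sort -nr | head -n1)

echo "DRBD verify on $(hostname): oos=${OOS:-0}" \
    | mail -s "weekly DRBD verify report" admin@example.com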

About the PVE cluster:
1- As I only have two PVE nodes, the cluster is configured to need only 1 vote for quorum.
2- fence_ack_manual is configured to get HA (human intervention is required in this case, but only a few simple steps).
Note: I know this HA configuration isn't recommended, but since I have fence_ack_manual I have time to think, study the situation, and decide whether to apply the manual fence (which isn't a concern while DRBD is working well). See the sketch after this section for the quorum side.
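
For the quorum part, a rough sketch of the kind of commands involved (take it as a pointer, not a recipe):

Code:
# check cluster membership and the current quorum state
pvecm status

# on a 2-node cluster, tell the cluster stack to expect only 1 vote,
# so the surviving node keeps quorum when the other node is down
pvecm expected 1

# after confirming that the failed node is really powered off,
# the manual fence is acknowledged with fence_ack_manual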

For backups:
1- The first PVE node backs up to the second PVE node.
2- The second PVE node backs up to the first PVE node (a sketch of such a cross-backup job follows).
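
For example, a cross-backup job can be as simple as a vzdump call pointed at a storage defined on the other node (the VM ID and the storage ID "backup-node2" are only examples):

Code:
# back up VM 100 to a storage that lives on the other node
# (the storage ID "backup-node2" is only an example)
vzdump 100 --storage backup-node2 --mode snapshot --compress lzo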

The result:
With this configuration I can always do live migration (in a few seconds), live backups, and anything else that PVE can do.

Best regards
Cesar

Re-edited: the NICs used for DRBD are dedicated exclusively to that task!!! (The same would apply if you wanted to use Ceph or some other replicated storage system and get higher performance for your VMs when data is written to their virtual disks.)
 