Proxmox setup with external storage

zentavr

New Member
Dec 5, 2013
Ukraine
Hello all :)
I'm going to set up a Proxmox cluster; the hardware layout is as follows:
[Attachment: setup.png - hardware diagram]
As you can see, I have two nodes with identical hardware (TYAN S7050 mainboard, Intel(R) Xeon(R) E5-2620 CPU). Each server has an LSI SAS HBA 9207-8i controller through which it is connected to a Supermicro SCE 837E26-RJBOD1 shared storage enclosure (using a special cable: External mSASx4 (SFF-8088) to mSASx4 (SFF-8088)).

I've set up the cluster and the fencing daemon; now the big question is storage.
Both nodes see the disks as separate devices rather than as a single shared storage volume:
Code:
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                          8:0    0  93.2G  0 disk
├─sda1                       8:1    0   511M  0 part
│ └─md0                      9:0    0   511M  0 raid1 /boot
└─sda2                       8:2    0  92.7G  0 part
  └─md1                      9:1    0  92.7G  0 raid1
    ├─pve-root (dm-0)      253:0    0  23.3G  0 lvm   /
    ├─pve-swap (dm-9)      253:9    0  11.6G  0 lvm   [SWAP]
    └─pve-data (dm-10)     253:10   0  46.2G  0 lvm   /var/lib/vz
sdb                          8:16   0  93.2G  0 disk
├─sdb1                       8:17   0   511M  0 part
│ └─md0                      9:0    0   511M  0 raid1 /boot
└─sdb2                       8:18   0  92.7G  0 part
  └─md1                      9:1    0  92.7G  0 raid1
    ├─pve-root (dm-0)      253:0    0  23.3G  0 lvm   /
    ├─pve-swap (dm-9)      253:9    0  11.6G  0 lvm   [SWAP]
    └─pve-data (dm-10)     253:10   0  46.2G  0 lvm   /var/lib/vz
sdc                          8:32   0 558.9G  0 disk
└─35000cca041935468 (dm-1) 253:1    0 558.9G  0 mpath
sdd                          8:48   0 558.9G  0 disk
└─35000cca041933700 (dm-2) 253:2    0 558.9G  0 mpath
sde                          8:64   0 558.9G  0 disk
└─35000cca04194edd4 (dm-3) 253:3    0 558.9G  0 mpath
sdf                          8:80   0 558.9G  0 disk
└─35000cca04194d080 (dm-4) 253:4    0 558.9G  0 mpath
sdg                          8:96   0 558.9G  0 disk
└─35000cca041927e74 (dm-5) 253:5    0 558.9G  0 mpath
sdh                          8:112  0 558.9G  0 disk
└─35000cca04193a90c (dm-6) 253:6    0 558.9G  0 mpath
sdi                          8:128  0 558.9G  0 disk
└─35000cca04193a51c (dm-7) 253:7    0 558.9G  0 mpath
sdj                          8:144  0 558.9G  0 disk
└─35000cca04192f58c (dm-8) 253:8    0 558.9G  0 mpath

What I want is to have at least RAID1 or RAID5 across my 8 disks, and then set up either GFS2 on top of it, or CLVM + GFS2.
I'm afraid that software RAID (dmraid) won't work reliably here, since it isn't cluster-aware and both nodes see the same disks.
I wonder what I can do here to avoid ending up with very poor data integrity.
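To make it concrete, this is roughly the layer I had in mind on top of the shared disks - only a sketch, with placeholder names (vg_shared, lv_gfs2, the cluster name "pvecluster") and only two of the multipath devices from the lsblk output shown:
Code:
# on both nodes: switch LVM locking to clustered mode (needs clvmd/cman running)
lvmconf --enable-cluster

# on one node: PVs on the multipath devices and a clustered volume group
pvcreate /dev/mapper/35000cca041935468 /dev/mapper/35000cca041933700
vgcreate -cy vg_shared /dev/mapper/35000cca041935468 /dev/mapper/35000cca041933700

# a logical volume for GFS2, with two journals (one per node)
lvcreate -L 500G -n lv_gfs2 vg_shared
mkfs.gfs2 -p lock_dlm -t pvecluster:gfs2vol -j 2 /dev/vg_shared/lv_gfs2

But this still leaves open the question of the missing RAID layer underneath.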
 
Instead of an HBA I would have gone with a hardware RAID card.

What exactly is your goal here? Are you looking for HA? HA is going to be very tough without hardware raid. Explaining what you want might help us come up with some solutions to your issue.
 
Instead of an HBA I would have gone with a hardware RAID card.
OK, if I replace the LSI HBA cards with, for example, LSI RAID cards - how would I then build the hardware RAID?
How would the card in server pve01 know about changes being made by the card in pve02?
E.g. imagine this situation: for some reason the RAID array gets degraded, and both RAID cards (because they see the same devices) start rebuilding it at the same time.

What exactly is your goal here? Are you looking for HA? HA is going to be very tough without hardware raid. Explaining what you want might help us come up with some solutions to your issue.
I want to have HA with live migration support. So a couple of VMs would live on the first Proxmox node and another couple on the second. When one of the nodes dies, its VMs are recovered on the surviving node.
 
I want to have HA with live migration support. So a couple of VMs would live on the first Proxmox node and another couple on the second. When one of the nodes dies, its VMs are recovered on the surviving node.
Have you read these?
http://pve.proxmox.com/wiki/DRBD
http://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster

With only two nodes and a single storage box with direct LUN attachment, I think the above is your only solution.
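For illustration, a two-primary DRBD resource definition would look roughly like this - just a sketch, the hostnames (pve01/pve02), backing disks and replication IPs are placeholders; the wiki page above has the full procedure:
Code:
# /etc/drbd.d/r0.res (example values only)
resource r0 {
        protocol C;
        startup { wfc-timeout 0; degr-wfc-timeout 60; become-primary-on both; }
        net { cram-hmac-alg sha1; shared-secret "my-secret"; allow-two-primaries; }
        on pve01 {
                device /dev/drbd0;
                disk /dev/sdc;
                address 10.0.7.1:7788;
                meta-disk internal;
        }
        on pve02 {
                device /dev/drbd0;
                disk /dev/sdc;
                address 10.0.7.2:7788;
                meta-disk internal;
        }
}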

PS: I have no experience with GlusterFS, so I don't know whether it could be used here as well.
 
Have you read these?
http://pve.proxmox.com/wiki/DRBD
http://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster

With only two nodes and a single storage box with direct LUN attachment, I think the above is your only solution.
PS: I have no experience with GlusterFS, so I don't know whether it could be used here as well.

I do know about DRBD, but we've already spent a couple of thousand dollars on that storage...
With DRBD I would need to split the disks: 4 HDDs for one node and the other 4 for the second.
Plus a mirror RAID on top, so in the end the usable space drops to a quarter of the raw capacity :(
 
I do know about DRBD, but we've already spent a couple of thousand dollars on that storage...
With DRBD I would need to split the disks: 4 HDDs for one node and the other 4 for the second.
Plus a mirror RAID on top, so in the end the usable space drops to a quarter of the raw capacity :(

Yeah, DRBD is about your only option at this point. The best bet would have been central storage with RAID controllers of some sort.
 
Yeah, DRBD is about your only option at this point. The best bet would have been central storage with RAID controllers of some sort.

As I said in my previous message:
OK, if I replace the LSI HBA cards with, for example, LSI RAID cards - how would I then build the hardware RAID?
How would the card in server pve01 know about changes being made by the card in pve02?
E.g. imagine this situation: for some reason the RAID array gets degraded, and both RAID cards (because they see the same devices) start rebuilding it at the same time.
 
I do know about DRBD, but we've already spent a couple of thousand dollars on that storage...
With DRBD I would need to split the disks: 4 HDDs for one node and the other 4 for the second.
Plus a mirror RAID on top, so in the end the usable space drops to a quarter of the raw capacity :(
Hi,
my first idea was also DRBD, but in your case...
If I'm right, your "special cable" is SFF-8088 to both nodes, which means that both nodes see the same storage. And the storage is only a JBOD - this means DRBD doesn't make much sense.
You could split the disks, 4 for node A and 4 for node B, but if you have trouble with the external enclosure everything is gone... the concept of DRBD is to use separate storage on each node.
Your external storage looks optimal for ZFS or Ceph - but then you need a further server to provide the storage to the Proxmox hosts (and for Ceph you would need three of them).
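Just to give an idea, on such a storage server the pool could be built from the eight JBOD disks roughly like this (the pool name "tank", the raidz level and the zvol size are only examples; the device names are the multipath WWIDs from your lsblk output, as seen by the storage server):
Code:
# one raidz2 pool across the eight JBOD disks
zpool create tank raidz2 \
    /dev/mapper/35000cca041935468 /dev/mapper/35000cca041933700 \
    /dev/mapper/35000cca04194edd4 /dev/mapper/35000cca04194d080 \
    /dev/mapper/35000cca041927e74 /dev/mapper/35000cca04193a90c \
    /dev/mapper/35000cca04193a51c /dev/mapper/35000cca04192f58c

# a zvol that the storage server could export via iSCSI to the Proxmox nodes
zfs create -V 500G tank/vm-disks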

Not easy...

Udo
 
Hi,
my first idea was also DRBD, but in your case...
If I'm right, your "special cable" is SFF-8088 to both nodes, which means that both nodes see the same storage.
Yes, I've just reviewed the bill for the equipment - it was an External mSASx4 (SFF-8088) to mSASx4 (SFF-8088) cable, 1 m.

And the storage is only a JBOD - this means DRBD doesn't make much sense.
You could split the disks, 4 for node A and 4 for node B, but if you have trouble with the external enclosure everything is gone... the concept of DRBD is to use separate storage on each node.
That's what I said. The factory that bought this configuration will be very upset that they spent ~$5k on a device which cannot provide any storage redundancy.
With DRBD we could just put 4 disks into server case A and 4 disks into server case B, and that's it... for the $4k we could buy a lot of ice cream and have a party :)

Your external storage looks optimal for ZFS or Ceph - but then you need a further server to provide the storage to the Proxmox hosts (and for Ceph you would need three of them).
But again - I'd need a third server in that case, no? ZFS is not a clustered file system.

I wonder: if I replace the LSI HBA cards with LSI RAID cards, would I be able to set up a reliable RAID storage that wouldn't be ruined by writes from both nodes?
 
That's what I said. The factory that bought this configuration will be very upset that they spent ~$5k on a device which cannot provide any storage redundancy.
I can't understand how anyone could think a JBOD is redundant storage - regardless of whether it costs $100 or $5k.
With DRBD we could just put 4 disks into server case A and 4 disks into server case B, and that's it... for the $4k we could buy a lot of ice cream and have a party :)
Normally, the use case and the planning for redundancy, SPOFs, failover and so on are done before buying anything...
But again - I'd need a third server in that case, no? ZFS is not a clustered file system.
One scenario is a third server connected to your storage (it could run OpenSolaris (Illumos?) and use the disks for a ZFS pool). This server can then export the storage via iSCSI to the Proxmox VE nodes.
In this case you have no real redundancy on the storage side, but there are solutions with two storage servers (like Nexenta).

The normal way to use shared storage on two nodes (redundancy with two controllers is possible) is a SAS RAID box (like this: http://eurostor.de/en/products/raid-sas-host/es-6600-sassas.html ) connected to both nodes, with LVM on top of the storage. Basically your storage box, but with an internal RAID controller and SAS outlets.
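As a sketch of the LVM part (assuming the RAID box exports one LUN that appears as /dev/sdc on both nodes; the names vg_shared and shared-sas are only examples):
Code:
# on one node only: put LVM on the shared LUN
pvcreate /dev/sdc
vgcreate vg_shared /dev/sdc

# /etc/pve/storage.cfg - mark the VG as shared so both nodes can use it
lvm: shared-sas
        vgname vg_shared
        shared 1
        content images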
I wonder: if I replace the LSI HBA cards with LSI RAID cards, would I be able to set up a reliable RAID storage that wouldn't be ruined by writes from both nodes?
You can't use the same disks in two RAID sets (from different hosts)...

Udo
 
I can't understand how anyone could think a JBOD is redundant storage - regardless of whether it costs $100 or $5k.
Heh, this is a very long story. In short, the original configuration was very different from what they have now. There were several "advisors" who tried to add their five cents to the process.
By now the first "advisor" has disappeared completely.
The second "advisor" says:
- "...let's begin with the fact that in Windows 2012 R2 you would create a pool of disks and then create software RAIDs: Clustered Storage Spaces. Each of these you attach to the nodes. How this could be done in Linux I cannot say - I haven't tested it yet. As far as I know, in Linux you can create a shared storage (software RAID) and attach it to the active node."

Normally, the use case and the planning for redundancy, SPOFs, failover and so on are done before buying anything...
Yes, this was done. But as I said, the "advisors" appeared and the finance director paid a different bill, not the one that was drawn up at the beginning of the process.

One scenario is a third server connected to your storage (it could run OpenSolaris (Illumos?) and use the disks for a ZFS pool). This server can then export the storage via iSCSI to the Proxmox VE nodes.
In this case you have no real redundancy on the storage side, but there are solutions with two storage servers (like Nexenta).
So we would need another two servers. But again, how would they work with the JBOD?

The normal way to use shared storage on two nodes (redundancy with two controllers is possible) is a SAS RAID box (like this: http://eurostor.de/en/products/raid-sas-host/es-6600-sassas.html ) connected to both nodes, with LVM on top of the storage. Basically your storage box, but with an internal RAID controller and SAS outlets.
Thank you for the example. My thoughts were along the lines of that solution from the beginning.

You can't use the same disks in two RAID sets (from different hosts)...
Yes, I understand. So I need to throw away the LSI SAS HBA 9207-8i cards plus cables and the Supermicro SCE 837E26-RJBOD1, and buy different equipment, right?
 
Yes, I understand. So I need to throw away the LSI SAS HBA 9207-8i cards plus cables and the Supermicro SCE 837E26-RJBOD1, and buy different equipment, right?
If you can live without distributed redundant storage, the Supermicro JBOD is an excellent option for a ZFS storage pool. This will require one extra server which is supposed to serve ZFS datasets over iSCSI to the two Proxmox servers. The newest Proxmox has support for native ZFS pools/datasets over iSCSI, which at the same time adds features like live snapshots and clones.

http://pve.proxmox.com/wiki/Storage:_ZFS
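As an illustration, the storage definition on the Proxmox side would look roughly like this (the pool name, portal IP, target IQN and iSCSI provider are example values; the wiki page above has the details):
Code:
# /etc/pve/storage.cfg (example values only)
zfs: jbod-zfs
        pool tank
        portal 192.168.1.10
        target iqn.2013-12.local.storage:tank
        iscsiprovider iet
        content images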
 
If you can live without distributed redundant storage, the Supermicro JBOD is an excellent option for a ZFS storage pool. This will require one extra server which is supposed to serve ZFS datasets over iSCSI to the two Proxmox servers. The newest Proxmox has support for native ZFS pools/datasets over iSCSI, which at the same time adds features like live snapshots and clones.

http://pve.proxmox.com/wiki/Storage:_ZFS
Yes, but if one of the 8 HDDs fails, the whole storage could die.
As for another server - frankly speaking, I want to have as few pieces of software as possible :)
Probably we'll pay a little more and replace the Supermicro SCE 837E26-RJBOD1 with another device, one that works the way it was agreed at the start.
 
