Proxmox setup with external storage

zentavr

New Member
Dec 5, 2013
Ukraine
Hello all :)
I'm going to set up a Proxmox cluster; the hardware layout is as follows:
[Attachment: setup.png - hardware diagram]
As you can see, I have two nodes with identical hardware (TYAN S7050 mainboard, Intel(R) Xeon(R) E5-2620 CPU). Each server has an LSI SAS HBA 9207-8i controller through which it is connected to a Supermicro SCE 837E26-RJBOD1 shared storage enclosure (using a special cable: External mSASx4 (SFF-8088) to mSASx4 (SFF-8088)).

I've set up the cluster and the fencing daemon; now the big question is storage.
Both nodes see the disks as separate devices rather than as a single shared storage volume:
Code:
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                          8:0    0  93.2G  0 disk
├─sda1                       8:1    0   511M  0 part
│ └─md0                      9:0    0   511M  0 raid1 /boot
└─sda2                       8:2    0  92.7G  0 part
  └─md1                      9:1    0  92.7G  0 raid1
    ├─pve-root (dm-0)      253:0    0  23.3G  0 lvm   /
    ├─pve-swap (dm-9)      253:9    0  11.6G  0 lvm   [SWAP]
    └─pve-data (dm-10)     253:10   0  46.2G  0 lvm   /var/lib/vz
sdb                          8:16   0  93.2G  0 disk
├─sdb1                       8:17   0   511M  0 part
│ └─md0                      9:0    0   511M  0 raid1 /boot
└─sdb2                       8:18   0  92.7G  0 part
  └─md1                      9:1    0  92.7G  0 raid1
    ├─pve-root (dm-0)      253:0    0  23.3G  0 lvm   /
    ├─pve-swap (dm-9)      253:9    0  11.6G  0 lvm   [SWAP]
    └─pve-data (dm-10)     253:10   0  46.2G  0 lvm   /var/lib/vz
sdc                          8:32   0 558.9G  0 disk
└─35000cca041935468 (dm-1) 253:1    0 558.9G  0 mpath
sdd                          8:48   0 558.9G  0 disk
└─35000cca041933700 (dm-2) 253:2    0 558.9G  0 mpath
sde                          8:64   0 558.9G  0 disk
└─35000cca04194edd4 (dm-3) 253:3    0 558.9G  0 mpath
sdf                          8:80   0 558.9G  0 disk
└─35000cca04194d080 (dm-4) 253:4    0 558.9G  0 mpath
sdg                          8:96   0 558.9G  0 disk
└─35000cca041927e74 (dm-5) 253:5    0 558.9G  0 mpath
sdh                          8:112  0 558.9G  0 disk
└─35000cca04193a90c (dm-6) 253:6    0 558.9G  0 mpath
sdi                          8:128  0 558.9G  0 disk
└─35000cca04193a51c (dm-7) 253:7    0 558.9G  0 mpath
sdj                          8:144  0 558.9G  0 disk
└─35000cca04192f58c (dm-8) 253:8    0 558.9G  0 mpath

What I want is to have at least RAID1 or RAID5 across my 8 disks, and then set up either GFS2 on top of it, or CLVM + GFS2.
I'm afraid that software RAID (dmraid) won't work reliably here, since it isn't cluster-aware and both nodes see the same disks.
I wonder what I can do here to avoid ending up with very poor data integrity.
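To make it concrete, this is roughly the layer I had in mind on top of the shared disks - only a sketch, with placeholder names (vg_shared, lv_gfs2, the cluster name "pvecluster") and only two of the multipath devices from the lsblk output shown:
Code:
# on both nodes: switch LVM locking to clustered mode (needs clvmd/cman running)
lvmconf --enable-cluster

# on one node: PVs on the multipath devices and a clustered volume group
pvcreate /dev/mapper/35000cca041935468 /dev/mapper/35000cca041933700
vgcreate -cy vg_shared /dev/mapper/35000cca041935468 /dev/mapper/35000cca041933700

# a logical volume for GFS2, with two journals (one per node)
lvcreate -L 500G -n lv_gfs2 vg_shared
mkfs.gfs2 -p lock_dlm -t pvecluster:gfs2vol -j 2 /dev/vg_shared/lv_gfs2

But this still leaves open the question of the missing RAID layer underneath.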
 
Instead of an HBA I would have gone with a hardware RAID card.

What exactly is your goal here? Are you looking for HA? HA is going to be very tough without hardware raid. Explaining what you want might help us come up with some solutions to your issue.
 
Instead of an HBA I would have gone with a hardware RAID card.
OK, if I replace the LSI HBA cards with, for example, LSI RAID cards - how would I then build the hardware RAID?
How would the card in server pve01 know about changes being made by the card in pve02?
E.g. imagine this situation: for some reason the RAID array gets degraded, and both RAID cards (because they see the same devices) start rebuilding it at the same time.

What exactly is your goal here? Are you looking for HA? HA is going to be very tough without hardware raid. Explaining what you want might help us come up with some solutions to your issue.
I want to have HA with live migration support. So a couple of VMs would live on the first Proxmox node and another couple on the second. When one of the nodes dies, its VMs are recovered on the surviving node.
 
I want to have HA with live migration support. So a couple of VMs would live on the first Proxmox node and another couple on the second. When one of the nodes dies, its VMs are recovered on the surviving node.
Have you read these?
http://pve.proxmox.com/wiki/DRBD
http://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster

With only two nodes and a single storage box with direct LUN attachment, I think the above is your only solution.
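For illustration, a two-primary DRBD resource definition would look roughly like this - just a sketch, the hostnames (pve01/pve02), backing disks and replication IPs are placeholders; the wiki page above has the full procedure:
Code:
# /etc/drbd.d/r0.res (example values only)
resource r0 {
        protocol C;
        startup { wfc-timeout 0; degr-wfc-timeout 60; become-primary-on both; }
        net { cram-hmac-alg sha1; shared-secret "my-secret"; allow-two-primaries; }
        on pve01 {
                device /dev/drbd0;
                disk /dev/sdc;
                address 10.0.7.1:7788;
                meta-disk internal;
        }
        on pve02 {
                device /dev/drbd0;
                disk /dev/sdc;
                address 10.0.7.2:7788;
                meta-disk internal;
        }
}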

PS: I have no experience with GlusterFS, so I don't know whether it could be used here as well.
 
Have you read these?
http://pve.proxmox.com/wiki/DRBD
http://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster

With only two nodes and a single storage box with direct LUN attachment, I think the above is your only solution.
PS: I have no experience with GlusterFS, so I don't know whether it could be used here as well.

I do know about DRBD, but we've already spent a couple of thousand dollars on that storage...
With DRBD I would need to split the disks: 4 HDDs for one node and the other 4 for the second.
Plus a mirror RAID on top, so in the end the usable space drops to a quarter of the raw capacity :(
 
I do know about DRBD, but we've already spent a couple of thousand dollars on that storage...
With DRBD I would need to split the disks: 4 HDDs for one node and the other 4 for the second.
Plus a mirror RAID on top, so in the end the usable space drops to a quarter of the raw capacity :(

Yeah, DRBD is about your only option at this point. The best bet would have been central storage with RAID controllers of some sort.
 
Yeah, DRBD is about your only option at this point. The best bet would have been central storage with RAID controllers of some sort.

As I said in my previous message:
OK, if I replace the LSI HBA cards with, for example, LSI RAID cards - how would I then build the hardware RAID?
How would the card in server pve01 know about changes being made by the card in pve02?
E.g. imagine this situation: for some reason the RAID array gets degraded, and both RAID cards (because they see the same devices) start rebuilding it at the same time.
 
I do know about DRBD, but we've already spent a couple of thousand dollars on that storage...
With DRBD I would need to split the disks: 4 HDDs for one node and the other 4 for the second.
Plus a mirror RAID on top, so in the end the usable space drops to a quarter of the raw capacity :(
Hi,
my first idea was also DRBD, but in your case...
If I'm right, your "special cable" is SFF-8088 to both nodes, which means that both nodes see the same storage. And the storage is only a JBOD - this means DRBD doesn't make much sense.
You could split the disks, 4 for node A and 4 for node B, but if you have trouble with the external enclosure everything is gone... the concept of DRBD is to use separate storage on each node.
Your external storage looks optimal for ZFS or Ceph - but then you need a further server to provide the storage to the Proxmox hosts (and for Ceph you would need three of them).
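Just to give an idea, on such a storage server the pool could be built from the eight JBOD disks roughly like this (the pool name "tank", the raidz level and the zvol size are only examples; the device names are the multipath WWIDs from your lsblk output, as seen by the storage server):
Code:
# one raidz2 pool across the eight JBOD disks
zpool create tank raidz2 \
    /dev/mapper/35000cca041935468 /dev/mapper/35000cca041933700 \
    /dev/mapper/35000cca04194edd4 /dev/mapper/35000cca04194d080 \
    /dev/mapper/35000cca041927e74 /dev/mapper/35000cca04193a90c \
    /dev/mapper/35000cca04193a51c /dev/mapper/35000cca04192f58c

# a zvol that the storage server could export via iSCSI to the Proxmox nodes
zfs create -V 500G tank/vm-disks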

Not easy...

Udo
 
Hi,
my first idea was also DRBD, but in your case...
If I'm right, your "special cable" is SFF-8088 to both nodes, which means that both nodes see the same storage.
Yes, I've just reviewed the bill for the equipment - it was an External mSASx4 (SFF-8088) to mSASx4 (SFF-8088) cable, 1 m.

And the storage is only a JBOD - this means DRBD doesn't make much sense.
You could split the disks, 4 for node A and 4 for node B, but if you have trouble with the external enclosure everything is gone... the concept of DRBD is to use separate storage on each node.
That's what I said. The factory that bought this configuration will be very upset that they spent ~$5k on a device which cannot provide any storage redundancy.
With DRBD we could just put 4 disks into server case A and 4 disks into server case B, and that's it... for the $4k we could buy a lot of ice cream and have a party :)

Your external storage looks optimal for ZFS or Ceph - but then you need a further server to provide the storage to the Proxmox hosts (and for Ceph you would need three of them).
But again - I'd need a third server in that case, no? ZFS is not a clustered file system.

I wonder: if I replace the LSI HBA cards with LSI RAID cards, would I be able to set up a reliable RAID storage that wouldn't be ruined by writes from both nodes?
 
That's what I said. The factory that bought this configuration will be very upset that they spent ~$5k on a device which cannot provide any storage redundancy.
I can't understand how anyone could think a JBOD is redundant storage - regardless of whether it costs $100 or $5k.
With DRBD we could just put 4 disks into server case A and 4 disks into server case B, and that's it... for the $4k we could buy a lot of ice cream and have a party :)
Normally, the use case and the planning for redundancy, SPOFs, failover and so on are done before buying anything...
But again - I'd need a third server in that case, no? ZFS is not a clustered file system.
One scenario is a third server connected to your storage (it could run OpenSolaris (Illumos?) and use the disks for a ZFS pool). This server can then export the storage via iSCSI to the Proxmox VE nodes.
In this case you have no real redundancy on the storage side, but there are solutions with two storage servers (like Nexenta).

The normal way to use shared storage on two nodes (redundancy with two controllers is possible) is a SAS RAID box (like this: http://eurostor.de/en/products/raid-sas-host/es-6600-sassas.html ) connected to both nodes, with LVM on top of the storage. Basically your storage box, but with an internal RAID controller and SAS outlets.
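As a sketch of the LVM part (assuming the RAID box exports one LUN that appears as /dev/sdc on both nodes; the names vg_shared and shared-sas are only examples):
Code:
# on one node only: put LVM on the shared LUN
pvcreate /dev/sdc
vgcreate vg_shared /dev/sdc

# /etc/pve/storage.cfg - mark the VG as shared so both nodes can use it
lvm: shared-sas
        vgname vg_shared
        shared 1
        content images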
I wonder: if I replace the LSI HBA cards with LSI RAID cards, would I be able to set up a reliable RAID storage that wouldn't be ruined by writes from both nodes?
You can't use the same disks in two RAID sets (from different hosts)...

Udo
 
I can't understand how anyone could think a JBOD is redundant storage - regardless of whether it costs $100 or $5k.
Heh, this is a very long story. In short, the original configuration was very different from what they have now. There were several "advisors" who tried to add their five cents to the process.
By now the first "advisor" has disappeared completely.
The second "advisor" says:
- "...let's begin with the fact that in Windows 2012 R2 you would create a pool of disks and then create software RAIDs: Clustered Storage Spaces. Each of these you attach to the nodes. How this could be done in Linux I cannot say - I haven't tested it yet. As far as I know, in Linux you can create a shared storage (software RAID) and attach it to the active node."

Normally, the use case and the planning for redundancy, SPOFs, failover and so on are done before buying anything...
Yes, this was done. But as I said, the "advisors" appeared and the finance director paid a different bill, not the one that was drawn up at the beginning of the process.

One scenario is a third server connected to your storage (it could run OpenSolaris (Illumos?) and use the disks for a ZFS pool). This server can then export the storage via iSCSI to the Proxmox VE nodes.
In this case you have no real redundancy on the storage side, but there are solutions with two storage servers (like Nexenta).
So we would need another two servers. But again, how would they work with the JBOD?

The normal way to use shared storage on two nodes (redundancy with two controllers is possible) is a SAS RAID box (like this: http://eurostor.de/en/products/raid-sas-host/es-6600-sassas.html ) connected to both nodes, with LVM on top of the storage. Basically your storage box, but with an internal RAID controller and SAS outlets.
Thank you for the example. My thoughts were along the lines of that solution from the beginning.

You can't use the same disks in two RAID sets (from different hosts)...
Yes, I understand. So I need to throw away the LSI SAS HBA 9207-8i cards plus cables and the Supermicro SCE 837E26-RJBOD1, and buy different equipment, right?
 
Yes, I understand. So I need to throw away the LSI SAS HBA 9207-8i cards plus cables and the Supermicro SCE 837E26-RJBOD1, and buy different equipment, right?
If you can live without distributed redundant storage, the Supermicro JBOD is an excellent option for a ZFS storage pool. This will require one extra server which is supposed to serve ZFS datasets over iSCSI to the two Proxmox servers. The newest Proxmox has support for native ZFS pools/datasets over iSCSI, which at the same time adds features like live snapshots and clones.

http://pve.proxmox.com/wiki/Storage:_ZFS
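As an illustration, the storage definition on the Proxmox side would look roughly like this (the pool name, portal IP, target IQN and iSCSI provider are example values; the wiki page above has the details):
Code:
# /etc/pve/storage.cfg (example values only)
zfs: jbod-zfs
        pool tank
        portal 192.168.1.10
        target iqn.2013-12.local.storage:tank
        iscsiprovider iet
        content images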
 
If you can live without distributed redundant storage, the Supermicro JBOD is an excellent option for a ZFS storage pool. This will require one extra server which is supposed to serve ZFS datasets over iSCSI to the two Proxmox servers. The newest Proxmox has support for native ZFS pools/datasets over iSCSI, which at the same time adds features like live snapshots and clones.

http://pve.proxmox.com/wiki/Storage:_ZFS
Yes, but if one of the 8 HDDs fails, the whole storage could die.
As for another server - frankly speaking, I want to have as few pieces of software as possible :)
Probably we'll pay a little more and replace the Supermicro SCE 837E26-RJBOD1 with another device, one that works the way it was agreed at the start.
 
