Best RAID configuration for PVE

nick

In this thread I would like to ask all users who run RAID with PVE to talk about performance, stability and, most importantly, what happens when something goes wrong in a RAID environment.

For example: what is the most performant and stable RAID configuration for PVE?

RAID 5? RAID 10? RAID 6?

Please, if you know an answer, post your opinion and help us learn from your experience!

Thank you!

PS: please make this thread sticky if that is OK.
 
The idea is to see how PVE works with hardware RAID specifically, not RAID in general...
 
Can someone tell me what the best RAID configuration for PVE is?
I want it to be fast... and also safe!

RAID 10 or RAID 5? I need some opinions...
 
Optimal RAID Alignment

RAID 1 / 10 will give the best RAID performance. For best filesystem performance you will need to align the partition, the PV and the FS. I did it, and it was not that easy to get done, as the Linux partitioning tools by default start the first partition at sector 63 and a PV adds some unaligned metadata in front of the PEs.
The parted tool allows you to specify the partition start; use 2048s (which will align with anything up to a 1MB chunk size). When creating the PV you need to pass --metadatasize 1020k (do not use 1024k here, it will round up to something else!). You can verify that the PV is really aligned using:
Code:
pvs -o+pe_start --units s
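For reference, here is a minimal sketch of the partitioning and PV creation steps described above; the device name /dev/sdX is a placeholder and the layout must of course match your own system:
Code:
# create an aligned partition starting at sector 2048 (device name is an example)
parted -s /dev/sdX mklabel msdos
parted -s /dev/sdX mkpart primary 2048s 100%
# place 1020k of PV metadata in front of the PEs so the first PE starts at 1MB
pvcreate --metadatasize 1020k /dev/sdX1
# verify the alignment: pe_start should show 2048S
pvs -o+pe_start --units s /dev/sdX1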
Last but not least you should pass the RAID alignment to the filesystem like this:
Code:
mkfs.ext3 -b 4096 -E stride=16,stripe-width=32 /dev/XXXX
or for existing filesystems:
Code:
tune2fs -E stride=16,stripe-width=32 /dev/XXXX
The stride and stripe-width values need to be adjusted to your RAID settings; the formula for that is:
Code:
raid chunk size / fs block size = stride
number of data-bearing drives * stride size = stripe-width
The example above is for a RAID 10 with 4 disks (2 data-bearing drives) and a chunk size of 64K.
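As a quick sanity check of the formula for this example (4K filesystem block size; the device /dev/XXXX is again a placeholder), and to see what an existing filesystem actually has set:
Code:
# stride = 64K chunk / 4K block = 16; stripe-width = 2 data drives * 16 = 32
echo $(( 65536 / 4096 ))   # 16
echo $(( 2 * 16 ))         # 32
# check the values stored in an existing filesystem
dumpe2fs -h /dev/XXXX | grep -i raid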

Hope this short guide helps someone!
 
Optimal RAID Alignment: Benchmark Results

Here are the recorded benchmark results of my analysis:
Code:
RAID10, 256KB chunk size
dd if=/dev/sda of=/dev/null bs=1M count=16000
16777216000 bytes (17 GB) copied, 90.1107 s, 186 MB/s

dd if=/dev/zero of=/dev/sda bs=1M count=16000
16777216000 bytes (17 GB) copied, 90.3506 s, 186 MB/s

Partition Alignment 2048s
mkfs.ext3 -b 4096 -E stride=64 -E stripe-width=128 /dev/sda1
mount -o noatime /dev/sda1 /mnt
bonnie++ -u root -f -n 0 -r 8000 -d /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
finnix          16G           145167  29 70308  10           183059  11 734.5  14
Latency                         596ms     380ms             57949us     140ms

Partition Alignment 63s
mkfs.ext3 -b 4096 -E stride=64 -E stripe-width=128 /dev/sda1
mount -o noatime /dev/sda1 /mnt
bonnie++ -u root -f -n 0 -r 8000 -d /mnt
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
finnix          16G           147447  30 69470  10           182686  11 569.6  13
Latency                         497ms    1989ms               222ms     532ms
Bigger sequential reads or writes will not benefit from this, but random reads and writes will, as misaligned data causes unnecessary extra reads/writes. You can see nicely that the seeks/s change dramatically.

If you optimize the alignment it is important to align through all layers, starting with the disk (RAID) up to the filesystem; otherwise it is pretty useless to do so. In real-world scenarios correct alignment can yield up to 30% more performance. Some more background information about this not-so-well-known topic can be found here:
http://msdn.microsoft.com/en-us/library/dd758814.aspx
or here:
http://lonesysadmin.net/2006/05/20/vmware-io-problems/

The impact of misalignment is worse with RAID5 than with RAID1/10, as the extra writes are very expensive (because of the parity).

The impact is big enough that Microsoft changed the default alignment for new partitions in Windows Server 2008 to 1MB (2048 sectors).

It would be nice if Proxmox could incorporate the alignment settings above into the installer! For non-RAID systems there is no drawback besides losing less than 1MB of disk space.
 
Hi,
the speed-up in random seeks is convincing! I think I will do some tuning soon...

But I wonder about the dd values - read and write at the same speed?
Perhaps the RAID controller limits the speed? I tried just a read (read only, on a production system):

Code:
dd if=/dev/sda of=/dev/null bs=1M count=16000
16000+0 records in
16000+0 records out
16777216000 bytes (17 GB) copied, 52.4607 s, 320 MB/s

This is also with a RAID 10 (4 disks).
 

Well, the performance of that particular RAID system as RAID10 is indeed not as good as it could be. This is a limitation of the RAID controller in combination with a poor Linux driver. I would warn all Linux users NOT to use HighPoint-based RAID controllers! The hardware is very good, but the driver does not deliver the performance it could. 3ware RAID controllers give much better performance on Linux with the same RAID settings and disks!
 
Oh, one other thing: if you test the read speed you have to drop the caches (echo 3 > /proc/sys/vm/drop_caches) and use a size that is 2 times the physical host memory! Otherwise you test buffered reads and not the RAID system ;)
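For example, a minimal sketch assuming a host with roughly 8GB of RAM (adjust count to about twice your RAM; /dev/sda is a placeholder):
Code:
# flush the page cache so the test hits the disks, not RAM
echo 3 > /proc/sys/vm/drop_caches
# read ~16GB, i.e. about 2x the assumed 8GB of host memory
dd if=/dev/sda of=/dev/null bs=1M count=16000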
 
micro.bauer: Awesome info!

Here is the guide where I found this info; it may be helpful for people who need a little more depth/explanation:

http://wiki.tldp.org/LVM-on-RAID

Nice resource/guide! I took a quick look and it seems to include all the important bits, except one parameter that isn't supported in the LVM version that Debian 5.0 (Lenny) ships: "--dataalignment". For that reason I used "--metadatasize", which has the same effect, though more as a side effect than by design :) (as it adds the metadata before the first PE).
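On hosts with a newer LVM2 than Lenny's, the direct option mentioned above could be used instead; a sketch (the device name is a placeholder):
Code:
# align the start of the data area (first PE) to 1MB directly
pvcreate --dataalignment 1m /dev/sdX1
# pe_start should again report 2048S
pvs -o+pe_start --units s /dev/sdX1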
 
Oh, one other thing: if you test the read speed you have to drop the caches (echo 3 > /proc/sys/vm/drop_caches) and use a size that is 2 times the physical host memory! Otherwise you test buffered reads and not the RAID system ;)

Hi Mirco,
you are right, but in my case the chance that the first 16G are buffered is very small. A short test with dropped caches and more than twice the dd data gives roughly the same result (a little bit better - 328 MB/s). BTW, it's an Areca controller.

I also had bad experiences with the Linux driver for LSI RAIDs (onboard, such as on Sun hardware).

Udo
 
Having a hard time getting this down - is there a suggested procedure for realigning partitions if you are using a single RAID1?

The tune2fs parameters mentioned by micro.bauer seem to be specific to striped arrays - RAID 0 or 5.

My drives have 512-byte sectors, and the RAID card is set to 64KB chunks on a RAID1.

Should I stop all services, tar the partition, and recreate it with an adjusted metadata size on the LVM?
 
Hi,
I'm not sure you will get a big benefit from alignment on a RAID-1 (safe, but no speed gain). I think aligning the partition layout is the only thing you can do.

Udo
 
I don't imagine a big speed increase, but I want to try it so I at least know how this is done for my next server, which will probably have RAID6.

Partition alignment is what I am asking about - is this possible on Proxmox on the primary drive, or is it only possible by doing a scratch Debian install to get manual partitioning and then using apt to install Proxmox?
 
Hi,
if you have the chance, go for RAID-10 instead of RAID-6 (perhaps you can do some performance tests; normally you see a huge difference on the right RAID controller).
But for aligning - in short: I do a normal install and then boot a live CD (like grml) with a second hard disk to store the data during the realignment. Save the contents of pve-data and pve-root with tar to the second hard disk, delete the VG pve, remove the PV, and create a new partition layout (I left /boot untouched, it does not need aligning). Create the PV, create the VG pve, create the LVs swap (-Cy), root and data (leave at least 4GB free in the VG pve for backups). Run mkfs.ext3 on /dev/pve/root and /dev/pve/data and mkswap -f on /dev/pve/swap, untar the contents back to pve-root and pve-data, reboot, and all done!
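To make the steps concrete, here is a rough sketch of that procedure, run from the live CD. All device names, partition numbers and LV sizes are examples only and must match your own setup; the stride/stripe-width values are the RAID10 64K-chunk example from earlier in the thread:
Code:
# activate the existing VG and back up pve-root and pve-data to a second disk (example: /dev/sdb1)
vgchange -ay pve
mkdir -p /mnt/backup /mnt/lv
mount /dev/sdb1 /mnt/backup
mount /dev/pve/root /mnt/lv && tar -C /mnt/lv -czf /mnt/backup/pve-root.tar.gz . && umount /mnt/lv
mount /dev/pve/data /mnt/lv && tar -C /mnt/lv -czf /mnt/backup/pve-data.tar.gz . && umount /mnt/lv

# remove the old VG/PV and the old LVM partition; /boot on /dev/sda1 stays untouched
vgremove -f pve
pvremove /dev/sda2
parted -s /dev/sda rm 2

# recreate partition 2 starting on the next 1MiB (2048-sector) boundary after /dev/sda1
END1=$(parted -sm /dev/sda unit s print | awk -F: '$1=="1" {gsub("s","",$3); print $3}')
START2=$(( (END1 / 2048 + 1) * 2048 ))
parted -s /dev/sda mkpart primary ${START2}s 100%
parted -s /dev/sda set 2 lvm on

# recreate PV, VG and LVs (sizes are examples; leave at least 4GB free in the VG)
pvcreate --metadatasize 1020k /dev/sda2
vgcreate pve /dev/sda2
lvcreate -Cy -L 4G -n swap pve
lvcreate -L 20G -n root pve
lvcreate -l 80%FREE -n data pve
# stride/stripe-width are the RAID10 64K-chunk example values from above
mkfs.ext3 -b 4096 -E stride=16,stripe-width=32 /dev/pve/root
mkfs.ext3 -b 4096 -E stride=16,stripe-width=32 /dev/pve/data
mkswap -f /dev/pve/swap

# restore the data and reboot
mount /dev/pve/root /mnt/lv && tar -C /mnt/lv -xzf /mnt/backup/pve-root.tar.gz && umount /mnt/lv
mount /dev/pve/data /mnt/lv && tar -C /mnt/lv -xzf /mnt/backup/pve-data.tar.gz && umount /mnt/lv
umount /mnt/backup
reboot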

If you are not firm with Linux, you will learn a lot ;)

Udo
 
Thanks - that's what I figured (and hoped to avoid). I am good with *nix, just not LVM and alignment calculations.
 
