Linux SW RAID needed in Proxmox VE

gkovacs

Renowned Member
Dec 22, 2008
Budapest, Hungary
In my opinion, the single biggest missing feature in Proxmox VE is easy creation of, and installation onto, Linux software RAID arrays.
There are many reasons:

1. HW RAID is best for reliability, but do we really need that?
HW RAID was created for ultimate reliability: if you have a card with a RAID processor and a BBU-protected cache, it will ensure array and filesystem consistency in the event of a power failure or kernel panic. But Linux is very stable; we haven't seen a kernel panic for years now (apart from the recent 2.6.32-6 issue :) ), and servers are colocated in data centers with UPS, so power is practically never lost. Additionally, Proxmox VE supports snapshot backups and high availability cluster mirroring, so with all of these turned on you could even use single-disk nodes without fear of data loss.

2. HW RAID is a black box to the Linux IO subsystem, killing performance
Proxmox VE (Linux 2.6.32) uses the CFQ IO scheduler by default, which in theory supports IO priorities. In reality, however, it performs so badly on HW RAID controllers that many users (including us) have since switched to the DEADLINE or NOOP schedulers (and we see much better performance under high IO load). The reason is simple: with a HW RAID controller there is an IO queue on the card that knows the disk layout but nothing about the kernel queue, and CFQ in the kernel, which has no idea how the blocks are laid out physically. These two queues basically work against each other, reordering requests several times in both places. Needless to say, CFQ fully supports Linux SW RAID, so there it can optimize IO effectively because it knows the disk layout.
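For reference, this is roughly how the scheduler can be checked and changed at runtime. It's just a rough Python sketch around the /sys interface, not an official tool; the device name "sda" is only an example, and you need root to actually switch:

Code:
#!/usr/bin/env python3
# Show (and optionally switch) the IO scheduler of a block device via sysfs.
# The kernel marks the active scheduler in brackets, e.g. "noop [deadline] cfq".
from pathlib import Path
import sys

def scheduler_file(dev: str) -> Path:
    return Path("/sys/block") / dev / "queue" / "scheduler"

def current_scheduler(dev: str) -> str:
    return scheduler_file(dev).read_text().strip()

def set_scheduler(dev: str, name: str) -> None:
    # Needs root; takes effect immediately but does not persist across reboots
    # (for that, add e.g. "elevator=deadline" to the kernel command line).
    scheduler_file(dev).write_text(name)

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
    print(f"{dev}: {current_scheduler(dev)}")
    # Uncomment to switch (as root):
    # set_scheduler(dev, "deadline")
    # print(f"{dev}: {current_scheduler(dev)}")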

3. HW RAID is inflexible, incompatible and has no performance advantage
If your expensive RAID controller dies, you usually have to find the exact same model, because arrays are not portable between manufacturers (and sometimes not even within the same company). With software RAID, if your motherboard dies you can buy a different model and Linux will recognize your array without any problems.
Also, you can't easily configure SSD caching (or partitions) for a HW RAID array (or you have to buy the highest-priced card), while it's much easier with software.
RAID controllers don't get frequent firmware updates or much community discussion, while common motherboard chipsets do.
Last but not least: only the newest HW RAID controllers can keep pace with SATA 6 Gb/s speeds, so it's entirely possible that a 2-3 year old RAID controller is slower than SW RAID on a modern commodity PC.

4. HW RAID is expensive
A decent controller that supports parity-based arrays can cost as much as a single uniprocessor server, or should I say another node for your cluster. Which would you choose if SW RAID were an option? I would rather spend the money on a new node. (Google and Facebook do the same: they use cheap nodes and ensure data consistency in software; they don't rely on expensive proprietary hardware.)

I hope I explained myself clearly: I think HW RAID is an aging, expensive technology that is becoming irrelevant for the open-source cloud.
Hopefully the Proxmox devs will agree and enable us to create SW RAID arrays in PVE 2.0.
 
I've successfully created two RAID10 md devices in the Debian Lenny installer out of 4 disks: md0 is /boot, md1 is an LVM physical volume.
I'm now at the LVM / filesystem configuration part and have a couple of questions:

- do I have to leave free space for snapshots in the LVM configuration? (all the LVs combined should occupy less space than the VG has? how much less?)
- do I have to create 3 logical volumes (like the PVE installer does) or can the root and data volumes be the same?
- do I have to name the volumes /dev/pve/root, /dev/pve/data, /dev/pve/swap?
- can I use ReiserFS or XFS for root and/or data? is Proxmox VE compatible with other filesystems, or only with ext3?
 
- do I have to leave free space for snapshots in the LVM configuration? (all the LVs combined should occupy less space than the VG has? how much less?)

The ISO installer reserves 4 GB; I suggest you reserve more to be sure. It depends on your usage; maybe 16 GB is better.
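A quick way to see how much unallocated space is left in the volume group is to ask LVM directly. A rough Python sketch (it assumes the VG is named "pve", the LVM tools are installed, and it is run as root):

Code:
#!/usr/bin/env python3
# Print the free (unallocated) space of the "pve" volume group in GiB,
# so you can see whether enough is left for snapshot backups.
import subprocess

def vg_free_bytes(vg: str = "pve") -> int:
    out = subprocess.run(
        ["vgs", "--noheadings", "--nosuffix", "--units", "b", "-o", "vg_free", vg],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(float(out))

if __name__ == "__main__":
    free_gib = vg_free_bytes() / 1024**3
    print(f"free space in VG pve: {free_gib:.1f} GiB")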

- do I have to create 3 logical volumes (like the PVE installer does) or can the root and data volumes be the same?

Create at least the data (mounted) LV. Root and swap do not need to be on LVM.
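If you do the layout by hand, something like this rough sketch would create and format just the data LV. The VG name "pve", the LV name "data", the 450G size and ext3 are example choices for this thread, not requirements; run as root and pick a size that leaves some of the VG free:

Code:
#!/usr/bin/env python3
# Create the "data" LV in the "pve" VG and format it ext3, leaving the rest
# of the VG unallocated for snapshots. Names and sizes are examples only.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["lvcreate", "-n", "data", "-L", "450G", "pve"])
run(["mkfs.ext3", "/dev/pve/data"])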

- do I have to name the volumes /dev/pve/root, /dev/pve/data, /dev/pve/swap?

no, but why not?

- can I use ReiserFS or XFS for root and/or data? is Proxmox VE compatible with other filesystems, or only with ext3?

OpenVZ supports ext3 and ext4. ext4 sometimes has issues but is getting better with OpenVZ. ext3 is the fastest and best supported for OpenVZ, therefore we recommend ext3.
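If you go with ext3 for the data volume, the fstab entry would look roughly like this (assuming the usual /var/lib/vz mount point for the data LV; adjust the names if you chose different ones):

Code:
/dev/pve/data  /var/lib/vz  ext3  defaults  0  1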