Proxmox and RAID scenarios

Kordian

Active Member
Mar 31, 2018
Hello,
In the days of old the standard was to have one hardware RAID for the OS and a separate hardware RAID for the data. Now, with virtualization, I cannot find much information about RAID scenarios for Proxmox.
I have a server with 8 HDDs, a RAID controller, 16 cores and 382 GB of RAM.
Planned is a Proxmox host with a separate HTTP server, MariaDB server, mail server and a couple of small Linux/Windows servers, for only a small number of users.
I want the best performance.
1. What would be the best RAID config:
a. Leave the hardware RAID out and install Proxmox with software RAID?
b. Hardware RAID: create a separate RAID1 volume for Proxmox, then the rest as RAID10 for all the VMs?
c. Hardware RAID: create a separate RAID1 volume for Proxmox and then a separate hardware RAID1 for each server (MariaDB, mail, HTTP)?
d. Any other ideas?
2. It would also be nice to have a storage volume accessible to all servers. Is NFS the only option here? I guess placing data (for example database or mail storage) on such an NFS share is not ideal from a performance point of view? Or how does Proxmox handle such scenarios?
3. I also have SD card slots and read that Proxmox can be installed on them (after some tuning). Is that a good idea from a performance point of view?
Thank you in advance for any advice.
 
1. What would be the best RAID config:
a. Leave the hardware RAID out and install Proxmox with software RAID?
b. Hardware RAID: create a separate RAID1 volume for Proxmox, then the rest as RAID10 for all the VMs?
c. Hardware RAID: create a separate RAID1 volume for Proxmox and then a separate hardware RAID1 for each server (MariaDB, mail, HTTP)?
d. Any other ideas?
Using hardware RAID is legitimate and usually works well.
Using software RAID, which in the case of PVE means ZFS (the only supported option), works well too, and you get some nice extra features from ZFS. The main one is checksumming of everything: ZFS knows when a bit has flipped on one disk and is usually able to repair it. Replicating VMs between nodes at a regular interval only works with ZFS, and you can use the send/recv mechanism to ship incremental backups to another ZFS storage, for example with pve-zsync.
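As a rough illustration of the send/recv idea (the dataset, the snapshot names and the target host below are made up for the example; pve-zsync automates exactly these steps for you):

Code:
# Initial full copy of a VM disk dataset to another ZFS host
# zfs snapshot rpool/data/vm-100-disk-0@sync1
# zfs send rpool/data/vm-100-disk-0@sync1 | ssh backuphost zfs recv tank/backup/vm-100-disk-0
# Later runs only transfer the blocks changed since the previous snapshot
# zfs snapshot rpool/data/vm-100-disk-0@sync2
# zfs send -i @sync1 rpool/data/vm-100-disk-0@sync2 | ssh backuphost zfs recv tank/backup/vm-100-disk-0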

If you go down the ZFS route, be aware that ideally ZFS gets access to the disks as raw as possible, i.e. through an HBA controller (or a RAID controller flashed to IT mode).
With enough RAM available you can get great read performance in a typical setup, as almost 100% of the read requests can be satisfied from RAM. Depending on the storage you can optimize write performance by using a dedicated fast SSD (e.g. Intel Optane) as SLOG/ZIL device, which is used to persist sync writes before they are written out to the slower disks. It does not need to be big, as it only holds a few seconds of data; a few GiB are usually enough.
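Adding such a log device later is a single command; the device path below is just a placeholder for your own SSD (ideally referenced by-id):

Code:
# zpool add rpool log /dev/disk/by-id/nvme-FAST_SSD_EXAMPLE
# zpool status rpool    # the device now shows up under a separate "logs" section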

Another way to speed up ZFS is the so-called special device. This vdev type stores the metadata of the pool instead of keeping it on the slow HDDs next to the actual data, which reduces metadata access times considerably. You can even configure datasets to store small files directly on the special device (useful for containers, not for VMs). The special device needs to have redundancy, because if it fails, the pool fails with it.
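Roughly, adding a mirrored special vdev and letting one dataset keep its small blocks on it could look like the sketch below (the disk paths and the dataset name are placeholders):

Code:
# zpool add rpool special mirror /dev/disk/by-id/ata-SSD_A /dev/disk/by-id/ata-SSD_B
# zfs set special_small_blocks=16K rpool/data/subvol-101-disk-0   # blocks <= 16K land on the SSDs
# zpool status rpool    # the mirror now appears under a "special" section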

How you split the disks into different RAIDs is up to you; other people might be more opinionated. In general, for a VM workload where IOPS matter, a RAID10-like ZFS pool (made up of mirror vdevs) is better than any raidz. It also does not have to store additional parity data for the VM disks; in a raidz pool the VM disks need more space than anticipated because of that parity, which surprises most people.
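With your 8 HDDs such a pool could, for example, be built from four mirrors; the disk paths and the pool name here are placeholders, not a recommendation for your exact setup:

Code:
# zpool create -o ashift=12 tank mirror /dev/disk/by-id/ata-HDD1 /dev/disk/by-id/ata-HDD2 mirror /dev/disk/by-id/ata-HDD3 /dev/disk/by-id/ata-HDD4 mirror /dev/disk/by-id/ata-HDD5 /dev/disk/by-id/ata-HDD6 mirror /dev/disk/by-id/ata-HDD7 /dev/disk/by-id/ata-HDD8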

2. It would also be nice to have a storage volume accessible to all servers. Is NFS the only option here? I guess placing data (for example database or mail storage) on such an NFS share is not ideal from a performance point of view? Or how does Proxmox handle such scenarios?

Yes, NFS or Samba/CIFS are the go-to solutions for this. In the future, virtio-fs will hopefully be an alternative.
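Just to illustrate the guest side (the server address and export path are made up), every VM can simply mount the same export:

Code:
# apt install nfs-common             # inside a Debian/Ubuntu guest
# mkdir -p /mnt/shared
# mount -t nfs 192.168.1.10:/srv/shared /mnt/shared
# echo '192.168.1.10:/srv/shared /mnt/shared nfs defaults,_netdev 0 0' >> /etc/fstab   # mount at boot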

3. I also have SD card slots and read that Proxmox can be installed on them (after some tuning). Is that a good idea from a performance point of view?
While some users seem to run this kind of setup, it is not officially supported.
 
Thank you.
As far as I can see, the software RAID option is not possible, as I have a controller without a passthrough (IT mode) option. Flashing it might be possible, but I do not want to do that.
Any other comments are highly appreciated!
 
Hi,

If you do not go the IT-mode route for your HW RAID controller, and if the controller is good and does not lie about I/O operations, you could create a single-disk RAID0 for each HDD that will be a ZFS member, and disable the HW RAID cache if that is possible.
Then you can test this setup. The best test is to power off the server by pulling the power cable during normal operation. If this test (I usually do it 5-6 times a day for one week) does not show any ZFS corruption, then you have a fair chance of being OK in production.
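After each power cut you can have ZFS verify every block itself; a minimal check, assuming your pool is called "tank", would be:

Code:
# zpool scrub tank        # read and verify every block against its checksum
# zpool status -v tank    # shows scrub progress and any READ/WRITE/CKSUM errors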


Good luck / Bafta !
 
Thank you! I will check.
What does it look like when a drive fails under ZFS? Is it relatively easy to replace the drive without data loss? I have read that in the past you needed a special procedure to perform such a replacement.
Hardware RAID with hot-swap drives is a piece of cake in this respect...
 
You can add spare drives to a ZFS pool.

The most common administrative tasks with ZFS are covered in the documentation [0]. If you just want to play around and see how these things work and behave, you can use files instead of disks. This way you can also easily simulate corrupted data by using dd to write some random data to one of the files.

To use files, create a few of them with the truncate command and then pass the full paths to these files instead of disks in the zpool create command.

For example, creating a RAID10-like pool with a spare disk:
Code:
# truncate --size 2G d1.zfs d2.zfs d3.zfs d4.zfs d5.zfs
# zpool create -o ashift=12 testpool mirror `pwd`/d1.zfs `pwd`/d2.zfs mirror `pwd`/d3.zfs `pwd`/d4.zfs spare `pwd`/d5.zfs
# zpool status testpool
  pool: testpool
 state: ONLINE
  scan: none requested
config:

    NAME                       STATE     READ WRITE CKSUM
    testpool                   ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        /testdirectory/d1.zfs  ONLINE       0     0     0
        /testdirectory/d2.zfs  ONLINE       0     0     0
      mirror-1                 ONLINE       0     0     0
        /testdirectory/d3.zfs  ONLINE       0     0     0
        /testdirectory/d4.zfs  ONLINE       0     0     0
    spares
      /testdirectory/d5.zfs    AVAIL

errors: No known data errors
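
To see what a failure and a replacement look like, you can deliberately damage one of the test files and then swap it out. The commands continue the example above; d6.zfs is a new, made-up stand-in for a replacement drive:

Code:
# dd if=/dev/urandom of=d1.zfs bs=1M count=64 seek=512 conv=notrunc   # corrupt part of one "disk"
# zpool scrub testpool              # the checksums detect the damage and repair it from the mirror partner
# zpool status -v testpool          # CKSUM errors show up on d1.zfs, but no data was lost
# zpool clear testpool              # reset the error counters
# truncate --size 2G d6.zfs         # a fresh "disk" standing in for a replacement drive
# zpool replace testpool `pwd`/d1.zfs `pwd`/d6.zfs   # resilver onto the new disk, the pool stays online
# zpool status testpool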


Should you go down the route of single-disk RAID0s on the hardware RAID (which I do not recommend), you will also have to manage these hardware RAID0 volumes when replacing a disk, which increases the complexity quite a bit in my opinion.


[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_zfs_administration
 
