New Proxmox 2 Cluster (HA, SAN, Migration, ...)

michaeljk

Hello,

We are currently running Proxmox 1.9 with 4 cluster nodes and about 100 VMs (mainly KVM, some OpenVZ). Hardware configuration of each node:

2x QuadCore Intel Xeon E5620 @ 2.60 GHz with Hyperthreading
32 GB RAM
6x 300 GB SAS Drives (Hardware RAID10)


In the next few weeks we need to move our cluster to a new datacenter. The new cluster will be based on Proxmox 2; we will back up each VM and transfer it to the new location. With the new setup we would like a new, more flexible storage solution. Unfortunately, there are many options to choose from: hardware SAN, distributed replicated storage (GlusterFS, MooseFS, Ceph, Sheepdog, ...), software SAN such as Nexenta, NFS, and more...

A hardware SAN is too expensive. We will have 2x Gbit NICs in each of the new nodes and have already tried some distributed filesystems, but they were either too slow in terms of I/O or had too many bugs and were not recommended for production use at this time. We want a flexible solution where faulty hardware can be removed without interrupting the cluster and where we can easily extend the storage size at any time. As mentioned before, we also have some OpenVZ systems, but if necessary we would replace them with regular KVM images to get a solid, working solution. A software SAN built with 10 TB of storage that cannot be extended later is not an option for us.

I know that Proxmox 2 has beta support for Ceph and Sheepdog, but I'm not sure whether they are ready for production use yet. If someone has already built a working cluster with 250-500 VMs on distributed storage, I would really like to see your recommendations for the configuration and why you chose a specific storage backend.

Michael
 
I've been testing Ceph for some months now and it's pretty stable; I get around 20,000 IO/s in random read/write with 4K blocks (Ceph also has commercial support from Inktank if you want it).
Sheepdog is less stable (I've had some crashes with it), but performance is around 40,000 IO/s.
A GlusterFS native client is coming for KVM 1.3 (but I don't know how GlusterFS performs with KVM...).
Nexenta: we have a plugin to manage it (volume creation, iSCSI mapping, snapshots, ...). I've been running a Nexenta HA cluster for two years and it works fine.


For your needs, I think you should look at Ceph (RBD).
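
If you go for RBD, the Proxmox side is basically one storage definition plus the keyring. A rough sketch, assuming an already running Ceph cluster (the storage ID "ceph-rbd", the monitor IPs and the pool name are placeholders, and the exact syntax may differ slightly between Proxmox 2 releases):

[code]
# Sketch only: monitor IPs, pool and storage ID "ceph-rbd" are placeholders.
# 1) Give Proxmox the Ceph admin keyring (file name must match the storage ID):
mkdir -p /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/ceph-rbd.keyring

# 2) Define the RBD storage in /etc/pve/storage.cfg:
cat >> /etc/pve/storage.cfg <<'EOF'
rbd: ceph-rbd
        monhost 192.168.0.10 192.168.0.11 192.168.0.12
        pool rbd
        username admin
        content images
EOF
[/code]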
 
Sorry for bringing this topic up again, but unfortunately we have no final solution yet. Since none of our tests with distributed replicated storage were stable enough, we are currently considering plain NFS storage. What do you think of the following components and configuration - would it be fast enough, especially over Gigabit Ethernet?

8x Proxmox hosts:
QuadCore Intel Xeon
16 GB RAM
2x 500 GB SATA (Software RAID1)
1x Gigabit port

1x Storage Server:
QuadCore Intel Xeon
8 GB RAM
24x 1TB SATA2 (Hardware RAID10)
2x Gigabit Ports (Bonded)

Do we need fast local storage on the Proxmox hosts if the KVM images / OpenVZ containers are located on the storage server? My thought is that I could disable local storage on the Proxmox hosts completely and use the NFS storage for both the VMs and the ISO images.
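
Something like the following would implement that on each Proxmox host; the server address and export path are just placeholders for our setup:

[code]
# Rough sketch: 192.168.0.20 and /tank/vmstore are placeholders.
# One NFS storage for KVM images, OpenVZ containers, ISOs, templates and backups:
pvesm add nfs vmstore --server 192.168.0.20 --export /tank/vmstore \
    --content images,rootdir,iso,vztmpl,backup

pvesm status    # verify that the new storage shows up and is active
[/code]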
 
There is only one catch to OpenVZ on NFS: you cannot do live migration without some outage on the containers, since Proxmox only supports migration of containers in suspend mode. Given your hardware specs I would recommend KVM for all VMs in an HA setup. Your storage server looks like an ideal candidate for OpenIndiana, so you could consider it with napp-it added on top: http://wiki.openindiana.org/oi/napp-it,+ready+to+use+NAS+SAN+storage+server+with+WEB-GUI

Then you can drop your hardware RAID10 and go for a ZFS pool; a ZFS pool scales practically without limit.

From Wikipedia:

[h=3]Capacity[/h]
ZFS is a 128-bit file system, so it can address 1.84 × 10^19 times more data than 64-bit systems such as Btrfs. The limitations of ZFS are designed to be so large that they would never be encountered. This was assured by surpassing physical rather than theoretical limitations: filling a 128-bit file system would require more energy than that needed to boil all the oceans on planet Earth. Some theoretical limits in ZFS are:

  • 2^48 — Number of entries in any individual directory
  • 16 exabytes (2^64 bytes) — Maximum size of a single file
  • 16 exabytes — Maximum size of any attribute
  • 256 zettabytes (2^78 bytes) — Maximum size of any zpool
  • 2^56 — Number of attributes of a file (actually constrained to 2^48 for the number of files in a ZFS file system)
  • 2^64 — Number of devices in any zpool
  • 2^64 — Number of zpools in a system
  • 2^64 — Number of file systems in a zpool
 
I'm not a cluster expert, but it jumps out at me that 2x Gbit on the server looks like a real bottleneck for serving 8 nodes. Add more bonded NICs or, if you have the budget, buy a 10 Gbit optical NIC for the server and plug it into a 24x Gbit switch with a 10 Gbit "upstream" optical port (maybe this one is suitable: http://enterprise.alcatel-lucent.com/?product=OmniSwitch6450&page=features). I have never tried this configuration, so you will have to investigate yourself, but I think it should work pretty well.
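
For the bonded-NIC variant, this is roughly what /etc/network/interfaces could look like on a Debian-based box (Proxmox host or Linux storage server). The interface names, addresses and the 802.3ad mode are assumptions; LACP needs support on the switch, otherwise fall back to balance-alb or active-backup:

[code]
# Sketch of /etc/network/interfaces - eth0/eth1 and all addresses are placeholders.
auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond_mode 802.3ad

# Proxmox puts its bridge on top of the bond:
auto vmbr0
iface vmbr0 inet static
        address 192.168.0.31
        netmask 255.255.255.0
        gateway 192.168.0.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
[/code]

Keep in mind that with the usual bonding modes a single TCP stream (e.g. one NFS connection) still tops out at one link's bandwidth; bonding mainly helps when several nodes hit the storage at the same time, which is why the 10 Gbit uplink is attractive.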
 
With your storage server, why not go for OpenIndiana and ZFS with napp-it on top?
http://wiki.openindiana.org/oi/napp-it,+ready+to+use+NAS+SAN+storage+server+with+WEB-GUI

For HA you should forget about OpenVZ since live migration is only supported in suspend mode.
 
@mmenaz:
This was also my concern. We also have the option to bond 4x 1 GbE ports using an additional NIC, or to use a 10 Gbit card with a dedicated uplink port on the switch - but will the configured RAID10 deliver enough throughput for that with normal SATA2 drives?

@mir:
OpenIndiana with ZFS sounds very interesting, especially with the snapshot options. Which hardware do you suggest for that? Do we need a special configuration (e.g. SSDs, lots of RAM, hardware RAID, ...) for that kind of backend system?
 
For speed, the napp-it guys recommend installing OpenIndiana on a small SSD; you might consider a RAID1 mirror for this root pool -> http://constantin.glez.de/blog/2011/03/how-set-zfs-root-pool-mirror-oracle-solaris-11-express

RAM: the more the merrier. For your use case I would go for >= 32 GB, since RAM is cheap these days.
Hardware RAID: for ZFS, hardware RAID is actually not recommended, since ZFS needs to see and control every disk directly (present the disks as JBOD instead).
With more RAM and SSD(s) for the root pool, your storage server is more than capable of serving your needs.

Update: for even more speed, the ZIL (log) and L2ARC (cache) could be placed on separate SSDs, but this can always be done later if you are not satisfied with the performance. In general, RAM is the key factor for speed and IOPS.

Update II: for speed and IOPS, go for several mirrored vdevs in one pool instead of one big vdev.
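
As a sketch of that layout on the OpenIndiana box (the c0t*d0 disk names are placeholders; hand ZFS the raw disks, with no hardware RAID in between):

[code]
# Pool of striped mirror pairs for IOPS; add more "mirror diskA diskB"
# groups for the remaining drives.
zpool create tank \
    mirror c0t0d0 c0t1d0 \
    mirror c0t2d0 c0t3d0 \
    mirror c0t4d0 c0t5d0 \
    mirror c0t6d0 c0t7d0

# Optional, can be added later: SSDs for the ZIL (mirrored) and as L2ARC cache.
zpool add tank log mirror c0t20d0 c0t21d0
zpool add tank cache c0t22d0

# One dataset exported over NFS for the Proxmox nodes:
zfs create tank/vmstore
zfs set sharenfs=on tank/vmstore
[/code]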
 
