Advice on Shared Storage for Proxmox Cluster

deejross (Guest)

I have been using Proxmox VE for a few months using local storage. While running a few VMs at the same time, I'm starting to notice some IO delays. So I started thinking about how to improve performance and allow things like live migration when I add a second Proxmox server. I was originally looking at a Synology 5-bay device, but now I'm thinking of going with a custom-built machine (out of spare parts) running FreeNAS. Either way, I know that I have to use iSCSI, but supposedly you can't use an iSCSI target with more than one machine at a time.

So my first question is, how do I do shared storage between two machines using iSCSI? Do I follow the two steps listed here: http://pve.proxmox.com/wiki/Storage_Model#LVM_Groups_with_Network_Backing on both machines using the same iSCSI target?
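For reference, as far as I understand it, the manual equivalent of those two steps on one Proxmox node would be roughly the following (the portal IP, target and device name are just placeholders I made up):

    # discover and log in to the iSCSI target (open-iscsi)
    iscsiadm -m discovery -t sendtargets -p 192.168.1.50
    iscsiadm -m node --login

    # put an LVM volume group on the new iSCSI disk (device name will differ)
    pvcreate /dev/sdb
    vgcreate vg_san /dev/sdb

After that, I assume I would add the iSCSI target plus an LVM storage on top of vg_san in the Proxmox storage configuration (marked as shared) so both nodes can use it - is that right?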

Second, if I use a 4-disk RAID-Z (4x 3 TB SATA) for roughly 9 TB usable (one disk's worth of parity) in my FreeNAS box, what kind of performance should I expect to see? I would hope it would be faster than local SATA storage, but are we talking about a significant performance improvement?

I don't have a lot of money to spend on this, as it's more of a hobby/experiment right now, but I will be using this setup to run a few web servers (OpenVZ) and maybe a couple of Windows desktop instances (KVM). Eventually, I would like to get a fencing device for HA, but right now I have to get storage taken care of.

Thanks for any advice and helpful tips you can give me!
 
You can use iSCSI as shared storage. Proxmox ensures that only one machine at a time activates a given volume on the iSCSI storage.

FreeNAS uses istgt - http://www.peach.ne.jp/archives/istgt/ - for iSCSI. I would not recommend it; it gave me a lot of errors after a few days of heavy use (on both FreeNAS and Debian). iscsitarget (IET) for Debian was approximately the same. Windows Storage Server 2k3 was reliable but very slow - maybe a configuration error, but I could not find it. There is an alternative iSCSI implementation for FreeNAS (SCST), but when I researched it a year ago it seemed a bit complicated to set up.

In my experience, STGT (http://stgt.sourceforge.net/) in versions 1.0.19 and later works well: reliable and very simple to set up. It is the standard iSCSI target on Red Hat/CentOS, but I use it on Debian Squeeze.
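For example, on Debian Squeeze a minimal /etc/tgt/targets.conf entry looks roughly like this (IQN, backing device and initiator addresses are just examples):

    <target iqn.2012-04.local.san:proxmox.lun0>
        # export one block device as a single LUN
        backing-store /dev/vg_data/lv_proxmox
        # only allow the Proxmox nodes
        initiator-address 192.168.1.11
        initiator-address 192.168.1.12
    </target>

Then restart tgt (/etc/init.d/tgt restart) and point the Proxmox iSCSI storage at that target.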

You could also use NFS, which should work fine and have some advantages.
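A shared NFS storage is just one entry in /etc/pve/storage.cfg (or a few clicks in the GUI); something along these lines, with the server address and export path made up:

    nfs: nas1
            server 192.168.1.50
            export /mnt/tank/proxmox
            path /mnt/pve/nas1
            content images,iso,vztmpl,backup
            options vers=3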

Best regards
 
Thanks for the insight. I thought about using NFS, as it's supposed to be as fast as, if not faster than, iSCSI, but the one issue that bothers me about it is that you can't do snapshot backups with it. I have scheduled my backups to run twice a day for all my VMs, and I really don't want to have to suspend them all just to do a backup. If there's a way to make that work with NFS, then it would be perfect.
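As far as I understand it, my backup job is basically the equivalent of the first command below, and I'd like to keep it that way (the VMID and storage name are just examples):

    # disks on LVM (e.g. LVM over iSCSI): vzdump can use an LVM snapshot, no downtime
    vzdump 101 --mode snapshot --storage backup-nfs

    # disks on plain NFS/directory storage: only suspend or stop mode is possible
    vzdump 101 --mode suspend --storage backup-nfs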
 
Hi, just a footnote on this thread: I have used "OpenFiler" as a generic NAS appliance for iSCSI target at a number of client sites, and it is a very solid product. Free / runs CentOS under the hood; good web management interface; and not too hard to use once you get used to it.

iSCSI is definitely no problem as a shared storage target for ProxVE (or a variety of other VM platforms... there is a whole industry built on this right now :)

Openfiler also supports channel bonding / port trunking, if your hardware supports it (ideally LACP support on the switch; and the ProxVE clients ideally need multiple NICs to benefit too). Configured that way, it can give better IO throughput than a single gigabit interface.
(Although a saturated single gigabit link is nothing to sneeze at... it just becomes a bit of a bottleneck if you have a number of IO-hungry VMs hammering it at once.)
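For what it's worth, on the ProxVE side an LACP bond for the storage network is just a few lines in /etc/network/interfaces, roughly like this (interface names and addresses are only an example; the switch ports must be configured for LACP as well):

    auto bond0
    iface bond0 inet static
            address 192.168.10.11
            netmask 255.255.255.0
            slaves eth1 eth2
            bond_miimon 100
            bond_mode 802.3ad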

Hope this helps a bit,


Tim Chipman
Fortech IT Solutions
http://FortechITSolutions.ca
 
I'd second the recommendation on OpenFiler. If you are looking for an 'enterprise tough' SAN solution, it's hard to beat.
 
By the way, maybe somebody here uses this setup:
Has anyone built an HA iSCSI cluster (with Heartbeat and a virtual IP bound to the iSCSI target service)?

I'm asking because I'm trying to build an HA cluster storage solution (http://eugenyho.blogspot.com/), and it seems that
when one node is rebooted and Heartbeat moves the virtual IP to the other node and starts the iscsitarget service there, Proxmox 2.0 cannot handle the situation properly.

I'm trying to investigate this problem, but has anyone already done so?
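My setup is the classic Heartbeat v1 style: the virtual IP and the iSCSI target service move together between the two storage nodes, roughly like this in /etc/ha.d/haresources (node name, IP and interface are placeholders from my test setup):

    # preferred node, then the resources that fail over together
    storage1 IPaddr::192.168.1.100/24/eth0 iscsitarget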
 
OK, so if I go with Openfiler (which seems to be the general consensus), would I set it up like this:

One iSCSI target of 500 GB which will hold all of my VM images and containers, and then have my two machines connect to this same 500 GB target? They would both see the same data, and that's what makes live migration possible? Or would I set up two iSCSI targets (250 GB each), one per node? If each node has its own storage space, is live migration still possible?

I am trying to clarify my question, because everything I read about iSCSI targets says only one machine can access it at a time or else there will be data corruption (similar to two machines accessing the same physical hard drive at the same time). But I see some conflicting information that says, "yea I do this all the time, and it works great".

Thanks for your patience. I'm still wrapping my head around this whole iSCSI thing.
 
You can't live migrate with a different iSCSI LUN for each machine - and it is NOT necessary.

You can corrupt filesystems if you mount the filesystems on iSCSI drives on two machines at the same time, but Proxmox splits each iSCSI LUN into separate LVM logical volumes. Proxmox ensures each logical volume is only active on ONE machine at a time, so you do not risk a "double mount".
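You can see this on the nodes themselves: Proxmox creates one logical volume per virtual disk inside the shared volume group and only activates it on the node that runs the VM. Roughly (the VG name is whatever you chose, VM 101 is just an example):

    # list the per-VM volumes in the shared volume group
    lvs vg_san

    # what Proxmox does under the hood when starting/stopping a VM on a node
    lvchange -ay /dev/vg_san/vm-101-disk-1
    lvchange -an /dev/vg_san/vm-101-disk-1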

Openfiler without the "advanced iSCSI target plugin" uses iscsitarget (IET)... see also

http://www.openfiler.com/products/advanced-iscsi-plugin-faq

My experience with Openfiler was the same as with iscsitarget on Debian: it worked OK, but after a few days of (very) heavy use it disconnected clients etc.

I would still go with STGT or LIO.

Best regards.
 
If you pick your hardware wisely, you can also try OpenIndiana (Solaris) based ZFS and use napp-it as a front end. Nexenta is much more slick, but the free version hits a wall at 18 TB before you have to purchase a license. We are running both in house.

One setup has two Proxmox nodes pulling from a single 8-disk ZFS array with a mirrored flash log and one flash drive as L2ARC read cache. It is crazy fast over iSCSI (we see line speed over the four gigabit links to 17 Windows 2003 Server VMs using virtio). The napp-it box has five drives and only 2 GB of RAM cache; we use it for NFS backups, ISOs, and SMB for email archives. We usually see 30-50 MB/s transfers with 20-50 simultaneous users on a single gigabit link, at a roughly 90/10 read/write ratio.
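For reference, that kind of pool layout is created with something like the following (the vdev layout and the Solaris-style device names are illustrative, not our exact config):

    # 8 data disks as striped mirrors, a mirrored SSD log, one SSD as L2ARC
    zpool create tank \
      mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0 \
      log mirror c2t0d0 c2t1d0 \
      cache c2t2d0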
 
Just noticed you said RAID-Z. With RAID-Z your writes will be about as fast as a single drive, and reads will depend on cache. Use plenty of RAM (I would advise 16 GB, as it's cheap enough to get for around $100). Do not use dedup or you will kill your performance.
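In ZFS terms that just means leaving the defaults alone, or setting it explicitly (the pool name is an example):

    # dedup is off by default on a new pool; make sure it stays that way
    zfs set dedup=off tank

    # on OpenIndiana, a quick look at the ARC size and limits
    echo ::arc | mdb -k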
 
Hi charnov,

I like your idea of using ZFS on OpenIndiana with napp-it. But I want to take it a bit further: why not virtualize the ZFS storage software by installing ZFS/napp-it, with LSI HBAs, inside two nodes (for HA) and connecting them to one JBOD storage shelf?

At the moment I am testing this with a virtualized Nexenta CE on ESXi 5. For that I use one LSI 2008 controller (IT-mode firmware) in passthrough and connect the HDDs directly to the Nexenta VM. Additionally, I also pass through two Intel 1 Gbit cards so the storage is available on the LAN. It has been running for a few months now with no failures. I also use it to back up the VMs from my primary ESXi server. The only bad thing is that it is not possible to install the VMware Tools in Nexenta. With VMware Tools it would be possible to use VMXNET3 to build a virtual 10 Gbit network inside the VM.

I think this should also be possible with Proxmox 2.x, inside a dedicated KVM guest?
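From what I have read so far, in Proxmox 2.x that would mean enabling the IOMMU on the host and handing the HBA to the storage VM via hostpci - I have not tested this myself, so the PCI address and VMID below are pure placeholders:

    # /etc/default/grub on the Proxmox host (then update-grub and reboot)
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

    # /etc/pve/qemu-server/100.conf: pass the LSI HBA at 04:00.0 to the storage VM
    hostpci0: 04:00.0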

Idea for the hardware in the two nodes:
HBA: LSI SAS 9200-8e
NIC: Intel 10 Gbit X520-DA2 (maybe, if more than two nodes are to be used)

For the JBOD: Supermicro SC837E26-RJBOD1

Maybe one QNAP NAS as an additional backup solution.

What do you think?

Greetings
Udo
 
You could have a look at Debian-based OpenMediaVault. I'm using it as OS on the storage server in our Proxmox test cluster.

I tried Openfiler and have it running on an older storage server in our Proxmox production environment, but I'll move away from it a.s.a.p.
The authentication mechanisms (other than the built-in one) are buggy, to say the least, and I was not able to change the iSCSI service from IET to SCST.
The iSCSI plugin I found for OpenMediaVault is also IET, but that I know how to change.
Inconsistent messages from users in the Openfiler forum, and the lack of proper guidance (also from the Openfiler team), have set my mind on moving everything to OpenMediaVault.

Yes, I'm very biased towards Debian-based solutions like Proxmox and OpenMediaVault, since I have spent far too much time solving problems with other systems. I don't dislike RHEL/CentOS, but I do what I can to avoid them.
 
Quote: "You could have a look at Debian-based OpenMediaVault. I'm using it as OS on the storage server in our Proxmox test cluster."

Hey Morten, thanks for your answer. But I think I will go with ZFS as the main storage. Still, OpenMediaVault could be a solution for the secondary backup (as an alternative to the QNAP).

Greetings
Udo
 
If you can get PCI passthrough to work flawlessly, then go for it. I had all kinds of trouble figuring out which Intel-based motherboards really, truly support it right now (AMD-based solutions seem to have broader support). You'll lose migration of that VM, and that node will be a single point of failure. 10GbE can really beat the crap out of a server's CPU/PCIe bus trying to keep up with the IO, and it is still very pricey per port. I have seen bus starvation between a good card like the LSI and big NICs on shared buses - though that was for a video server streaming full 1080p to many users simultaneously.

What bandwidth and IO requirements do you really have? My setup is three 1 Gb NICs per Proxmox node (2 go to the server VLAN and 1 is for the SAN) for 23 VMs with 100 users (SQL Server based ERP, Exchange, AD, file servers, real-time backup, etc.) and we don't come close to saturation.

The issue you will see with iSCSI is that while you can get near wire speed on throughput, lots of small file reads/writes show up as latency. If that happens, you would start thinking about the InfiniBand route, which can be cheap if you use older HBAs from eBay and plan your topology to avoid the need for a switch (they are freaking expensive). There's a little bit of info on the wiki for InfiniBand, as it can be tricky.

I also noticed that OpenSolaris/Illumos/OpenIndiana doesn't support VMXNET3 very well or at all and there are issues with jumbo frames. I do not know if that is fixed in current releases but there are ways of installing a version of the vmware tools. I believe virtio is not all that hot either, but I am betting this has improved some. In my experience, though, Solaris is not that forgiving when it comes to adding layers between it and hardware, even via passthrough.
 
Thanks charnov,

I don't see any problems with PCI passthrough on my ESXi 5, neither with the LSI HBA nor with the two Intel NICs. Only the VMware Tools do not work inside Nexenta CE, but this might change with the upcoming version 4: they are switching the underlying OS from Nexenta Core to Illumos/OpenIndiana. I also have a second vSAN running with OpenIndiana and napp-it, with no problems with VMXNET3. But that is all for VMware, and I would like to set up a small Proxmox 2.0 cluster with failover :)

That said, I have never worked with Proxmox before, but what I have read about it sounds good enough to try. The requirements are not that big. I would run about 25-30 VMs, most of them CentOS with Plesk hosting. I will tune the Apache and MySQL configs and also run a PHP accelerator and memcached on each host. Besides these web servers I would run 2 mail gateways, 4 MySQL servers and 2 firewalls (HA). Not all hosts need to be HA. The web servers should run very fast! The risk of a failure because of the single SAN (SPOF!) is minimal. A system outage of only a few hours would be tolerated, but should always be minimized. My trip to the datacenter where the hardware is located takes about 25 minutes, so I think only spare parts are really needed.

One more question: how many physical hosts are needed to build a minimal cluster? From what I have read, it must be at least 3 hosts in total. Do all of these hosts need the same hardware configuration (CPU, RAM, etc.), or could I use one host with a minimal config?
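From what I have read, building the cluster itself is only a couple of commands, independent of how big each node is (the IP is made up):

    # on the first node
    pvecm create mycluster

    # on each additional node, pointing at the first node's IP
    pvecm add 192.168.1.11

    # check membership and quorum
    pvecm status

But whether the third host can be a very small box just for quorum is exactly what I would like to have confirmed.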

Cu, Udo
 
We've had good experience with Nexenta Community Edition setups as iSCSI targets for ESXi. I'm in the process of building a 4U SuperMicro rig with 36 drive bays. I'm only starting with 8 drive bays filled, but will add more as we need the expansion or the IOPS.
 
Just a note on the Nexenta Community Edition, taken from http://www.nexentastor.org/projects/site/wiki/CommunityEdition:

"Production use is not allowed with Community edition. It is designed and intended to help people become familiar with the product or for hobbiest use. Not designed to be run in the enterprise."
 
Good point, Tom. Their licenses aren't terribly expensive for larger companies to absorb; the upside is that you can build a great-performing SAN using your own hardware and save a ton of cash.
 
