Proxmox cluster with SAN storage

Spiros Pap

Aug 1, 2017
Hi all,

We would like to set up 10 new servers in a Proxmox cluster to run our workloads (VMs). We also have a 100 TB EMC storage array that will be used for storing VMs (over iSCSI at 10G).

I would like to know what the recommended storage setup is for our hardware in order to have a shared storage solution.
- Should we use ZFS over iSCSI? It seems that ZFS over iSCSI has the most 'ticks' on the Proxmox storage page. Is it reliable? Is it fast?
- Should we use LVM over iSCSI? How reliable is that? What would we be missing (feature-wise or speed-wise)?
- Should we go with NFS? (The EMC does NFS natively, so we don't need a server for that.)
- What other options do we have for a shared filesystem, where each cluster member can run its own VMs?

Our primary concern is reliability and, after that, features.

Thanks,
spiros
 
As block storage, you can go with LVM over iSCSI, it's fast but no snapshots.
As file storage, you can use NFS and then RAW (no snapshots) or QCOW2 (snapshots, thin provisioning, but slower)

ZFS can only work if your EMC does ZFS, which it does not, so that's off the table.
 
Hi all,

Thanks for the answer. If we use the LVM over iSCSI solution, does this mean that we can't back up a VM, because we don't have snapshots?

One more stupid question. When we say that LVM over iSCSI can be shared storage for Proxmox, does this mean that all Proxmox machines can see a single SAN LUN and each Proxmox node can see its own logical volumes (LVs) inside this LUN? Or can only one Proxmox node use the whole LUN at a time?
The question is whether the locking is LUN-wide or LV-wide...

Sp

 
Thanks for the answer. If we use the LVM over iSCSI solution, does this mean that we can't back up a VM, because we don't have snapshots?

Backups and snapshots are two different concepts that are handled differently, so they work independently.

One more stupid question. When we say that LVM over iSCSI can be shared storage for Proxmox, does this mean that all Proxmox machines can see a single SAN LUN and each Proxmox node can see its own logical volumes (LVs) inside this LUN? Or can only one Proxmox node use the whole LUN at a time?
The question is whether the locking is LUN-wide or LV-wide...

Normally, you can do both, yet having one LUN for all VMs with one physical volume in one shared volume group is IMHO the easiest to maintain. Once set up, it works until you run out of space. The LVM is automatically shared with correct locking (via PVE, not LVM itself), so all nodes can access everything.
 
Hi,

Until now, we have been using NFS as shared storage on a FreeNAS server. It's as reliable as your NFS server; in our case it works great.

It is easy to manage and you can take snapshots, so I think it is a good option to consider.

Now, as an improvement, we are going to add new servers to the cluster and configure a Ceph cluster to provide HA storage to the other nodes.

Regards,

Manuel Martínez
 
Hi again,

Regarding LVM over iSCSI, I'm also doing some tests and I have some questions on this subject:

1) If I use an LVM storage over iSCSI, then every node in the cluster sees both the iSCSI storage and the LVM storage. Is it possible to hide the iSCSI storage from the nodes when we use it as a shared LVM storage?

2) According to https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_clvm_config.html, with LVM (or clvm) it is possible to create a VG on top of two different iSCSI SANs. I tried to do this using the GUI and was not able to add more than one device to a VG, so I did it from the CLI and then added the lv_on_top_of_both_san to storage.cfg to use it. It works, but for now I think it's not useful for us, because if I disconnect one SAN, the Proxmox storage freezes until I connect it again.

3) I've seen that LVM-thin supports snapshots and clones, but it can't be shared between nodes. Do you think it would be possible in the future to have snapshots on a shared LVM storage?

I'll probably need to read and test more, but I'd like to know your opinion. What do you think of using LVM on top of two different SANs? Could it be considered HA storage?

Thanks,

Manuel Martínez
 
Hi,

If I had your hardware (lucky me), I would test this scenario, because ... I like ZFS ;) Please do not take this to mean that it is a better solution.

ZFS has a new feature called multihost=on/off. From what I have read so far (without any tests, so be kind ...), it is possible to import the same ZFS pool on many hosts, which is nice! But this multihost pool will be usable on only a single host at any time.
I can only guess that if the active host using this pool breaks, another host can be promoted to active and use the pool.
Under this assumption, you can create an iSCSI target on your SAN. Then you can access this iSCSI target from many Proxmox nodes and use it as a ZFS pool.
So if the active zpool node breaks, I hope/expect that you can use the same pool on another node. I will try to test this myself if I can, because this idea could be very interesting ;)

Good luck!
 
Yes true, but not directly usable on a cluster, because it can only be mounted once, so live migration is not possible and a cluster with only one server capable of running your VMs is not really good.

This multihost feature is great if you have two "real" servers with ZFS that export their stuff via iSCSI, so you can build an HA-ZFS environment with an HA IP on top. It is cost-effective if you have external disk shelves with multiple paths, so that each server can attach to the same disks. This leads to building your own SAN and is, unfortunately, not the best you can do if you already have a SAN. There are also vendors selling exactly this kind of setup; just google for ZFS HA. I hope to see more setups using such a technique.
 
1) If I use an LVM storage over iSCSI, then every node in the cluster sees both the iSCSI storage and the LVM storage. Is it possible to hide the iSCSI storage from the nodes when we use it as a shared LVM storage?

Do you mean you see it in PVE or just on the OS?

2) According to https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_clvm_config.html, with LVM (or clvm) it is possible to create a VG on top of two different iSCSI SANs. I tried to do this using the GUI and was not able to add more than one device to a VG, so I did it from the CLI and then added the lv_on_top_of_both_san to storage.cfg to use it. It works, but for now I think it's not useful for us, because if I disconnect one SAN, the Proxmox storage freezes until I connect it again.

The use of clvm is not mandatory anymore, because PVE does all the locking, and if you do not create volumes by hand, you're fine.

You can use different kinds of RAID-like assemblies with LVM, that's right. Normally, you have multiple physical volumes that are part of one volume group, leading to a RAID-0-like setup with possibly enhanced throughput (you have to set that up manually or reorder). You can also create a mirrored setup, yet as far as I understand it, you can only create mirrored volumes, not volume groups, so you have to specify the mirroring on volume creation. Other software RAID setups are probably not cluster-aware.


3) I've seen that LVM-thin supports snapshots and clones, but it can't be shared between nodes. Do you think it would be possible in the future to have snapshots on a shared LVM storage?

Would be great, yet I do not know of any plans to do that. Maybe you should ask on their mailing list over at RedHat:
https://www.redhat.com/mailman/listinfo/linux-lvm

I'll probably need to read and test more, but I'd like to know your opinion. What do you think of using LVM on top of two different SANs? Could it be considered HA storage?

Is your SAN not HA, or do you mean a stretched SAN over a multi-kilometer/mile WAN route so that one single, big incident can wipe everything?
For most SANs we use, we have "redundant-everything" SANs with at least dual heads, dual UPS, dual switches, etc. In hardcore cases we have datacenter replication, because the distance (and therefore the delay) is big and a stretched setup is not practical.

In case of your actual problem, I do not know of any solution and have never tried to do LVM based mirroring, we always went with RAID below, external redundancy or hyperconverged storage with CEPH.
 
Do you mean you see it in PVE or just on the OS?

I mean from the Proxmox web interface. I'm mounting the iSCSI LUN from Proxmox/storage and then I see the storage under every node.
As I'm not using it directly, because I've defined an LVM volume to use on my nodes, it would be good to hide the SCSI devices used by LVM.


The use of clvm is not mandatory anymore, because PVE does all the locking and if you do not create volumes by hand, you're fine.

Yes, I've read this before on the forum


You can use different kinds of RAID-like assemblies with LVM, that's right. Normally, you have multiple physical volumes that are part of one volume group, leading to a RAID-0-like setup with possibly enhanced throughput (you have to set that up manually or reorder). You can also create a mirrored setup, yet as far as I understand it, you can only create mirrored volumes, not volume groups, so you have to specify the mirroring on volume creation. Other software RAID setups are probably not cluster-aware.

Ok.

Would be great, yet I do not know of any plans to do that. Maybe you should ask on their mailing list over at RedHat:
https://www.redhat.com/mailman/listinfo/linux-lvm

Is your SAN not HA, or do you mean a stretched SAN over a multi kilometer/miles wan route so that one single, big incident can wipe everything?
For most SANs we use, we have "redundant-everything" SANs with at least dual heads, dual UPS, dual switches, etc. In hardcore cases we have datacenter replication, because the distance (and therefore the delay) is big and a stretched setup is not practical.

I just want to avoid a single point of failure in my storage. I already have HA on my Proxmox nodes, but as I'm using NFS from a single FreeNAS NAS, it could fail. We use LACP bonds and stacked switches, and we have another FreeNAS server with the same RAM, CPU and disks just for replicas and backups. When I say HA in storage, I mean a redundant network RAID on top of two local SAN/NAS devices.

I've seen that NAS4Free has a HAST/CARP solution but we like FreeNAS so I was just wondering if, apart from CEPH, there are other choices that we can implement with our current hardware.


In case of your actual problem, I do not know of any solution and have never tried to do LVM based mirroring, we always went with RAID below, external redundancy or hyperconverged storage with CEPH

Thanks
 
I mean from the Proxmox web interface. I'm mounting the iSCSI LUN from Proxmox/storage and then I see the storage under every node.
As I'm not using it directly, because I've defined an LVM volume to use on my nodes, it would be good to hide the SCSI devices used by LVM.

Best would be to use it with the OS integration and not the Proxmox VE one, then you have only the LVM in your GUI.
https://wiki.debian.org/SAN/iSCSI/open-iscsi

I've seen that NAS4Free has a HAST/CARP solution but we like FreeNAS so I was just wondering if, apart from CEPH, there are other choices that we can implement with our current hardware.

Yes, I'd recommend some HA on the storage side. CARP is great in FreeBSD for failover-stuff.
 
Hi,

As I pointed out before, using NFS as shared storage works fine. You can use it as shared storage, quickly move virtual machines between nodes (live migration) and take snapshots.

You can also connect an iSCSI LUN (managed from Proxmox) directly to some of your virtual machines as a data disk. This works great if you want to rely on your SAN to take frequent snapshots or to make ZFS replicas of those snapshots, as we do.

We have tested both approaches in the past: connecting the LUN using an iSCSI initiator directly from the VM, or managing the iSCSI connection from Proxmox and then presenting the disk to the VM as a SATA, SCSI, IDE or virtio device. We found that the second option was better for us, as it worked better when moving a VM from one node to another or during nightly VM backups (with the first option we had some minor problems, maybe related to the iSCSI initiator we used in the VM).

Hope it helps,

Manuel Martínez
 
Hi,

I'm doing LVM over iSCSI for my cluster, and there are some advantages and some drawbacks:

From what I've read, this is the best option for performance, as VM disks are raw.
LVM is very well documented on the Internet.
It can waste space if you migrate from VMware to Proxmox, because you have to convert the disks into flat format.
As the disks are big, our nightly backups with vzdump take very long.
You can't take snapshots.
If you have a snapshot/replication-capable SAN, you can take snapshots and clone them for testing (I'm doing it on another node that does not belong to the cluster). You can do the same for replicated LUNs and test them at another site.

Sincerely,
 
Thank you for your answer, LnxBil. I tried this on one Linux VM and the backup time was almost the same.
Disk space is 120 GB, speed is 31 MB/s and backup time is 4000 sec, so it backs up the full disk, but the size of the backup is smaller (60 GB). Is it because my disk image is in .raw format?
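
As a rough sanity check on the numbers above, here is a minimal sketch that assumes decimal units (1 GB = 1000 MB) and a constant read speed; the function name is only illustrative. Reading the whole 120 GB image at 31 MB/s takes close to the reported 4000 s, which is consistent with the backup reading the full disk even though the resulting file is only 60 GB.

```python
def full_read_seconds(disk_gb: float, speed_mb_s: float) -> float:
    """Time to read an entire disk image at a constant speed.

    Assumes decimal units (1 GB = 1000 MB); real backups also spend
    time on compression and writing, which is ignored here.
    """
    return disk_gb * 1000 / speed_mb_s


estimate = full_read_seconds(120, 31)
print(f"estimated full read: {estimate:.0f} s")  # estimated full read: 3871 s
```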
 

Attachment: proxback.JPG

The backup time will not differ much with zeroing, yet the resulting file size does.

Your system is really, really slow with 31 MB/s for a SAN. In our 8 Gbit FC network, we see a throughput of roughly 800 MB/s, depending on other activities.

This is my personal experience, so bear with me:
I was never fond of iSCSI, because I've never experienced a fast one. Every customer I worked for with an iSCSI SAN had performance problems, so we exclusively recommend FC-based storage, which has never failed our expectations and is, for our needs, cheaper in terms of throughput per Euro.
 
Yes, we are on a 1 Gbit network, and our SAN only has 2 active 1 Gbit ports per controller for 5 nodes, and some backups run during production hours. We are looking at upgrading to either 10 Gbit or FC (8G or 16G?). So, as I understand it, you are pro FC. Were your customers on 1 Gbit or 10 Gbit networks? Do you think 10 Gbit is not the way to go?
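
A back-of-the-envelope estimate for this setup suggests that throughput in the tens of MB/s per node is not surprising. The sketch below is only an upper bound: it assumes the two active ports of one controller are shared evenly by all five nodes when they are busy at the same time, and it ignores iSCSI and TCP overhead. The function name is hypothetical.

```python
def per_node_mb_s(ports: int, gbit_per_port: float, nodes: int) -> float:
    """Upper bound on per-node throughput when all nodes are busy at once.

    Assumes an even split of the raw line rate and no protocol overhead.
    """
    total_mb_s = ports * gbit_per_port * 1000 / 8  # 1 Gbit/s = 125 MB/s raw
    return total_mb_s / nodes


print(per_node_mb_s(ports=2, gbit_per_port=1, nodes=5))   # 50.0
print(per_node_mb_s(ports=2, gbit_per_port=10, nodes=5))  # 500.0
```

With the same port count, moving to 10 Gbit ports raises this ceiling tenfold, although the array itself may then become the limit.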
 
10 GbE has been standard for many years and I've never seen an iSCSI SAN in a production system without 10 GbE or higher. Still, they were not near the performance of FC. This could be due to the fact that iSCSI is the less expensive technology, and there are SANs out there that are technically SANs, yet their performance is not enterprise-grade. Those entry-level SANs are often bought (due to their price), yet you often have performance problems with them. If you buy an expensive iSCSI SAN, I suppose you get decent performance. The entry price for FC-based SANs is much higher, yet you won't get a crippled device. Companies like NetApp often have great technology and great products, yet you have to buy the higher-end models and license everything, so you end up with the same ballpark figures as with an expensive FC-based SAN.

FC is currently also at 32 Gbit, so really fast, yet the HBAs and switches are expensive. We still run older servers with 4 Gbit FC (throughput of approximately 400 MB/s) and the systems still "feel" very fast.
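
For comparison, here is a small sketch of nominal per-direction throughput. It assumes the commonly quoted rule of thumb of about 100 MB/s per "Gbit" of Fibre Channel generation and the raw Ethernet line rate of 125 MB/s per Gbit/s, before any protocol overhead. Real arrays deliver less, and the helper names are illustrative.

```python
def fc_nominal_mb_s(generation_gbit: int) -> int:
    """Rule-of-thumb nominal FC throughput per direction (assumption)."""
    return generation_gbit * 100


def ethernet_raw_mb_s(gbit: float) -> float:
    """Raw Ethernet line rate in MB/s, ignoring TCP/iSCSI overhead."""
    return gbit * 1000 / 8


for gen in (4, 8, 32):
    print(f"{gen} Gbit FC: ~{fc_nominal_mb_s(gen)} MB/s")
print(f"10 GbE: ~{ethernet_raw_mb_s(10):.0f} MB/s raw")
```

By this estimate a 10 GbE link is not the limiting factor on paper, which fits the point above that the array itself often is.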
 
Hello LnxBil:

I'm currently evaluating Proxmox as a replacement for one of my vSphere 6 clusters. My setup is:

IBM Bladecenter H
IBM HS22 Blades
Qlogic HBA modules
Dual FC SAN switches
IBM DS3525 dual-controller SAN storage connected via Fibre Channel network

Like you, we have always relied on and trusted FC over iSCSI/NFS, and we would like to set up a Proxmox cluster that supports live migration and snapshots using the above-mentioned hardware. Can you give me some setup recommendations to help me achieve this goal?

Thanks for any help you can provide.

Regards,

Dennis
 
