Design considerations for a new 5-node PVE cluster with a Fibre Channel SAN

sparkx
Jul 24, 2021
Hello all,

Our final goal is to have an "as close to HA as possible" PVE cluster with the best performance, using the following hardware:
Switches:
1 x 24-port gigabit switch with a 2-port bond to each server - I know I should get two of them, but for now I think that's not an issue
There is no Fibre Channel switch, so connections will be point-to-point from each server to the SAN's HBAs (cards), 4 of them in FABRIC mode.
Now the servers:
1 x Fujitsu Eternus DX200 S4 Fibre Channel SAN with ONLY 4 FC ports, 4 x SSD (400GB each) and 12 x SAS drives (1.2TB each). No FCoE or NAS interface is available.
2 x Fujitsu Celsius C740, let's call them Server1 and Server2, each with 1 local 250GB SSD to boot from. Only Server2 has an FC HBA connected point-to-point to the SAN.
3 x Fujitsu Primergy RX2540 M4, let's call them Server3, Server4 and Server5, each with a RAID1 of 2 x 300GB SAS drives to boot from. Each of them has FC HBAs connected point-to-point to the SAN.

Proposed scenario, questions, confusions and misunderstandings:
So far everything is clear, but now we come to the shared storage for the cluster, and here our thoughts get foggy.

- Server1 will act as our internet firewall and gateway for the cluster and the local network, with its VM images stored locally; it has no access to the SAN due to lack of hardware.
- Maybe it would be a good idea to create a Samba Docker container on the other nodes and let Server1 access SAN shares through this NAS - would this be possible?
- VM and CT images/disks and Docker container storage will be kept on the SAN so that they are available to ALL 4 nodes through the FC cards at the same time. Would this be possible, and which storage technology from the ones available to us should we use?
- The Eternus DX200 S4 SAN can expose the following resources over FC:
- RAID Groups, which can be created as either:
- High Performance RAID1+0
- High Capacity RAID5
- High Reliability RAID6
- Reliability RAID5+0
- Mirroring RAID1
- Striping RAID0
- Thin Provisioning Pools, which can ONLY be created based on the above RAID options
- Block Volumes (of type Standard or TPV - what does this mean?), which can be created and added to LUNs on top of the said RAID Groups or Thin Provisioning Pools.

I would have gone for Ceph with all these drives in the SAN, but the SAN cannot expose them unless they are in one of the RAID formats above, so I think Ceph would be redundant here since RAID already provides the needed level of data protection. Am I wrong? Would it make the system faster or slower?

So the big question is: what should we choose, and what should we use in PVE (SCSI groups, Ceph, ZFS, LVM, thin LVM - I only have a vague idea of what they are) to make it work?

I've searched the forum for some guidance on what I think we want to build, but could not get a clear picture of how it should be done or what to use.
Any help to clear the fog would be highly appreciated, thank you!

 
In a bit of a different order:

- Using Ceph with SAN-backed storage that provides its own RAID protection is not a great idea. You will cut your already reduced capacity by 3-4x. Your backend SAN cache will be overwhelmed and useless. Ceph was designed for local, independent storage.

- You can use your SAN as shared storage for the 4 hosts. You have a choice in how you address the resulting LUNs. The supported methods are listed here: https://pve.proxmox.com/wiki/Storage. The only method that really fits your environment is thick LVM (a rough command sketch follows after this list).
You can also try, with a lot of manual involvement, to create one LUN per VM disk. You can't use ZFS - it's not suited for shared storage.

- Your questions about RAID types or Fujitsu-specific terminology are unlikely to get an answer here; you will have to find the manual for the device or reach out to the vendor's support. In the end, from the Proxmox perspective, it is irrelevant which RAID type you end up using.

- Yes, you can have a VM or CT serve CIFS or NFS to Server1. The space presented that way will need to be accessed by the other servers via the same method, i.e. over CIFS/NFS.
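
For reference, a minimal sketch of what the thick-LVM route could look like, assuming the FC LUN shows up on every node as /dev/sdb (the device name, volume group name and storage ID below are just placeholders):

# run once, on a single node: initialise the LUN and create a volume group on it
pvcreate /dev/sdb
vgcreate san_vg /dev/sdb

# register it cluster-wide as a shared LVM storage (the definition lives in /etc/pve and is replicated to all nodes)
pvesm add lvm san-lvm --vgname san_vg --shared 1 --content images,rootdir

With the shared flag set, Proxmox expects the same volume group to be reachable from every node over FC and only activates a given logical volume on the node that is currently running the guest.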
 
If you use a Fibre Channel SAN, you should use shared LVM (not thin LVM) as the storage type.

Note that you will not have snapshots or thin provisioning.

If you need them, you could use Ceph on top, but that seems to be overkill (and you don't have a 10G network for Ceph replication).
(Also, using a single array behind Ceph is not a good idea either. Ceph really works best with local disks plus a good network for replication.)
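
For clarity, the distinction shows up in /etc/pve/storage.cfg: shared thick LVM uses the lvm storage type with the shared flag, whereas thin LVM would be the lvmthin type, which cannot be shared across nodes. A sketch with placeholder names, matching the command example above:

lvm: san-lvm
        vgname san_vg
        content images,rootdir
        shared 1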
 
Sorry for not thanking you for the answers sooner, but I was away for a bit.

How about exposing multiple (let's say 4) RAID0 LUNs from the SAN with the cache disabled (each LUN with the minimum of 2 disks required by the SAN OS) and using them for Ceph? Would this work, would it have any benefit, has anybody tried this, or am I confusing things here?
 
Essentially you would be treating your SAN almost as a JBOD. If the SAN allows you to create bare RAID0 (a bit of an oxymoron, as it is not "redundant" at all), then it should work for Ceph. As long as each LUN consumes an entire disk pair and is not co-located with another LUN, Ceph replication should not hit them too hard.

The big issue I see is data safety. Your SAN becomes your single point of failure, and that is compounded by the RAID0 setup. A single disk failure means the entire LUN is disposable garbage. A second disk failure in another LUN, before Ceph has recovered the data, means you have lost everything.

Unfortunately, disks that are used together tend to fail together when their time comes. Only you can decide whether the risk is acceptable. Keep in mind the reserved free space that would be needed to recover if a LUN is gone.
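
To put rough numbers on it (assuming 3-way replication and the 12 SAS drives carved into six 2-disk RAID0 LUNs): 12 x 1.2TB is about 14.4TB raw, Ceph's size=3 replication cuts that to roughly 4.8TB, and keeping around 20-25% free so a lost LUN can be re-replicated leaves somewhere in the region of 3.5-3.8TB of usable space.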


 
Do not use SAN LUNs as backend storage for Ceph. This may technically work, but it is not how Ceph is intended to operate.
You are right. Ceph is meant to work with local disks on the node, so the fault domain is the node, not a centralized SAN.
However, when one has a significant investment in hardware, it's really hard not to want to get the maximum ROI out of it. Can't blame the OP for trying.


 
There are other methods besides Ceph to use SAN storage with Proxmox.

Ceph is really the least suitable option here.
 
There are other methods besides Ceph to use SAN storage with Proxmox.

Ceph is really the least suitable option here.
Again, I completely agree with you! If I gave the impression of advocating Ceph as the best option, that was not my intention.

A clustered file system running across the PVE nodes would be the most generic approach, as you in fact mentioned in comment #3. There is a learning curve to it, though.
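
For context, a very rough sketch of what that clustered-file-system route could look like with OCFS2 on a shared FC LUN (device name, label, paths and storage ID are placeholders, and the o2cb cluster definition in /etc/ocfs2/cluster.conf is glossed over here - it has to be set up on every node before anything will mount):

# on every node: install the tools and bring up the o2cb cluster stack
apt install ocfs2-tools
systemctl enable --now o2cb

# on ONE node only: create the filesystem on the shared LUN
mkfs.ocfs2 -L pve-shared /dev/sdb

# on every node: create the mountpoint and mount it (an fstab entry would be needed to make this persistent)
mkdir -p /mnt/pve-shared
mount /dev/sdb /mnt/pve-shared

# once, on any node: register the mountpoint as a shared directory storage
pvesm add dir san-dir --path /mnt/pve-shared --shared 1 --is_mountpoint yes --content images,rootdir

Because the Proxmox storage type is then just a directory, qcow2 (and therefore snapshots) becomes available again, but OCFS2 itself - fencing, cluster.conf, mount options - is entirely yours to set up and maintain; Proxmox does not manage it.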


 
So, to summarise a bit: as viable options we have so far OCFS2 on FC SAN LUNs (with additional software installed directly on PVE) and shared LVM (not thin LVM), which does not offer snapshots or thin provisioning.
Ceph is out of the question.
Since it's a new install and we're planning to add capacity as/if needed, which one would you go with for the best performance and feature set, the least wasted space, and ease of operation?
There are other methods besides Ceph to use SAN storage with Proxmox.
Could you please offer some more info on this?
 
