PVE 5.1 Cluster on ZFS Multipath Storage

theappy

Active Member
Jan 6, 2018
Good day all. Thank you for an amazing OS!
I have a 7 node cluster.
On each node is a Fibre Channel Card.
Each node sees the storage and I have created multipath config on every node.
My question is how to proceed to enable replication?
Must I create:
1. The same zpool on every node, then add it in GUI/Storage/ZFS and mark it shared?
2. The same zpool on every node, then add it only for the specific node and not share it?
As a test, I created 3 zpools via the console on the 3 multipath devices (roughly the commands sketched below).
When I run zfs list and/or zpool list, I see them mounted and available.
After rebooting a node, neither zfs list nor zpool list shows them anymore.
zpool import shows that the pool was last accessed by the last node I rebooted.

3. Should I add the pools on each node and add them in GUI/Storage before rebooting, so that they get written into the PVE storage config?
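
For reference, this is roughly what my test looked like (the pool names and multipath aliases are placeholders, not my exact ones):

zpool create poolA /dev/mapper/mpathA
zpool create poolB /dev/mapper/mpathB
zpool create poolC /dev/mapper/mpathC
# after rebooting a node the pools are no longer listed, and
zpool import
# reports that each pool was last accessed by another system,
# so it only comes back with a forced import:
zpool import -f poolA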

Thank you for your assistance and time :)

proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.13-2-pve: 4.13.13-32
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.35-1-pve: 4.4.35-76
 
You cannot use ZFS on a shared storage device and mount it on multiple hosts - I hope you are not trying that?
 
The storage is a Fujitsu DX200 with fibre uplinks through a Brocade fibre switch, and from there two fibre links to each node using multipath.
As I understand from the vendor, this is normal/standard: you configure multipath on each hypervisor, and the hypervisor is supposed to control file access?
On the DX200 they have created 3 LUNs for me, to which the fibre card in each node has been granted access.
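
For reference, the multipath config on each node looks roughly like this (the WWID and alias below are placeholders, not my real values):

defaults {
    user_friendly_names yes
    find_multipaths yes
}
multipaths {
    multipath {
        wwid  3600000e00d0000000000000000000001
        alias mpath_storageA
    }
    # same pattern for the other two LUNs
}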

It is a new install, Dietmar, so I am still figuring out how to do this.
Any thoughts on how to configure Proxmox with this storage?
Thank you :)
 
Sorry if I am unclear.
Management wants the following:
Central storage attached to each of PVE Node1-7.
We have a Fiber Card in each node attached to the Storage.
There are 3 Disk Volumes on the Storage: BlockDeviceStorageA=6TB; BlockDeviceStorageB=7TB; BlockDeviceStorageC=8TB.
I need all the Nodes to have access to these Storage Pools:
Node1 Console: multipathBlockDeviceStorageA; multipathBlockDeviceStorageB; multipathBlockDeviceStorageC
Node2 Console: multipathBlockDeviceStorageA; multipathBlockDeviceStorageB; multipathBlockDeviceStorageC
Node3 Console: multipathBlockDeviceStorageA; multipathBlockDeviceStorageB; multipathBlockDeviceStorageC
...up to node 7
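
(To confirm this, I run roughly the following on each node; the map names above are illustrative:)

multipath -ll
ls /dev/mapper/
# the same three multipath maps / WWIDs show up on all 7 nodes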

I want the 3 BlockDeviceStorages visible to all 7 nodes, and if any node reboots, the storage must remain available to all nodes.
All 7 nodes must have access to the 3 block devices at all times, regardless of whether 1 or 6 of them are rebooted.
We also have a 2nd DX200 with the exact same storage pools, which we want to replicate to.
That is why I want to use ZFS.
Any suggestions?
And I do hope that I am painting an adequate picture :).
 
Just a Further Note:
The Nodes already see the block devices attached to them via fiber cards.
Therefore, Node1-7 already see the block devices and have multipath configured on them as above.
I just need to know how to effectively present the storage pools to PVE?
 
I just need to know how to effectively present the storage pools to PVE?

I would create an LVM volume group on each such shared device, then create a storage definition for that VG and mark it as shared.
That works, but it does not have features like snapshots... I guess replication needs to be done on the storage box.
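
Roughly along these lines (the VG and storage names are just examples):

# on one node only, per shared multipath device:
pvcreate /dev/mapper/mpath_storageA
vgcreate vg_storageA /dev/mapper/mpath_storageA
# then add a cluster-wide storage definition and mark it shared:
pvesm add lvm storageA --vgname vg_storageA --content images --shared 1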
 
Thank you Dietmar, yes, I am trying to do the LVM at this moment :)
Would I be able to create a ZFS pool on top of that LVM?
Would that be of any benefit?
 
Oh, okay.
Is there any way to implement the replication feature found with ZFS in this scenario, as you understand it?

I also really want to thank you again for taking the time to respond.
 
Good morning to you :)
Yes, I am aware of the advanced copy. Would you please indicate how that would affect a running VM?
Perhaps I am greatly mistaken?
How would this play out:
VM1's qcow2 runs on StorageA.
- StorageA replicates to StorageB
- While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption? Almost like unplugging the qcow2 disk and plugging in an older version?
 
- While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption?

Sorry, I do not have any information about that feature - I do not know how they implemented that.

We usually use Ceph for setups where you want to replicate and distribute storage among several nodes.
 
I saw in the Ceph configuration guide that you must have identical systems. Was that referring to the compute or the storage side of things?
Can I have ceph on Databank1 with SSD Raid10 drives and Databank2 with SSD Raid5 drives:
Databank1-LUN1,2,3 = 6TB, 7TB,8TB
Databank2-LUN1,2,3 = 6.1TB, 7.1TB,8.1TB

Thank you Dietmar :)
 
Can I have ceph on Databank1 with SSD Raid10 drives and Databank2 with SSD Raid5 drives:

It makes no sense to use Ceph on shared drives, and it is also pointless to use any RAID for Ceph...
I think your current HW does not fit well for a Ceph setup.
 
Oh, so when you said you usually use Ceph to replicate and distribute storage, did you mean you would configure the storage without any RAID, pass the disks through to the hosts, and configure Ceph on the x amount of disks?
 
While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption? Almost like unplugging the qcow2 disk and plugging in an older version?

First, you cannot use QCOW2 on LVM; you only have RAW. Second, you will lose data if the replication is not synchronous.

Why do you think that the DX will fail completely? Everything in there is redundant, and I have never experienced such a total failure.
 
@LnxBil: Thank you yes, I know Fujitsu is rock-solid :), this is more a measure of assurance with the client. As you know, they always need to know that there are redundancies in place, so it is more a matter of principle and having settings in place, since this client falls under one of our major banking institutions. Really appreciate your feedback.
Yep, replication would not be synchronous, as we will have a DR site connected with fiber and we are not sure that the line will be fast enough to support synchronous connectivity.
@dietmar: I have seen that, but the bank wanted to go with this storage. I believe our client is a test-subject for them to see how it all works.
Thank you.
 
Hi,

- While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption? Almost like unplugging the qcow2 disk and plugging in an older version?

In this particular test case, one of the best options is to have an SQL cluster (so you will need 3 different VMs for the SQL server, for many SQL products). You would then set up a frontend like HAProxy to get load-balancing/fail-over. If one of the 3 SQL VMs breaks, there will be no corruption inside SQL.
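
A minimal sketch of such an HAProxy frontend, assuming three SQL VMs at placeholder addresses 10.0.0.11-13:

defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

listen sql
    bind *:3306
    balance roundrobin
    server sql1 10.0.0.11:3306 check
    server sql2 10.0.0.12:3306 check
    server sql3 10.0.0.13:3306 check

The actual protection against corruption comes from the SQL-side clustering between the 3 VMs; HAProxy only decides where the clients connect.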