PVE 5.1 Cluster on ZFS Multipath Storage

theappy

Active Member
Jan 6, 2018
Good day all. Thank you for an amazing OS!
I have a 7 node cluster.
On each node is a Fibre Channel Card.
Each node sees the storage and I have created multipath config on every node.
My question is how to proceed to enable replication?
Must I create:
1. The same zpool on every node, then add it in GUI/Storage/ZFS and mark it shared?
2. The same zpool on every node, then add it only for the specific node and not share it?
As a test, I created 3 zpools via the console on the 3 multipath devices (roughly the commands sketched below).
When I run zfs list and/or zpool list, I see them mounted and available.
After rebooting a node, neither zfs list nor zpool list shows them anymore.
zpool import shows that the pool was last accessed by the last node I rebooted.

3. Should I add the pools on each node and add them in GUI/Storage before rebooting, so that they get written into the PVE storage config?
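
For reference, this is roughly what my test looked like (the pool names and multipath aliases are placeholders, not my exact ones):

zpool create poolA /dev/mapper/mpathA
zpool create poolB /dev/mapper/mpathB
zpool create poolC /dev/mapper/mpathC
# after rebooting a node the pools are no longer listed, and
zpool import
# reports that each pool was last accessed by another system,
# so it only comes back with a forced import:
zpool import -f poolA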

Thank you for your assistance and time :)

proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.13-2-pve: 4.13.13-32
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.35-1-pve: 4.4.35-76
 
You cannot use ZFS on a shared storage device and mount it on multiple hosts - I hope you are not trying that?
 
The storage is a Fujitsu DX200 with fibre uplinks through a Brocade fibre switch, and from there two fibre links to each node using multipath.
As I understand from the vendor, this is normal/standard: you configure multipath on each hypervisor, and the hypervisor is supposed to control file access?
On the DX200 they have created 3 LUNs for me, to which the fibre card in each node has been granted access.
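
For reference, the multipath config on each node looks roughly like this (the WWID and alias below are placeholders, not my real values):

defaults {
    user_friendly_names yes
    find_multipaths yes
}
multipaths {
    multipath {
        wwid  3600000e00d0000000000000000000001
        alias mpath_storageA
    }
    # same pattern for the other two LUNs
}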

It is a new install, Dietmar, so I am still figuring out how to do this.
Any thoughts on how to configure Proxmox with this storage?
Thank you :)
 
Sorry if I am unclear.
Management wants the following:
Central storage attached to each of PVE Node1-7.
We have a Fiber Card in each node attached to the Storage.
There are 3 Disk Volumes on the Storage: BlockDeviceStorageA=6TB; BlockDeviceStorageB=7TB; BlockDeviceStorageC=8TB.
I need all the Nodes to have access to these Storage Pools:
Node1 Console: multipathBlockDeviceStorageA; multipathBlockDeviceStorageB; multipathBlockDeviceStorageC
Node2 Console: multipathBlockDeviceStorageA; multipathBlockDeviceStorageB; multipathBlockDeviceStorageC
Node3 Console: multipathBlockDeviceStorageA; multipathBlockDeviceStorageB; multipathBlockDeviceStorageC
...up to node 7
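
(To confirm this, I run roughly the following on each node; the map names above are illustrative:)

multipath -ll
ls /dev/mapper/
# the same three multipath maps / WWIDs show up on all 7 nodes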

I want the 3 BlockDeviceStorages visible to all 7 nodes, and if any node reboots, the storage must remain available to all nodes.
All 7 nodes must have access to the 3 block devices at all times, regardless of whether 1 or 6 of them are rebooted.
We also have a 2nd DX200 with the exact same storage pools, which we want to replicate to.
That is why I want to use ZFS.
Any suggestions?
And I do hope that I am painting an adequate picture :).
 
Just a Further Note:
The Nodes already see the block devices attached to them via fiber cards.
Therefore, Node1-7 already see the block devices and have multipath configured on them as above.
I just need to know how to effectively present the storage pools to PVE?
 
I just need to know how to effectively present the storage pools to PVE?

I would create an LVM volume group on each such shared device, then create a storage definition for that VG and mark it as shared.
That works, but it does not have features like snapshots... I guess replication needs to be done on the storage box.
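
Roughly along these lines (the VG and storage names are just examples):

# on one node only, per shared multipath device:
pvcreate /dev/mapper/mpath_storageA
vgcreate vg_storageA /dev/mapper/mpath_storageA
# then add a cluster-wide storage definition and mark it shared:
pvesm add lvm storageA --vgname vg_storageA --content images --shared 1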
 
Thank you Dietmar, yes, I am trying to do the LVM at this moment :)
Would I be able to create a ZFS pool on top of that LVM?
Would that be of any benefit?
 
Oh, okay.
Is there any way to implement the replication feature found with ZFS in this scenario, as you understand it?

I also really want to thank you again for taking the time to respond.
 
Good morning to you :)
Yes, I am aware of the advanced copy. Would you please indicate how that would affect a running VM?
Perhaps I am greatly mistaken?
How would this play out:
VM1's qcow2 runs on StorageA.
- StorageA replicates to StorageB
- While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption? Almost like unplugging the qcow2 disk and plugging in an older version?
 
- While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption?

Sorry, I do not have any information about that feature - I do not know how they implemented that.

We usually use Ceph for setups where you want to replicate and distribute storage among several nodes.
 
I saw in the Ceph configuration guide that you must have identical systems. Was that referring to the compute or the storage side of things?
Can I have ceph on Databank1 with SSD Raid10 drives and Databank2 with SSD Raid5 drives:
Databank1-LUN1,2,3 = 6TB, 7TB,8TB
Databank2-LUN1,2,3 = 6.1TB, 7.1TB,8.1TB

Thank you Dietmar :)
 
Can I have ceph on Databank1 with SSD Raid10 drives and Databank2 with SSD Raid5 drives:

It makes no sense to use Ceph on shared drives, and it is also pointless to use any RAID for Ceph...
I think your current HW does not fit well for a Ceph setup.
 
Oh, so when you said you usually use Ceph to replicate and distribute storage, did you mean you would configure the storage without any RAID, pass the disks through to the hosts, and configure Ceph on the x amount of disks?
 
While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption? Almost like unplugging the qcow2 disk and plugging in an older version?

First, you cannot use QCOW2 on LVM; you only have RAW. Second, you will lose data if the replication is not synchronous.

Why do you think that the DX will fail completely? Everything in there is redundant, and I have never experienced such a total failure.
 
@LnxBil: Thank you yes, I know Fujitsu is rock-solid :), this is more a measure of assurance with the client. As you know, they always need to know that there are redundancies in place, so it is more a matter of principle and having settings in place, since this client falls under one of our major banking institutions. Really appreciate your feedback.
Yep, replication would not be synchronous, as we will have a DR site connected with fiber and we are not sure that the line will be fast enough to support synchronous connectivity.
@dietmar: I have seen that, but the bank wanted to go with this storage. I believe our client is a test-subject for them to see how it all works.
Thank you.
 
Hi,

- While the replication is taking place, VM1 has (as example) a write to SQL happening. At this time storageA fails and VM1 continues to run from StorageB. Would this not cause corruption? Almost like unplugging the qcow2 disk and plugging in an older version?

In this particular test case, one of the best options is to have an SQL cluster (so you will need 3 different VMs for the SQL server, for many SQL products). You would then set up a frontend like HAProxy to get load-balancing/fail-over. If one of the 3 SQL VMs breaks, there will be no corruption inside SQL.
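
A minimal sketch of such an HAProxy frontend, assuming three SQL VMs at placeholder addresses 10.0.0.11-13:

defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

listen sql
    bind *:3306
    balance roundrobin
    server sql1 10.0.0.11:3306 check
    server sql2 10.0.0.12:3306 check
    server sql3 10.0.0.13:3306 check

The actual protection against corruption comes from the SQL-side clustering between the 3 VMs; HAProxy only decides where the clients connect.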