Questions about HA on ZFS cluster?

killmasta93

Renowned Member
Aug 13, 2017
Hi,
I was wondering if someone could share some knowledge or recommendations on creating HA on a ZFS cluster.
I currently have 3 nodes with Proxmox 5.3; each node has a ZFS RAID 10 pool with 4 x 2 TB drives. I'm currently using pve-zsync for backups, but I was really looking into HA. I was reading about CEPH but saw a few comments saying not to mix CEPH with ZFS. Here is my question.

1) Each node is identical, meaning the pool is rpool and the VMs are stored on local-zfs. Do I need to create another pool inside local-zfs on each node to have shared storage? Or do I need a NAS/SAN or another kind of storage?


Thank you
 
I was reading about CEPH but saw a few comments saying not to mix CEPH with ZFS
I'm not sure what you read, but CEPH and ZFS are different things. I'm sure it's no problem to set up PVE on two ZFS disks and pass the other drives to CEPH. Normally CEPH needs an HBA to work properly, but ZFS is not really different in that regard. But of course you cannot use a disk for ZFS and CEPH at the same time (well, you can, but it's really, really not recommended :D ).
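Just to illustrate what I mean, roughly from memory (PVE 5.x / Luminous syntax; the cluster network is only an example, adjust it to your setup):

# on every node: install the Ceph packages
pveceph install --version luminous
# once, on the first node: initialize Ceph with a dedicated cluster network
pveceph init --network 10.10.10.0/24
# on every node: create a monitor
pveceph createmon

The OS stays on its own ZFS mirror; only the untouched spare disks go to CEPH later as OSDs.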

Do I need to create another pool inside local-zfs on each node to have shared storage? Or do I need a NAS/SAN or another kind of storage?
If you have the requirement for HA, then you need a shared storage. In my opinion only CEPH or NFS are really usable for that (a SAN with FC works too, but that is more for an SMB with an internal IT). I'm really not a friend of iSCSI / FC; there is a high risk of losing all of your data.
So it depends on your needs and your budget what you should do here.
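As an example, shared storage entries in /etc/pve/storage.cfg look roughly like this (server, export, and pool names are made up, just a sketch):

nfs: vm-nfs
        server 192.168.1.50
        export /export/vmdata
        path /mnt/pve/vm-nfs
        content images

rbd: vm-ceph
        pool vm-pool
        content images
        krbd 0

Because both storage types are shared across the cluster, a VM disk stored there can be started on any node, which is what HA needs.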
 
Thanks for the reply. So let's say I have the 3 nodes, each running ZFS RAID 10 with the rpool. How can I enable shared storage with the setup I have? Because all the drives are using ZFS, and for CEPH I would need OSDs, which would not be applicable to my setup.

Thank you
 
Thanks for the reply. So let's say I have the 3 nodes, each running ZFS RAID 10 with the rpool. How can I enable shared storage with the setup I have?
Take a look at the documentation: https://pve.proxmox.com/wiki/PVE-zsync
I don't use it this way because it is async, not sync; for me that is not usable. I'm using CEPH storage to get a shared storage that has HA capability and "RAID over network" functionality to reduce data loss and prevent bit rot.
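For comparison, pve-zsync (what you are doing now) is only scheduled async replication, roughly like this (VM ID, target IP, and name are only examples):

# replicate VM 100 to another node, keep 7 snapshots
pve-zsync create --source 100 --dest 192.168.1.12:rpool/backup --name vm100 --maxsnap 7
# run a single sync manually
pve-zsync sync --source 100 --dest 192.168.1.12:rpool/backup --name vm100 --verbose

The create command ends up as a cron job (every 15 minutes by default, if I remember right), so in the worst case you lose everything written since the last snapshot. With a real shared storage there is no such gap.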

Because all the drives are using ZFS, and for CEPH I would need OSDs, which would not be applicable to my setup.
In CEPH, an OSD is a single disk, so hard drive = OSD for CEPH; it's just another name for it :)
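So with a spare disk that is not part of your rpool, you would simply turn it into an OSD, something like this (again PVE 5.x syntax; /dev/sdb is only an example):

# the disk must be empty and must not belong to a ZFS pool
pveceph createosd /dev/sdb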
 
Thanks for the reply. Currently pve-zsync is working great, but as you pointed out it is async, not sync. So the only way to have HA is a separate disk, not in ZFS format, on which the VMs will be stored? Or maybe there is a way to have all the shared storage replicated on all the hosts?
 
Bad personal experience?
No, I don't use it because of the risk of data loss :)

Please elaborate.
One mistake and BOOM. iSCSI is not really a shared storage. Normally you cannot actively use a LUN on two systems; you always need a broker, cluster filesystem, etc. that manages this and writes and publishes the metadata correctly to all the other nodes. If two systems write the metadata, you have a very high risk of data loss; you can probably lose the complete LUN.
NFS and CEPH have file locking, so it is possible to mount and use the "LUN" (it's more a CEPH pool and an NFS share :) ) actively on all systems, without a broker or anything else. Sure, if you have a VM image and you mount it twice, you can run into the same situation.

So if you have any problems with your cluster, or the software is messed up or something, it may mount an iSCSI LUN actively and destroy ALL the data on the LUN. This can happen with NFS or CEPH too, but then only the VM itself is damaged, not the whole CEPH pool or NFS share.

Yes, mistakes can happen everywhere, but in my opinion the risk of losing my data in an iSCSI environment is much higher than with NFS or CEPH, and those are great alternatives to iSCSI. But yes, for an SMB it's perfect: normally you have three or more nodes and your EMC storage connected via FC, then you use VMware and the world works as designed. In an internal IT you don't experiment with your cluster; you need an enterprise feature and you buy support from the vendor. If you have a problem, you normally call the vendor and don't fix it yourself.
 
Thanks for the reply. Currently pve-zsync is working great, but as you pointed out it is async, not sync. So the only way to have HA is a separate disk, not in ZFS format, on which the VMs will be stored? Or maybe there is a way to have all the shared storage replicated on all the hosts?
It's a bit outside the scope of PVE, but there are some commercial ZFS HA offerings like RSF-1, or https://github.com/ewwhite/zfs-ha/wiki
 
... But of course you cannot use a disk for ZFS and CEPH at the same time (well, you can, but it's really, really not recommended :D ).

Hi,

Could you explain why not?

I have a VM (SQL Svr 2k16) with one Ceph 'OS' disk and a ZFS 'SQL Data' disk running without any kind of issues for months.
ZFS disk is replicated every 2 hours. Everything works as expected.
The only dirty thing is that I must power off the VM if I want to migrate it to another node.
 
I have a VM (SQL Svr 2k16) with one Ceph 'OS' disk and a ZFS 'SQL Data' disk running without any kind of issues for months.
Sure, I don't see any problems here either. Re-read my quote: it's not a problem if you have a CEPH disk and a separate ZFS disk in your hypervisor and you assign one disk from each storage to your VM. But it is really, really not recommended to use ONE hard drive for both and create one partition for CEPH and another one for ZFS.
 
Ok so I misunderstood, sorry for that.

I can't believe anyone would want to do that kind of thing :)
 
I can't believe anyone would want to do that kind of thing :)
You can believe it; a colleague of mine has done this. He has an mdadm RAID 1 on the same disks as the OSDs. But it is for his private use, and he says everything is running fine. I wouldn't do this, but he is happy :D

But really, it's not for commercial or productive use :D
 
