Installing three Proxmox servers with shared ZFS storage for high availability

Thvle

New Member
May 3, 2022
Hello friends, I am new here (and new to Proxmox) and I have a question. I want to set up three Proxmox servers together with a shared ZFS storage system so that, if the Proxmox server hosting a web server VM goes down, the VM can migrate to another Proxmox machine.

I know this can be done with an NFS system, but I would like to do it with ZFS. What suggestions do you have for me? Thank you.
 
ZFS is not cluster-aware storage; you can't use it in the same way that NFS is.
You can use ZFS as local storage on each node and use the PVE/QEMU live migration option, but it's not the same as shared storage.
The list of PVE-supported shared storage types is located here: https://pve.proxmox.com/wiki/Storage
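For example, moving a VM with a local ZFS disk to another node is something you trigger yourself; a rough sketch (the VM ID 100 and target node name pve2 are just placeholders):

# VM 100 and node pve2 are hypothetical examples; --with-local-disks copies the local disk over the network
qm migrate 100 pve2 --online --with-local-disks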


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thank you for your kind response. Would this live migration be done automatically to one of the other two Proxmox servers? Can you configure priority criteria, etc.? Thanks!
 
Would this live migration be done automatically to one of the other two Proxmox servers?
No, you trigger it.
Just keep in mind (as @bbgeek17 already said):
Live migration also only works if both the source and the target are alive, online and working. You will not have the same data if one node fails and you start the VMs on the other side.
 
What has not been mentioned yet is that you can use ZFS in combination with HA if you are okay with async replication.

The VM replication feature of Proxmox VE needs ZFS storage underneath. Create the same ZFS pool on each node and add a storage config for it. You can then set up replication of the disks between the nodes. In a 3-node cluster, that would be 2 replication jobs per VM.
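As a rough, untested sketch (the pool name "tank", the node names pve1/pve2/pve3 and the VM ID 100 are all placeholders), the storage entry and the replication jobs could look like this:

# /etc/pve/storage.cfg – the same ZFS pool defined once, restricted to the nodes that have it
zfspool: local-zfs
        pool tank
        content images,rootdir
        sparse 1
        nodes pve1,pve2,pve3

# Replicate the disks of VM 100 from pve1 to the other two nodes every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule "*/15"
pvesr create-local-job 100-1 pve3 --schedule "*/15"

The replication jobs can also be created in the GUI in the VM's Replication panel.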

If you do run into the situation that a node fails, then the HA stack will start that VM on one of the remaining nodes. Since the replication jobs made sure that the local disk image(s) are present, the VM can start.
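For the HA part, the VM has to be added as an HA resource first; a minimal sketch (VM ID 100 again being a placeholder):

# Put VM 100 under HA management so it gets restarted on a surviving node if its node fails
ha-manager add vm:100 --state started
# Check the current HA state of the cluster
ha-manager status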

The only downside is that you might have some data loss in the HA case, depending on how long the replication interval is and when the last successful replication ran before the node failed.

Replication intervals can be as short as one minute.

I personally use it in a small 2 node cluster and have the interval down to 1 minute for the mail server but quite a bit longer for the VMs where I can live with some data loss. For example, the DNS and DHCP server.
 
One more thing. If you have a 3-node cluster and a fast network as well (at least 10 Gbit), you could consider using Ceph for clustered storage running on the Proxmox VE nodes themselves. It does need a bit more resources, though. Check out the Proxmox VE docs, which should also link to the Ceph docs and the requirements mentioned there: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pveceph
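Very roughly, and only as a sketch (the dedicated Ceph network 10.10.10.0/24 and the empty disk /dev/sdb are assumptions), the setup would look something like this:

pveceph install                        # install the Ceph packages on every node
pveceph init --network 10.10.10.0/24   # once, on the first node
pveceph mon create                     # on each of the three nodes
pveceph osd create /dev/sdb            # one or more OSDs per node, on empty disks
pveceph pool create vm-storage         # create a pool to use as VM storage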
 
So I'm going to choose that option. Is there any official guide to do that? Thank you!
 
Ok, I am configuring everything and I would like to do bonding. For this I am creating three network interfaces in addition to the bridge used to connect to the internet. How should I configure these three interfaces in VirtualBox? As an "internal network"? Thank you!
 
Bridging and Bonding are on different levels.

Could you please outline what network setup you have in mind?
 
Three Proxmox nodes in a cluster. One VM with an Apache2 server on the first node (proxmox1). I would like to use bonding to make the replication speed between the nodes even faster.
 
So, you'd like to use LACP in order to get double or even triple the replication speed? Do you have LACP-capable switches?
Keep in mind that replication speed does not scale linearly. Often you're CPU-bound if you use replication over SSH; this heavily depends on the CPUs used.
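If you do go the LACP route, a minimal /etc/network/interfaces sketch could look like this (the NIC names and the address are made up, and the switch ports need a matching LACP configuration):

auto bond0
iface bond0 inet static
        address 10.10.10.1/24
        bond-slaves enp1s0 enp2s0 enp3s0
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
# This network can then be used as a dedicated migration network (Datacenter -> Options)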
 
I did not mention that my intention is to build this with virtual machines (VirtualBox), since I am doing this as a project for my studies.
 
I would totally recommend doing that with VMs. It's best to simulate, virtualize and play around with everything you go into production with. If you want to create a network for storage and one for the cluster/corosync network, I'd recommend creating one virtual network for each and attaching the virtualized network cards respectively.
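For the VirtualBox side, assuming the VMs are named pve1/pve2/pve3 (placeholder names), the extra NICs can be attached to named internal networks, e.g.:

# NIC 1 stays bridged/NAT for internet access; NICs 2 and 3 go to internal networks
VBoxManage modifyvm "pve1" --nic2 intnet --intnet2 "storagenet"
VBoxManage modifyvm "pve1" --nic3 intnet --intnet3 "corosyncnet"
# Repeat for pve2 and pve3 with the same internal network names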
 
It's mentioned in the documentation that ZFS can be used as shared storage, but only over iSCSI.
[Attachment: screenshot of the storage types table from the Proxmox VE documentation]
 
The naming is a bit confusing. It works by having a compatible storage server running ZFS, to which Proxmox VE connects via SSH to manage the ZFS volumes and their export via iSCSI, so that they can be consumed by the Proxmox VE nodes via good old iSCSI.
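For reference, a ZFS over iSCSI entry in /etc/pve/storage.cfg looks roughly like this (all values here are placeholders for your own storage server):

zfs: zfs-over-iscsi
        portal 192.168.1.50
        target iqn.2003-01.org.linux-iscsi.storage:target1
        pool tank
        iscsiprovider LIO
        blocksize 4k
        sparse 1

The storage server itself is not part of the Proxmox VE cluster; it needs ZFS, one of the supported iSCSI target implementations (Comstar, istgt, IET or LIO) and root SSH access from the nodes.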
 
And could it be a good option in my case?
 
Hello everybody, I have a question. I have 2 nodes plus an extra quorum device, and I have configured ZFS and HA. On node98 I used the command:
pve-zsync sync --source 102 --dest 10.1.23.99:ET-ZFS-1 --verbose --maxsnap 1 --name test --limit 102400
and then on node99 I used the command:
pve-zsync create --source 10.1.23.98:102 --dest ET-ZFS-1 --verbose --maxsnap 1 --name test --limit 512 --skip
Then I shut down node98 and VM 102 migrated to node99 via HA. When I started node98 again, VM 102 went back to node98, which is correct. The problem is that when I try this a second time, I see this in the log on node99:
Nov 25 10:54:04 node99 pve-ha-lrm[11271]: Task 'UPID:node99:00002C08:00657704:63802D7F:qmstart:102:root@pam:' still active, waiting
and VM 102 cannot start again on node99.
When I then start node98, VM 102 cannot migrate back to node98. In the end I deleted VM 102's disk, migrated it by hand and added the disk back in /etc/pve/qemu-xxx/102.conf.
The question is: how can VM 102 be migrated many times?
Sorry, my English is not good.
 
Haha, I got it: of the two commands, choose one; either one can be used. When the VM goes back to the initial node, the command has to be entered again.
 
