Hi
We are looking into hardening our infrastructure to better handle outages due to network or hardware failures.
We use OVH dedicated physical servers for our infrastructure and currently have a 3-node PVE cluster. Each node is configured with 2x4TB NVMe drives. One of the nodes is restricted primarily to Windows VMs and therefore has only 1 CPU, which keeps licensing costs down given the low volume of clients requesting services that have to run on Windows.
All servers have access to the OVH vRack system.
Currently all servers have the disks configured as a ZFS mirror, but we are looking into configuring HA in Proxmox. However, it appears that HA works best (only?) with shared storage (network share) or distributed storage (Ceph).
As our monitoring of storage health is currently based on the emails generated by zfs-zed, we would prefer not to redo our tooling and were therefore thinking of alternatives for handling this.
We therefore have the following questions, which we hope that someone can help us find the answers for:
- Can we implement HA by enabling replication of the required VMs between nodes in the same HA group and still benefit from online/live migration, or do we need to implement something like Ceph?
- If we have to implement Ceph, can it then be done on top of the ZFS pools for us to keep current storage monitoring tools or would we need to bring up Ceph directly on the bare disks and implement new tooling for storage health monitoring?
- If Ceph is necessary to get working HA, can we bring up new nodes in our existing cluster, deploy Ceph across only the new nodes, and then migrate VMs from the current non-HA setup in the same cluster?