BACKGROUND
I work for a small business that provides a 24-hour service on our servers and needs as close to 100% uptime as possible. Several years ago our old IT company sold us two identical Dell R420 servers, each with a single 6-core processor, 4x 3.5" 600GB 10K SAS HDDs in RAID10, 16GB RAM, and Server 2012 (not R2). The stated plan was to virtualize everything into 3 VMs, but they never implemented any of it and instead installed everything bare metal, with all services (including the DC and AD) sharing the same OS as our core business services. That makes it difficult to move things to VMs now, because we can't take the downtime needed to completely reconfigure the servers.
Our core service has a companion program called Cluster that runs on the alternate server. It monitors the service on Server 1 for database changes and duplicates them, and when the service on Server 1 becomes unreachable it stops Cluster and launches the database and services on Server 2. This failover can take up to 10 minutes, during which any information coming into our core services is dropped and lost forever. Another problem/inconvenience is that once Server 1 is back online we have to manually start Cluster on it again before we can fail back to Server 1 if Server 2 has a fault.
GOAL
To have a High Availability cluster where the VMs running our core services can automatically migrate between hosts with as little downtime as possible. We would still keep a second VM in HA running the Cluster service, so we can move the core services to an alternate VM while doing OS updates on the primary VM.
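As I understand it, on the Proxmox side that would mean registering the core-services VM as an HA resource so it gets restarted or migrated automatically. Something along these lines (the VM ID 100 is just a placeholder):

# Add the core-services VM to HA management and keep it in the started state
ha-manager add vm:100 --state started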
We are finally looking to upgrade our servers to something high quality and future-proof. We don't have the budget for more than a couple of servers, so we are looking at two identical R660 or R760 servers with DDR5 RAM, 10x 3.2TB NVMe drives in RAIDZ2 (or RAIDZ3?), and 10Gb/25Gb networking. I read up on Ceph and ZFS with replication and came to the conclusion that we won't have enough hardware for Ceph, even if we splurged on another server (the consensus seems to be a minimum of 4-5 Ceph nodes). So the plan for now is to use ZFS replication at 1-minute intervals for our core-services VMs and every 10-15 minutes for the other VMs. For a third quorum vote, we would run a QDevice on another piece of hardware. Will this scenario work for high availability with ZFS replication between only two nodes?
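For what it's worth, this is roughly how I picture setting it up, assuming two nodes named pve1/pve2, a QDevice host at a placeholder IP, and VM IDs 100/101 as examples:

# On the external QDevice host:
apt install corosync-qnetd

# On both Proxmox nodes:
apt install corosync-qdevice

# From one Proxmox node, join the QDevice to the cluster (placeholder IP):
pvecm qdevice setup 192.168.10.50

# Storage replication jobs (also configurable in the GUI under Datacenter -> Replication):
pvesr create-local-job 100-0 pve2 --schedule "*/1"     # core-services VM, every minute
pvesr create-local-job 101-0 pve2 --schedule "*/15"    # other VMs, every 15 minutes

My understanding is that with replication-based failover the worst-case data loss is whatever changed since the last replication run, so up to about a minute for the core VMs; I'd like to confirm that assumption is correct before committing to this design.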
We also have an old R720 with 8x 10TB SATA drives that we came into possession of. We were planning to run TrueNAS Scale on it as a storage server for our office workstations and as a backup target for Proxmox snapshots/backups. We would then back those up, along with the workstation shares, to a cloud backup service. We would probably run the QDevice as a VM on TrueNAS Scale. I don't believe 8 SATA disks in RAIDZ2 would provide the bandwidth needed to act as shared storage for the VMs.
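The rough idea for tying that into Proxmox (a sketch only; the server IP, export path, and storage name are placeholders) would be to export an NFS share from TrueNAS and add it as a backup storage:

# On a Proxmox node, add the TrueNAS NFS export as a backup storage
pvesm add nfs truenas-backup --path /mnt/pve/truenas-backup --server 192.168.10.40 --export /mnt/tank/pve-backups --content backup

Scheduled vzdump backup jobs could then target that storage, and the cloud sync would run from the TrueNAS side.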
Is there a better way to configure this setup without buying more servers? Could I technically use the old servers as part of the quorum and then use Ceph, even though they don't have an identical storage config? I could flash the RAID card to IT mode and possibly add 4 SAS or SATA SSDs to speed up their storage.