Second server

TIENDER

Member
Nov 12, 2020
Hi All,

I've got a server running at home with Proxmox and several virtual machines/Docker containers.
Because it's running some home-automation VMs, I want to build in a bit of redundancy.
So if one of the servers fails, the other has to take over the VMs.

I think I have two options:

Ceph:

Pros:
Without downtime or data loss

Cons:
Too expensive (3 servers, 10 Gbit NICs, a lot of SSDs/HDDs)

ZFS with replication

Pros:
Cheaper (2 servers and a cheap Q-device for the third vote)

Cons:
Possible data loss because the servers aren't synchronized


I think ZFS with replication is the best option for my situation, but I'm still searching for the best setup.

Each server would have two SSDs in a mirror (so 4 SSDs in total) for the VMs.
Do I need an extra SSD for the operating system?
On my current server the OS is running on the mirrored SSDs.

Is it possible to add 10 Gbit NICs so the replication is faster or even synchronous?
 
Ceph is most likely overkill in that situation. Two servers plus an external QDevice, which could run on a Raspberry Pi for example, are enough to still have enough votes if one of the nodes is down.
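To illustrate why the third vote matters, here is a minimal sketch of the majority arithmetic. It assumes each node and the QDevice carry exactly one vote; the function names are made up for this example and are not part of any Proxmox tooling.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the expected votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_online: int) -> bool:
    """True if the votes still online form a strict majority."""
    return votes_online >= votes_needed(total_votes)


if __name__ == "__main__":
    # Two nodes, no QDevice: one node down leaves 1 of 2 votes.
    print("2 nodes, 1 down:", has_quorum(total_votes=2, votes_online=1))
    # Two nodes plus a QDevice: one node down leaves 2 of 3 votes.
    print("2 nodes + QDevice, 1 node down:", has_quorum(total_votes=3, votes_online=2))
```

With only two votes, the surviving node cannot reach a majority on its own, which is why the extra vote is needed before HA can start the VMs on the remaining node.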

Just make sure that the ZFS pools and storages have the same names on both nodes for replication to work. You can schedule replication as often as every minute, which should keep the potential data loss small enough to tolerate if a node fails and HA needs to start the VMs on the remaining node.

You can store the guests on the same SSDs as the OS. The storage to use would then be "local-zfs" if you installed with the Proxmox VE installer.

You can define which network to use in Datacenter -> Options -> Migration settings. This network will then be used for live migrations as well as replication.

Even a dedicated 1 Gbit network with its own subnet can already improve the situation. A 10 Gbit network will of course make it even faster, though in my own experience you will mainly notice it in live migrations and in the initial replication, when the full disk needs to be copied over. After that, if you run it every few minutes and you don't have a ton of new data, the replication should finish quite quickly because only the changes since the last replication are sent.
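As a rough back-of-the-envelope sketch of that difference, the snippet below estimates the raw transfer time for a full disk versus a small delta. The sizes (100 GiB disk, 0.5 GiB of changes) are purely hypothetical, and the calculation assumes the link runs at full line rate with no protocol, compression, or disk overhead, so real numbers will be worse.

```python
def transfer_seconds(size_gib: float, link_gbit: float) -> float:
    """Idealized time to send size_gib GiB over a link of link_gbit Gbit/s."""
    bits = size_gib * 2**30 * 8          # GiB -> bits
    return bits / (link_gbit * 1e9)      # Gbit/s -> bit/s


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    workloads = (("initial full copy", 100.0), ("incremental delta", 0.5))
    for label, size_gib in workloads:
        for link_gbit in (1.0, 10.0):
            secs = transfer_seconds(size_gib, link_gbit)
            print(f"{label}: {size_gib} GiB over {link_gbit:g} Gbit/s ~ {secs:.1f} s")
```

Under these assumptions the full copy takes minutes on 1 Gbit and is about ten times faster on 10 Gbit, while the small delta takes only a few seconds on either link.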