The aim is to build a 4 TB file server with some sort of redundancy on a shoestring, hopefully with automatic failover.
My background: I am a part-time hacker who has read a few Wikipedia articles and now believes that being a sysadmin can't be that hard. Yes, I have thoroughly read all Proxmox marketing materials. I've had a number of bad experiences with server-class hardware from HP Enterprise, but thanks to Windows 11, I have many reasonable PCs to play with.
I gather that I can set up a 2-PC Proxmox cluster to run my virtual machine. Some other high-unreliability PC on the LAN can run a Corosync Quorum Device (QDevice) to provide the third vote for quorum purposes. Easy peasy then.
Proxmox can allegedly be configured to stop the VMs during the night and synchronise their images across the cluster nodes. If the main PC fails, then Proxmox will automatically start a copy of the VM from last night on the other PC. I can live with that.
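To check my understanding of the quorum arithmetic, here is a minimal sketch. It assumes that each of the two nodes and the QDevice contributes exactly one vote and that quorum means a strict majority of all votes; the function names are made up for illustration.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the votes still reachable form a strict majority."""
    return votes_present >= quorum_threshold(total_votes)


# Assumption: 2 Proxmox nodes + 1 QDevice, one vote each.
TOTAL_WITH_QDEVICE = 3
# Assumption: the same 2 nodes with no QDevice at all.
TOTAL_WITHOUT_QDEVICE = 2

# One node has failed, the survivor can still reach the QDevice.
print(has_quorum(2, TOTAL_WITH_QDEVICE))
# One node has failed and there is no third vote.
print(has_quorum(1, TOTAL_WITHOUT_QDEVICE))
```

If this is right, the third vote is what lets the surviving node keep a majority when its peer dies.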
That solution should work with small VMs, but not with 4 TB of files. I read that Proxmox has "Local Storage Replication" for this scenario. Apparently, it is based on "zfs send" and "zfs recv". That reminds me, I need to read about ZFS some day.
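From what I have read so far, the underlying ZFS idea is: take a snapshot, then send only the difference since the previous snapshot to the other machine. The sketch below just builds the command lines as strings so I can look at them; it does not run anything, and it is not claimed to be what Proxmox executes internally. The dataset name, snapshot names and host name are hypothetical.

```python
import shlex
from typing import Optional


def snapshot_cmd(dataset: str, snap: str) -> str:
    """Command that creates a point-in-time snapshot of a dataset."""
    return f"zfs snapshot {shlex.quote(f'{dataset}@{snap}')}"


def send_recv_cmd(dataset: str, new_snap: str, target_host: str,
                  prev_snap: Optional[str] = None) -> str:
    """Pipeline that streams a snapshot to another host over ssh.

    Without prev_snap this is a full send of the whole dataset.
    With prev_snap, 'zfs send -i' streams only the changes between
    the two snapshots.
    """
    new = shlex.quote(f"{dataset}@{new_snap}")
    if prev_snap is None:
        send = f"zfs send {new}"
    else:
        prev = shlex.quote(f"{dataset}@{prev_snap}")
        send = f"zfs send -i {prev} {new}"
    return f"{send} | ssh {shlex.quote(target_host)} zfs recv {shlex.quote(dataset)}"


# Hypothetical names, for illustration only.
DATASET = "rpool/data/vm-100-disk-0"
TARGET = "pve2"

print(snapshot_cmd(DATASET, "rep1"))
print(send_recv_cmd(DATASET, "rep1", TARGET))                    # first run: full copy
print(snapshot_cmd(DATASET, "rep2"))
print(send_recv_cmd(DATASET, "rep2", TARGET, prev_snap="rep1"))  # later runs: delta only
```

The point I take from this is that only the first transfer should need to move the whole dataset, provided both sides still share a common snapshot.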
Now the question is, how does this "storage replication" work in practice?
Say the VM runs Debian with Samba emulating Active Directory, etc. The local ZFS storage would be mounted in that VM, so that the VM can share the files on the network.
Question 1) Who is going to do the ZFS replication then? If it's the VM itself, then Proxmox would not be able to automate it or its failover. Or can Proxmox do a "zfs send" even though the ZFS is currently mounted in the VM? How do we make sure then that some file is not open and being written to at ZFS replication time?
Perhaps Proxmox needs to stop the VM in order to run "zfs send". In this case, should the primary PC fail, we could lose all file changes from today, as the last sync was last night. Is that right?
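To put a number on that worry: with any asynchronous replication, the changes at risk are those made between the last completed sync and the moment of failure. A minimal sketch, with made-up times and a hypothetical function name:

```python
from datetime import datetime, timedelta


def data_loss_window(last_sync: datetime, failure: datetime) -> timedelta:
    """Time span whose changes exist only on the failed machine."""
    if failure < last_sync:
        raise ValueError("failure time is before the last completed sync")
    return failure - last_sync


# Made-up example: nightly sync finished at 02:00, primary dies at 17:30.
last_sync = datetime(2024, 1, 15, 2, 0)
failure = datetime(2024, 1, 15, 17, 30)
print(data_loss_window(last_sync, failure))  # 15:30:00
```

So the shorter the interval between syncs, the less work is lost, which is why I care whether the VM has to be stopped for each one.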
Question 2) QEMU / qcow2 has some mechanism to pause and synchronise disk state, so that you can take a safe snapshot of the filesystem during the day. Is that something I should be trying to do with the ZFS disk? However, I heard that Storage Replication works only with storage type 'zfspool', which could mean no qcow2 then.
You would hope that most applications do not end up with data corruption due to such async snapshots. After all, a real PC can lose power at any time without warning. I also hope that taking such a snapshot does not freeze the VM for too long, or the users may notice.
Question 3) Now say that the failed primary PC in the cluster gets repaired in the meantime and comes back up again. I heard that Proxmox cannot automatically switch back. I guess you need to manually invert the roles in the Proxmox configuration, so that the secondary is now the primary, and the old primary becomes the secondary. Is that right?
That shouldn't be a big problem for small VMs, but can we reverse the primary/secondary roles of the 4 TB ZFS disks so easily? Or would Proxmox copy the full 4 TB over the first time around?
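The reason I care is simple arithmetic. The sketch below assumes a 1 Gbit/s LAN running at full line rate with no protocol overhead, which is optimistic, and a made-up 20 GB daily delta; neither figure is measured.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal transfer time, ignoring protocol overhead and disk speed."""
    return size_bytes * 8 / link_bits_per_second


TB = 10**12
GB = 10**9
GIGABIT = 10**9  # assumption: 1 Gbit/s LAN

full_copy = transfer_seconds(4 * TB, GIGABIT)     # whole 4 TB dataset
daily_delta = transfer_seconds(20 * GB, GIGABIT)  # hypothetical 20 GB of changes

print(f"full copy:   {full_copy / 3600:.1f} hours")
print(f"daily delta: {daily_delta / 60:.1f} minutes")
```

A full resync would therefore take the better part of a working day even in the ideal case, while an incremental one would be over in minutes.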
Hopefully, I'll learn enough so that I can tell whether a proper Proxmox consultant/freelancer actually knows more than I do. Many thanks in advance!