Multiple nodes in a cluster: why do I need a backup?

jHorrocks · Dec 12, 2022

I am designing a 3-node Proxmox environment:

pveOffice1
location: my office
installed on 6 x 1 TB HDDs, 6 TB total, with Proxmox installed across all 6 disks as ZFS RAID 0

pbsOffice2
location: my office
installed on 4 x 4 TB HDDs, 8 TB total, with Proxmox Backup Server installed across all 4 disks as ZFS RAID 10

pveProduction1
location: datacenter
installed on 10 x 2 TB SAS 12k drives, with Proxmox installed across all 10 SAS drives as ZFS RAID 10
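The usable space and redundancy of these three pools come down to simple arithmetic. Here is a minimal sketch, assuming ZFS "RAID 10" means a stripe of 2-disk mirrors (the RAID10 layout the Proxmox installer offers), and ignoring ZFS overhead and TB/TiB differences. The `Pool` class is purely illustrative:

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical model of a ZFS pool layout, for arithmetic only."""
    name: str
    disks: int
    disk_tb: float
    layout: str  # "raid0" = plain stripe, "raid10" = stripe of 2-disk mirrors

    def usable_tb(self) -> float:
        """Raw usable space, before ZFS metadata/reservation overhead."""
        if self.layout == "raid0":
            return self.disks * self.disk_tb
        if self.layout == "raid10":
            return (self.disks // 2) * self.disk_tb  # one disk's space per mirror
        raise ValueError(f"unknown layout: {self.layout}")

    def failures_always_survived(self) -> int:
        """Worst-case number of disk failures the pool is guaranteed to survive."""
        if self.layout == "raid0":
            return 0  # losing any single disk loses the whole pool
        if self.layout == "raid10":
            return 1  # a second failure may hit the partner disk of the same mirror
        raise ValueError(f"unknown layout: {self.layout}")


POOLS = [
    Pool("pveOffice1", 6, 1, "raid0"),
    Pool("pbsOffice2", 4, 4, "raid10"),
    Pool("pveProduction1", 10, 2, "raid10"),
]

for p in POOLS:
    print(f"{p.name}: {p.usable_tb():g} TB usable, "
          f"guaranteed to survive {p.failures_always_survived()} disk failure(s)")
```

The zero for pveOffice1 is the weak spot: with RAID 0, one failed disk takes the whole pool with it.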

My idea is that pveOffice1 is a staging server on which I create all the CTs or VMs and then migrate them to pveProduction1. Then I set up replication from pveProduction1 to pveOffice1 every hour or so, and on a daily basis back up pveOffice1 to pbsOffice2.

My questions are:

1 - Is it OK to install Proxmox across all drives in the RAID, or is it better to install it on a separate SSD and then create a new ZFS storage in datacenter?

2 - Instead of setting up pbsOffice2 as a PBS backup server, could I install it as another PVE node, replicate pveProduction1 to pveOffice1, then replicate pveOffice1 to pbsOffice2, and have that serve as a 3-node backup solution?

3 - Is there a better way to do this? My current hardware specs are mixed, so I am trying to utilise as much of the hardware I already own as possible.

snpz · Dec 12, 2022

I would say that, using ZFS sync, this kind of setup is fairly OK!
1. I would suggest using SSDs on PVE Prod 1 if you plan to run systems that are resource-intensive and need fast I/O;
2. I would leave it as PBS, because it is better to have a real copy of the VM that you can recover in case of disaster. Imagine a VM on Prod1 gets damaged and all that crap syncs to Office1. It is certainly easier to recover a VM from a backup.
3. Is the link between Prod and Office reliable and wide enough to handle all the sync tasks?
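The difference in point 2 can be shown with a toy model: a replica only ever mirrors the source's latest state, while a backup store keeps older point-in-time copies. Here is a minimal sketch, assuming a guest can be represented as a plain dict. `replicate` and `take_backup` are hypothetical stand-ins, not Proxmox commands:

```python
import copy


def replicate(guest: dict) -> dict:
    """Replication: the target ends up with the source's latest state only."""
    return copy.deepcopy(guest)


def take_backup(guest: dict, history: list) -> None:
    """Backup: append a new point-in-time copy, keeping the older ones."""
    history.append(copy.deepcopy(guest))


guest = {"index.html": "good content"}
history = []

replica = replicate(guest)
take_backup(guest, history)

# The guest gets damaged (bad update, deleted files, ransomware, ...).
guest["index.html"] = "CORRUPTED"

# The next scheduled replication faithfully copies the damage to the target.
replica = replicate(guest)

print(replica["index.html"])     # CORRUPTED
print(history[0]["index.html"])  # good content
```

After the damage, replication leaves only the corrupted copy. The backup history still holds the last good version to restore from.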

jHorrocks · Dec 12, 2022

Thanks for your reply, I appreciate it, buddy. I only have a 10MBs connection from the data hosting company to my office. Maybe I can ask them to upgrade it to 100MBs.

snpz · Dec 12, 2022

In case of disaster, recovering from a backup image over a 10 Mbit/s link might take a while. It depends on the size of the VM, of course.
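Here is a rough back-of-the-envelope estimate of "a while". It is a minimal sketch, assuming the link speed is in megabits per second (as in snpz's post), a 2 TB guest (the size mentioned later in the thread), and no protocol overhead or compression. If the link is actually 10 megabytes per second, divide the results by 8:

```python
def transfer_days(size_tb: float, link_mbit_s: float) -> float:
    """Idealised transfer time in days: no overhead, no compression."""
    bits = size_tb * 1e12 * 8             # decimal terabytes -> bits
    seconds = bits / (link_mbit_s * 1e6)  # megabits per second -> bits per second
    return seconds / 86400


for link in (10, 100):
    print(f"2 TB over {link} Mbit/s: {transfer_days(2, link):.1f} days")
```

At 10 Mbit/s, a full restore of a 2 TB guest takes on the order of weeks (about 18.5 days). Even at 100 Mbit/s it still needs almost two days.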

Neobin · Dec 13, 2022

Your setup doesn't sound good to me.
The bad part (in my eyes) starts with using RAID 0, continues with your (possible) assumption that replication/sync replaces backups, and ends with a PVE cluster over a WAN connection [1].

Just my 2 cents...

[1] https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_cluster_network_requirements

jHorrocks · Dec 13, 2022

Thanks, buddy, I am asking the datacenter for options. In the meantime I have set up 3 servers in my office as a test, the same as in my OP:

pveOffice1
pbsOffice2
pveProduction1

I have created a VM on pveProduction1 and set up replication from pveProduction1 to pveOffice1 using ZFS, and it all works well.

If I run the following command on both servers, I see the disk subvol-104-disk-0:

ls /rpool/data/

If I run the following command on both servers, I see the same disk size:

df -h /rpool/data/subvol-104-disk-0

So replication of the VM disk is basically working. But now I want to back up from pveOffice1 to pbsOffice2. Because the VM is running on pveProduction1, it doesn't show up as a VM on pveOffice1, so I cannot select it there to back it up to pbsOffice2 (the image is in /rpool/data/, but the VM isn't listed on pveOffice1).

So:
1 - How do I back up this VM image, /rpool/data/subvol-104-disk-0, from pveOffice1 to pbsOffice2?
2 - If the VM on pveProduction1, or the server itself, fails but I still have the image subvol-104-disk-0 replicated on pveOffice1, how do I open it once I fix pveProduction1?

Dunuin · Dec 13, 2022

jHorrocks said:
1 - How do I back up this VM image, /rpool/data/subvol-104-disk-0, from pveOffice1 to pbsOffice2?

If it is using a subvol, it's a container and not a VM. PVE isn't just backing up the data. Before doing the backup, it needs to access the guest to flush the write cache and so on, to ensure data integrity. For VMs, it won't just back up the data at the ZFS level. It uses QEMU's functionality, so it needs to communicate with the running VM. So if that guest is running on pveProduction1, you should back it up from pveProduction1 to the PBS, not from pveOffice1.

jHorrocks said:
2 - If the VM on pveProduction1, or the server itself, fails but I still have the image subvol-104-disk-0 replicated on pveOffice1, how do I open it once I fix pveProduction1?

You still lose all data written since the last replication snapshot, as ZFS replication is never perfectly in sync. And as Neobin already mentioned, a cluster needs low latency (I think a staff member said something about <10 ms). It's usually not a good idea to create a cluster over the internet.
Do you want HA to prevent downtime, or just some kind of "backup"? (As Neobin already mentioned, RAID or replication between nodes doesn't replace a real backup.) With HA, PVE would automatically start the guest on the node that didn't fail, based on the last replicated snapshot.

jHorrocks · Dec 13, 2022

It's usually not a good idea to create a cluster over the internet

I didn't know this, thanks for the advice. And YES, you are correct: I am testing with containers and NOT VMs, my bad (sorry for the confusion).

I am limited on funds. I am trying to utilise 2 PCs I have at the office to "mimic" pveProduction1 in case it goes down, until I can get it back online, so that I at least have a close copy of the data via replication. But I am starting to see this isn't the case. (Losing some data with a 15-minute replication interval isn't critical, as my websites are not critical.)

Ideally I need 4 servers in the datacenter, 3 as PVE with HA and 1 as PBS. But a new server costs around $6000+, and I just cannot afford 4 of them. I feel stuck trying to get old hardware running to build a basic Proxmox environment, but it seems not to be possible.

Is there no way I can open the replicated CT (/rpool/data/subvol-104-disk-0) on pveOffice1 if pveProduction1 fails, and then migrate the image back once pveProduction1 is online again?

EDIT:
I cannot back up from pveProduction1 to pbsOffice2 at my office, as some CT images are 2TB+. My idea was that replicating from pveProduction1 to pveOffice1 is quick, and backing up from pveOffice1 to pbsOffice2, which are on the same local network, wouldn't take so long. But like I said, I cannot see CT104 (/rpool/data/subvol-104-disk-0) on pveOffice1, so I cannot back it up.

Dunuin · Dec 13, 2022

jHorrocks said:
I cannot back up from pveProduction1 to pbsOffice2 at my office, as some CT images are 2TB+. My idea was that replicating from pveProduction1 to pveOffice1 is quick, and backing up from pveOffice1 to pbsOffice2, which are on the same local network, wouldn't take so long. But like I said, I cannot see CT104 (/rpool/data/subvol-104-disk-0) on pveOffice1, so I cannot back it up.

PBS uses deduplication and only backs up data that doesn't already exist, so it is basically only backing up what changed since the last backup. Do an hourly snapshot mode backup and there won't be much to back up, so it will be fast. I would also recommend using VMs and not LXCs, as only VMs can make use of dirty bitmaps when doing snapshot mode backups. When backing up a 2TB LXC, it always needs to read and hash the whole 2TB, so the backup will be very slow. When backing up a 2TB VM, it only needs to read and hash the parts of the virtual disks that actually got overwritten since the last backup.
Restores, of course, would be very slow when restoring a 2TB guest over a 10MB/s internet connection. One solution here could be to have one PBS in the datacenter and one in the office. You could then set up an hourly sync job that pulls the backups from the PBS in the datacenter to the PBS in your office. Syncing also only needs to transfer the data that changed since the last sync job/backup, so there isn't much to sync between the two PBS.
So you would have a very recent backup at both locations, and you could restore a guest from the PBS at the location where the PVE server failed. That way you can always do a fast local restore over the LAN instead of restoring over the slow internet connection.
And you could run unclustered PVE nodes at both locations. It's even possible to install PVE + PBS bare metal on the same server, so you could restore to the same host.
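The mechanisms described above (deduplication, dirty bitmaps, and an incremental sync between two PBS instances) can be illustrated with a toy chunk store. This is a minimal sketch, not PBS's implementation. A dict stands in for a datastore, and chunks are 4 bytes, far smaller than real PBS chunks. The container is modelled with fixed-size chunks, although PBS actually uses dynamically sized chunks for container archives. `backup`, `pull_sync`, `datacenter_pbs` and `office_pbs` are hypothetical names:

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny for readability, real PBS chunks are far larger


def digest(chunk: bytes) -> str:
    """Content address of a chunk (SHA-256 hex digest)."""
    return hashlib.sha256(chunk).hexdigest()


def backup(disk: bytes, store: dict, dirty=None, previous=None):
    """Back up `disk` into `store` (a dict mapping digest -> chunk bytes).

    dirty:    set of chunk numbers written since the previous backup (a dirty
              bitmap), or None if there is no bitmap and everything must be read
    previous: chunk index (list of digests) of the previous backup; required
              whenever `dirty` is given
    Returns (index, bytes_read, chunks_stored).
    """
    index, bytes_read, stored = [], 0, 0
    for n, start in enumerate(range(0, len(disk), CHUNK_SIZE)):
        if dirty is not None and n not in dirty:
            index.append(previous[n])  # untouched chunk: reuse its digest unread
            continue
        chunk = disk[start:start + CHUNK_SIZE]
        bytes_read += len(chunk)
        d = digest(chunk)
        if d not in store:             # deduplication: keep only unseen chunks
            store[d] = chunk
            stored += 1
        index.append(d)
    return index, bytes_read, stored


def pull_sync(remote: dict, local: dict) -> int:
    """Pull the chunks `local` is missing from `remote`; return how many moved."""
    missing = [d for d in remote if d not in local]
    for d in missing:
        local[d] = remote[d]
    return len(missing)


v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBXXXXDDDD"  # only chunk 2 was overwritten

datacenter_pbs, office_pbs = {}, {}

idx1, read1, stored1 = backup(v1, datacenter_pbs)
print("first backup:", read1, "bytes read,", stored1, "chunks stored")
print("first sync:", pull_sync(datacenter_pbs, office_pbs), "chunks copied")

# The same change backed up two ways; the LXC case uses a throwaway copy.
lxc_idx, lxc_read, lxc_stored = backup(v2, dict(datacenter_pbs))
vm_idx, vm_read, vm_stored = backup(v2, datacenter_pbs, dirty={2}, previous=idx1)
print("LXC, no bitmap:", lxc_read, "bytes read,", lxc_stored, "chunk stored")
print("VM, dirty bitmap:", vm_read, "bytes read,", vm_stored, "chunk stored")
print("next sync:", pull_sync(datacenter_pbs, office_pbs), "chunk copied")
```

Both second backups store just one new chunk. Without a dirty bitmap, though, the whole disk has to be read and hashed to find it. The following sync then copies only that one chunk across.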

jHorrocks said:
Ideally I need 4 servers in the datacenter, 3 as pve with HA and 1 as pbs

Yup, that would be the best case, with Ceph as real shared storage.
If you don't care that much about losing the last few minutes of data, you could run two servers in the datacenter with ZFS replication and PVE + PBS installed bare metal on both of them. Add a super cheap VPS or dedicated server in the same datacenter that acts as a QDevice.
Then you've got your HA cluster with 3 votes in the datacenter, and those two PVE nodes could back each other up.
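The cheap QDevice matters because corosync needs a strict majority of votes for the cluster to stay quorate. As far as I understand the PVE HA stack, it only recovers guests on nodes that still have quorum. Here is a minimal sketch of the vote arithmetic, assuming one vote per PVE node and one vote for the QDevice. The function names are hypothetical:

```python
def votes_needed(total_votes: int) -> int:
    """Strict majority: floor(n / 2) + 1 votes."""
    return total_votes // 2 + 1


def is_quorate(votes_online: int, total_votes: int) -> bool:
    return votes_online >= votes_needed(total_votes)


# Two PVE nodes without a QDevice: one node fails, 1 of 2 votes left.
print(is_quorate(1, 2))  # False: the surviving node has no quorum
# Two PVE nodes plus a QDevice: one node fails, 2 of 3 votes left.
print(is_quorate(2, 3))  # True: survivor + QDevice are still quorate
```

Without the third vote, a two-node cluster loses quorum as soon as either node goes down, which defeats the purpose of HA.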

Just some ideas that might be worth a deeper look.

jHorrocks · Dec 15, 2022

Wow, some great ideas for me to consider now. I really appreciate you taking the time to reply.

I will spend today going through each option you suggested and get back to you with what I end up doing.

Thank you to all who helped me with this.
