Backups and DR Testing

Sep 26, 2023
Hello all.

I have a cluster with storage replication between the servers, and backups at the main site using PBR. I also have a remote site with a single server running PBR as a VM. I have configured the remote site to replicate the backups from the main site, and that seems to be working. I had thought I would be able to use the remote site's PBR to either 'import or restore' the servers at the remote site, so that I would have a DR/remote server with my images in case of a main-site issue.

However, the remote site only knows about the backups that were done on that site, so I don't have access to those images for restoration should I need them. The remote site is configured as follows: PVE with a large ZFS pool. PBR runs as a VM on it, and I have added a second disk to the PBR server (hardware) from the PVE pool, created a ZFS pool for that data, and called it backup. I'm sure it's just a matter of creating a datastore connection that references that ZFS volume, but I'm not sure how to do it. I could use an external NAS for my backups and use its tools to replicate to another NAS at the remote site, but I wouldn't expect to need a third-party solution for Proxmox. A simple option, if possible, might be to 'map' the ZFS pool from the PBS server as a directory or datastore for the PVE server to use, and then 'restore' servers as needed.

If this can't be done simply or easily within the software, how can I replicate my servers to another site and have them available when needed? I haven't found any documentation about what Proxmox considers a DR solution other than HA. HA is good for local servers, but a remote site/DR environment still needs some kind of solution. I suppose I could continue the backup replication and add some kind of 'zfs send' solution, but that seems to duplicate data traffic and isn't very efficient. We are required to have, and to run, a DR test several times a year, so any suggestions are welcome.

thanks -
 
Hi,

It seems that you use slightly different terminology: by PBR I assume you mean a Proxmox Backup Server instance (please use PBS as the acronym for that), and by replicating backups you actually mean what PBS calls a Remote Sync Job, see https://pbs.proxmox.com/docs/managing-remotes.html#sync-jobs

Also, I'm not sure I understood your current setup correctly:
So you have the main site, which is a cluster with storage replication (that was set up in the other thread, if I remember correctly?). Further, you have another host with Proxmox VE installed and a PBS instance as a VM, which pulls the backups via a Remote Sync Job from a different Proxmox Backup Server instance, and you would like to make these backups accessible for restore on the Proxmox VE host the PBS VM runs on. If that is the case, you can simply add a new storage of type Proxmox Backup Server to the Proxmox VE host, and the backups should be accessible (given you have set up the storage with the correct user, token/password, permissions and datastore/namespace).
 
Apologies for the lack of clarity. Here is the configuration I currently have in place.

Site A

Two servers in a cluster: one running PVE, and the other running PVE with an instance of PBS as a VM. The primary server is configured to sync the datastore to the second server, and this is working as scheduled. The PBS server is backing up the data as configured. I can see the datastore with the backups and have the restore option for either the whole VM or file level. I have backups here, can restore, and can migrate over to the backup server at this site without issues. I think this is because I am backing up the files and can see those, as well as the backup (blob) info. I believe the backups are the actual .qcow2 files along with the snapshots, plus the (blob) info from the backups with the updates within them.

Site B

One server, running PVE with an instance of PBS as a VM. I have configured a sync job on this server which pulls the backups from Site A, so that I have copies of my backups at the remote site. I was expecting to see the actual files instead of 'backups of my backup files'. I am not able to see the database that has the backups in it, from either the PVE or the PBR. It's a bit confusing, but I think I have copies of the backups from the other site (Site A), yet I don't have the 'images' of the backups, so I am unable to restore either the whole machine or an individual file. Does that make sense?

Part of a DR solution is to have backups offsite and to be able to recover from a main-site failure by loading your 'most current' images remotely. I can't seem to do this currently, and I need to. From reviewing the documentation about 'sync', it seems I have the job configured to sync the backups that were done, and not the actual backup jobs.

I need to be able to see the actual backups that were done, on Site B. I think the only way I can do a restoration is if I have the actual backup that was done, and not just the 'blob' info associated with the backup process. Opening a datastore from another site over a VPN doesn't seem practical; the datastore from the primary site should rather be replicated to the remote site. Perhaps if I could see that datastore in my sync job, I could select it and have it replicated over.

Let me know if there are any configs you need to see.
 
A sync job pulls the snapshots and their corresponding chunks from the remote Proxmox Backup Server instance (limited to what is not already present locally and to what should be synced according to the sync job's settings). So after the sync, you should see the snapshots on the host performing the sync just as they were on the remote, with the same restore and file-restore capabilities.

What does your sync job look like? Please share the output of cat /etc/proxmox-backup/sync.cfg, as well as the output of proxmox-backup-client snapshot list.
 
The sync.cfg shows info, but the 'snapshot list' doesn't. Maybe I haven't used the correct command?
Both commands were run on the PBS server, which I believe is where you wanted them run.

root@nocpbs:~# cat /etc/proxmox-backup/sync.cfg
sync: s-62ba515d-92ce
	ns
	owner root@pam
	remote corpoffice
	remote-ns
	remote-store backups
	remove-vanished false
	schedule hourly
	store backup
root@nocpbs:~# proxmox-backup-client snapshot list
Error: unable to get (default) repository
 
I've also started getting this error in my backup log:

sync group vm/139 failed - owner check failed (root@pam != cbackupid@pbs)

The rest of the job seems to be working; it's just something with that VM. I created a user account (cbackupid) and have been trying to use it for the backups instead of 'root'. I've checked the permissions and believe 'all' should be granted to this account, but I'm unsure what the issue would be, as 'root' would supersede that user account's permissions.


2024-03-21T10:00:00-04:00: Starting datastore sync job 'corpoffice:backups:backup::s-62ba515d-92ce'
2024-03-21T10:00:00-04:00: task triggered by schedule 'hourly'
2024-03-21T10:00:00-04:00: sync datastore 'backup' from 'corpoffice/backups'
2024-03-21T10:00:00-04:00: ----
2024-03-21T10:00:00-04:00: Syncing datastore 'backups', root namespace into datastore 'backup', root namespace
2024-03-21T10:00:00-04:00: found 18 groups to sync
2024-03-21T10:00:00-04:00: sync group vm/139 failed - owner check failed (root@pam != cbackupid@pbs)
2024-03-21T10:00:00-04:00: re-sync snapshot vm/200/2024-03-20T14:22:19Z
2024-03-21T10:00:00-04:00: no data changes
2024-03-21T10:00:00-04:00: percentage done: 11.11% (2/18 groups)
2024-03-21T10:00:00-04:00: skipped: 5 snapshot(s) (2024-03-16T00:30:42Z .. 2024-03-20T00:30:48Z) - older than the newest local snapshot
2024-03-21T10:00:00-04:00: re-sync snapshot vm/201/2024-03-21T00:30:56Z
2024-03-21T10:00:00-04:00: no data changes
2024-03-21T10:00:00-04:00: percentage done: 16.67% (3/18 groups)
2024-03-21T10:00:00-04:00: skipped: 5 snapshot(s) (2024-03-16T00:31:02Z .. 2024-03-20T00:31:08Z) - older than the newest local snapshot
2024-03-21T10:00:00-04:00: re-sync snapshot vm/202/2024-03-21T00:31:19Z
2024-03-21T10:00:00-04:00: no data changes
2024-03-21T10:00:00-04:00: percentage done: 22.22% (4/18 groups)
2024-03-21T10:00:00-04:00: skipped: 5 snapshot(s) (2024-03-16T00:31:23Z .. 2024-03-20T00:31:30Z) - older than the newest local snapshot
2024-03-21T10:00:00-04:00: re-sync snapshot vm/203/2024-03-21T00:31:41Z
2024-03-21T10:00:00-04:00: no data changes
2024-03-21T10:00:00-04:00: percentage done: 27.78% (5/18 groups)
2024-03-21T10:00:01-04:00: skipped: 5 snapshot(s) (2024-03-16T00:32:02Z .. 2024-03-20T00:31:48Z) - older than the newest local snapshot
2024-03-21T10:00:01-04:00: re-sync snapshot vm/204/2024-03-21T00:32:00Z
2024-03-21T10:00:01-04:00: no data changes
2024-03-21T10:00:01-04:00: percentage done: 33.33% (6/18 groups)
2024-03-21T10:00:01-04:00: skipped: 5 snapshot(s) (2024-03-16T00:32:39Z .. 2024-03-20T00:32:28Z) - older than the newest local snapshot
2024-03-21T10:00:01-04:00: re-sync snapshot vm/205/2024-03-21T00:32:58Z
2024-03-21T10:00:01-04:00: no data changes
2024-03-21T10:00:01-04:00: percentage done: 38.89% (7/18 groups)
2024-03-21T10:00:01-04:00: skipped: 5 snapshot(s) (2024-03-16T00:33:52Z .. 2024-03-20T00:33:16Z) - older than the newest local snapshot
2024-03-21T10:00:01-04:00: re-sync snapshot vm/207/2024-03-21T00:34:23Z
2024-03-21T10:00:01-04:00: no data changes
2024-03-21T10:00:01-04:00: percentage done: 44.44% (8/18 groups)
2024-03-21T10:00:01-04:00: skipped: 5 snapshot(s) (2024-03-16T00:34:39Z .. 2024-03-20T00:34:21Z) - older than the newest local snapshot
2024-03-21T10:00:01-04:00: re-sync snapshot vm/208/2024-03-21T00:35:37Z
2024-03-21T10:00:01-04:00: no data changes
2024-03-21T10:00:01-04:00: percentage done: 50.00% (9/18 groups)
2024-03-21T10:00:01-04:00: skipped: 5 snapshot(s) (2024-03-16T00:34:54Z .. 2024-03-20T00:34:54Z) - older than the newest local snapshot
2024-03-21T10:00:01-04:00: re-sync snapshot vm/209/2024-03-21T00:36:10Z
2024-03-21T10:00:01-04:00: no data changes
2024-03-21T10:00:01-04:00: percentage done: 55.56% (10/18 groups)
2024-03-21T10:00:02-04:00: skipped: 5 snapshot(s) (2024-03-16T00:35:47Z .. 2024-03-20T00:35:46Z) - older than the newest local snapshot
2024-03-21T10:00:02-04:00: re-sync snapshot vm/211/2024-03-21T00:37:02Z
2024-03-21T10:00:02-04:00: no data changes
2024-03-21T10:00:02-04:00: percentage done: 61.11% (11/18 groups)
2024-03-21T10:00:02-04:00: skipped: 5 snapshot(s) (2024-03-16T00:36:26Z .. 2024-03-20T00:36:39Z) - older than the newest local snapshot
2024-03-21T10:00:02-04:00: re-sync snapshot vm/212/2024-03-21T00:37:55Z
2024-03-21T10:00:02-04:00: no data changes
2024-03-21T10:00:02-04:00: percentage done: 66.67% (12/18 groups)
2024-03-21T10:00:02-04:00: skipped: 5 snapshot(s) (2024-03-16T00:37:07Z .. 2024-03-20T00:38:49Z) - older than the newest local snapshot
2024-03-21T10:00:02-04:00: re-sync snapshot vm/217/2024-03-21T00:39:10Z
2024-03-21T10:00:02-04:00: no data changes
2024-03-21T10:00:02-04:00: percentage done: 72.22% (13/18 groups)
2024-03-21T10:00:02-04:00: skipped: 5 snapshot(s) (2024-03-16T00:37:32Z .. 2024-03-20T00:39:50Z) - older than the newest local snapshot
2024-03-21T10:00:02-04:00: re-sync snapshot vm/218/2024-03-21T00:40:26Z
2024-03-21T10:00:02-04:00: no data changes
2024-03-21T10:00:02-04:00: percentage done: 77.78% (14/18 groups)
2024-03-21T10:00:02-04:00: skipped: 5 snapshot(s) (2024-03-16T00:38:19Z .. 2024-03-20T00:41:04Z) - older than the newest local snapshot
2024-03-21T10:00:02-04:00: re-sync snapshot vm/241/2024-03-21T00:42:04Z
2024-03-21T10:00:02-04:00: no data changes
2024-03-21T10:00:02-04:00: percentage done: 83.33% (15/18 groups)
2024-03-21T10:00:03-04:00: skipped: 5 snapshot(s) (2024-03-16T00:38:52Z .. 2024-03-20T00:41:45Z) - older than the newest local snapshot
2024-03-21T10:00:03-04:00: re-sync snapshot vm/301/2024-03-21T00:42:47Z
2024-03-21T10:00:03-04:00: no data changes
2024-03-21T10:00:03-04:00: percentage done: 88.89% (16/18 groups)
2024-03-21T10:00:03-04:00: skipped: 5 snapshot(s) (2024-03-16T00:38:57Z .. 2024-03-20T00:41:55Z) - older than the newest local snapshot
2024-03-21T10:00:03-04:00: re-sync snapshot vm/326/2024-03-21T00:43:12Z
2024-03-21T10:00:03-04:00: no data changes
2024-03-21T10:00:03-04:00: percentage done: 94.44% (17/18 groups)
2024-03-21T10:00:03-04:00: skipped: 5 snapshot(s) (2024-03-16T00:39:15Z .. 2024-03-20T00:42:14Z) - older than the newest local snapshot
2024-03-21T10:00:03-04:00: re-sync snapshot vm/335/2024-03-21T00:43:31Z
2024-03-21T10:00:03-04:00: no data changes
2024-03-21T10:00:03-04:00: percentage done: 100.00% (18/18 groups)
2024-03-21T10:00:03-04:00: Finished syncing namespace , current progress: 17 groups, 1 snapshots
2024-03-21T10:00:03-04:00: TASK ERROR: sync failed with some errors.
 
For the second command to work, you will have to provide the default repository and credentials, either via the CLI or by setting the environment variables. See https://pbs.proxmox.com/docs/backup-client.html
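For illustration, a minimal sketch of both variants, run on the Site B PBS host itself. The user `root@pam`, the host `localhost` and the datastore `backup` are assumptions taken from the sync.cfg above; `pbs_repository` is a hypothetical helper, while `PBS_REPOSITORY` and `PBS_PASSWORD` are the environment variables described in the linked client documentation.

```shell
#!/bin/sh
# Sketch: list the snapshots in the local datastore on the Site B PBS host.
# Assumptions (taken from sync.cfg above): datastore "backup", user root@pam.

# Hypothetical helper: build a repository string of the form
# user@realm@server:datastore (the realm is part of the user name).
pbs_repository() {
    printf '%s@%s:%s\n' "$1" "$2" "$3"
}

# Variant 1: set the default repository once per shell session.
PBS_REPOSITORY=$(pbs_repository root@pam localhost backup)
export PBS_REPOSITORY
# export PBS_PASSWORD='...'   # optional; otherwise the client prompts for it

if command -v proxmox-backup-client >/dev/null 2>&1; then
    proxmox-backup-client snapshot list
    # Variant 2: pass the repository explicitly instead of using the variable:
    # proxmox-backup-client snapshot list --repository root@pam@localhost:backup
else
    echo "proxmox-backup-client not found; run this on the PBS host" >&2
fi
```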

Can you also share the latest sync job task log?

Edit: You were faster.
 
I fixed the permissions for vm/139 via the GUI, and the sync is working again now.

How can I fix/create a sync job that actually sends the backups, and not just copies of the backups that were done?
 
Regarding the error with the incorrect owner, you can fix this by setting the correct ownership as described here https://pbs.proxmox.com/docs/backup-client.html#changing-the-owner-of-a-backup-group.
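Reading the log line together with the `owner root@pam` entry in sync.cfg, the sync job most likely assigns synced groups to root@pam, while the existing local group vm/139 belongs to cbackupid@pbs. A minimal CLI sketch of the documented `change-owner` command, assuming it is run on the Site B PBS host against the local datastore `backup`:

```shell
#!/bin/sh
# Sketch: give the local backup group vm/139 back to the sync job owner, so the
# owner check in the hourly pull passes again.
# Assumptions: run on the Site B PBS host; local datastore "backup"; sync job
# owner root@pam (the "owner root@pam" line in sync.cfg).
GROUP=vm/139                 # group named in the failing log line
NEW_OWNER=root@pam           # must match the sync job's owner
REPO=root@pam@localhost:backup

if command -v proxmox-backup-client >/dev/null 2>&1; then
    proxmox-backup-client change-owner "$GROUP" "$NEW_OWNER" --repository "$REPO"
else
    echo "proxmox-backup-client not found; run this on the PBS host" >&2
fi
```

Changing the sync job's owner to cbackupid@pbs instead would probably just move the mismatch to the other groups, which were synced fine and are presumably owned by root@pam.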

So according to the task log, the backup snapshots get synced just fine, except for the older ones. So how did you find that the backups are not restorable/accessible? What were your exact steps?

Whenever I do a restore, either file-level or the whole image, this is my process:

1. Go to the PVE server.
2. Select the storage holding my backups, then select the VM I want to restore (or file-restore).
3. Choose the option, and the process completes.

This works fine at Site A, as the backups are done there and the images are there.

On the remote server (Site B):

1. Go to the PVE server.
2. I don't have the storage 'location' from the main site (Site A) there.

Do I need to add that (remote) PBS server to Site B in order to see those files? I thought I'd only need one PBS server at the remote site, and the backups would be replicated there in a restorable format for Site B.

Maybe the attached file will provide more info.
 
Ok.
I added the Site A PBS to Site B.
I can now see the backups and have the option to do a file or whole restore. This will work until the connection isn't there, so it's not really much help, as I expected to have that info 'locally' at Site B. I also don't have the option to restore the server to the remote site (Site B).

Issues still remaining:
1. Is it possible, without using the datastore from Site A, to restore files and images on Site B?
2. I haven't found a method that does the following: sync VM images from Site A to Site B. This is needed if Site A goes down and I need Site B in a DR situation. If those files can be synced, then I should be able to power them on 'remotely' (on Site B) to check for accuracy and, if needed, have the users access that data while Site A is repaired.
3. If the two sites are truly separate (not part of a cluster), then it seems the GUI option for the datastore isn't possible. If I could add a (remote) server to this environment and sync between the two sites, I think this would work, as I'd have 'images' (not just backups) that I can use on Site B (the DR site).

Is there some other CLI/manual sync process for doing this outside the GUI? I found something called pve-zsync which might work, but I'm not sure.

I'm sure it's not just me who wants a 'working copy' of their servers offsite somewhere for DR purposes.
 
You will have to add the Proxmox Backup Server of Site B as storage to the Proxmox VE host at Site B. Then you can restore the snapshots from there; there is no need to access Site A, as the sync job pulls the snapshots from Site A to Site B. So if Site A is not available, you can still restore from the PBS on Site B to the PVE host on Site B.

Of course, you will have to make sure the user/permissions used by the PVE host on Site B give access to the snapshots on Site B. That might be your issue here.
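A minimal CLI sketch of adding that storage with `pvesm`, as an alternative to Datacenter > Storage > Add > Proxmox Backup Server in the GUI. The storage ID, address and user are hypothetical placeholders; the datastore name `backup` comes from the sync.cfg earlier in the thread, and `PBS_PASSWORD` and `FINGERPRINT` are variables this sketch expects you to set yourself.

```shell
#!/bin/sh
# Sketch: add the Site B PBS datastore as a storage on the Site B PVE host.
# All values are hypothetical placeholders; replace them with your own.
STORAGE_ID=siteb-pbs        # storage name as it will appear in PVE
PBS_SERVER=192.0.2.20       # address of the PBS VM at Site B
DATASTORE=backup            # the local datastore the sync job writes into
PBS_USER=root@pam           # user (or user@realm!token) allowed to read the groups

# Assemble the arguments; --fingerprint is needed for a self-signed certificate
# (shown under "Show Fingerprint" on the PBS dashboard).
set -- add pbs "$STORAGE_ID" --server "$PBS_SERVER" --datastore "$DATASTORE" \
    --username "$PBS_USER" --content backup
if [ -n "${FINGERPRINT:-}" ]; then
    set -- "$@" --fingerprint "$FINGERPRINT"
fi

if command -v pvesm >/dev/null 2>&1; then
    # PBS_PASSWORD holds the password (or API token secret) for PBS_USER.
    pvesm "$@" --password "${PBS_PASSWORD:?set PBS_PASSWORD first}"
    pvesm list "$STORAGE_ID" --content backup   # should list the synced snapshots
else
    echo "pvesm not found; would run: pvesm $*" >&2
fi
```

If `pvesm list` comes back empty although the PBS GUI shows the snapshots, the user probably lacks a role such as DatastoreReader on `/datastore/backup`, which it would need to see groups owned by another user.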
 
OK, a couple of follow-up items.
I added it and can see/restore now. I was missing that 'local' datastore info somehow. DOLT!!!!
I was able to restore a couple of servers and, after adding a virtual bridge, get the two machines talking to each other, which is what I want for verifying that DR replication works, that the VMs are restorable, and that they can talk to each other.

After doing this, I have to remove them from the Site B PVE config, because there doesn't seem to be a way to 'overwrite' an existing VM. Say I restore vm/100, do some tests, and all is OK. Then two weeks later I need to do the same process, but with updated data in the VM. I have to remove vm/100 from the PVE environment, then restore it again. Is there any way other than this to 'overwrite' a previous image and keep the same name?
 
You can restore a backup over an existing VM by selecting the VM in the web UI, going to the Backup panel, selecting the PBS storage at the top right, and restoring one of the snapshots over the existing VM. Note that this will wipe all the data of the VM.
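For recurring DR tests the same can be scripted on the PVE host. A minimal sketch, assuming the Site B PBS storage is called `siteb-pbs` and the test VM is VMID 100 (both hypothetical); `newest_backup` is a hypothetical helper that picks the latest volume from `pvesm list`, and `qmrestore --force` is what allows overwriting an existing VMID.

```shell
#!/bin/sh
# Sketch: refresh a DR test VM on Site B from its newest synced snapshot,
# overwriting the existing VM (the CLI counterpart of the GUI restore above).
# Assumptions: PBS storage "siteb-pbs" on the Site B PVE host and VMID 100.
# WARNING: like the GUI restore, this wipes the current data of the VM.
STORAGE_ID=siteb-pbs
VMID=100

# Read `pvesm list` output on stdin (header line first, Volid in column 1) and
# print the newest volume ID; ISO 8601 timestamps sort chronologically as text.
newest_backup() {
    awk 'NR > 1 { print $1 }' | sort | tail -n 1
}

if command -v qmrestore >/dev/null 2>&1; then
    VOLID=$(pvesm list "$STORAGE_ID" --content backup --vmid "$VMID" | newest_backup)
    if [ -z "$VOLID" ]; then
        echo "no backups of VM $VMID on $STORAGE_ID" >&2
        exit 1
    fi
    qm stop "$VMID"                       # a running VM cannot be restored over
    qmrestore "$VOLID" "$VMID" --force 1  # --force allows overwriting VMID
else
    echo "qmrestore not found; run this on the PVE host" >&2
fi
```

If the disk storages referenced in the backup do not exist on Site B, `qmrestore` also takes `--storage <storage-id>` to choose the target storage for the restored disks.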
 
Chris,

For some reason my storage is gone now, and my backups are obviously failing. I can't figure out how to add that storage back, how to locate it, or how to add other storage. I hope to be able to add that storage back so I don't lose those days of backups. If not, what is the secret to adding storage from server B to the PBR at site 2? Whenever I enter the credentials, I'm not able to see the 'datastore' from Site B's PBR server.
 

Please check both the systemd journal on the PVE host and the journal on the PBS for errors. Can the PVE host reach the PBS host?
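A minimal sketch of those checks, assuming the PBS at Site B listens on its default port 8007 at the placeholder address 192.0.2.20 and the PVE storage is named `siteb-pbs`; `pbs_reachable` is a hypothetical helper built on curl.

```shell
#!/bin/sh
# Sketch of the checks suggested above, run on the Site B PVE host.
# Assumptions: PBS on its default port 8007 at a placeholder address, and a
# PVE storage named "siteb-pbs"; replace both with your own values.
PBS_SERVER=${PBS_SERVER:-192.0.2.20}
STORAGE_ID=siteb-pbs

# 1. Does the PBS web/API port answer? Without --fail, curl exits 0 on any HTTP
#    response; -k skips certificate verification for this connectivity test only.
pbs_reachable() {
    curl -sk -o /dev/null --connect-timeout 5 "https://$1:8007/"
}

if pbs_reachable "$PBS_SERVER"; then
    echo "PBS at $PBS_SERVER:8007 is reachable"
else
    echo "cannot reach PBS at $PBS_SERVER:8007 (network, firewall or VM down?)" >&2
fi

# 2. How does PVE see the storage, and what did pvestatd log about it recently?
if command -v pvesm >/dev/null 2>&1; then
    pvesm status --storage "$STORAGE_ID"
    journalctl -u pvestatd --since "1 hour ago" --no-pager
fi

# 3. On the PBS VM itself (not run here):
#    journalctl -u proxmox-backup-proxy --since "1 hour ago" --no-pager
#    proxmox-backup-manager datastore list   # is the datastore still configured?
#    proxmox-backup-manager cert info        # compare with the PVE fingerprint
```

A changed PBS certificate fingerprint is one possible reason a previously working storage stops connecting; comparing the `cert info` output with the fingerprint stored for that storage in `/etc/pve/storage.cfg` rules it in or out.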
 
