my experience with Proxmox Backup

chrigiboy

Hello everyone,

I would like to report on Proxmox Backup. What Proxmox does well: The restore function works very well. In particular, the live restore is absolutely brilliant.

However, so far we have only needed the restore function because the backup itself crashed the server...

We have run into the following problems:

During a backup, the source machine may freeze. Once it freezes (top shows a load of over 150), the machine becomes unusable and very sluggish, and only recovers once the backup completes or is aborted. Limiting the backup speed to 1 MByte/s in vzdump.conf does not help.

Kernel panics and blue screens may occur on the source machine during the backup.

File loss and a corrupted file system may occur during the backup.

During the backup, data loss or kernel panics may occur on all machines on the same server.

If a Proxmox Backup Server is set up from scratch, it works well for about 30 days, and then the problems suddenly start.

Running a separate backup server for each Proxmox host does not help either. Nor does limiting the bandwidth in vzdump.conf with bwlimit: 5000.

We have used Dell, HP, and custom-built servers both as host systems and as Proxmox Backup Servers. The problem occurs on all of them, whether the host system uses NVMe drives or RAID controllers with SSDs or HDDs.

In the Proxmox management interface, "Connection Timed Out" appears constantly, and you have to click on the respective backup server ten times before anything happens, even when there is no load and no backups are running in the background.

Very often, individual servers cannot be backed up at all. In that case, this error message appears: "() TASK ERROR: could not activate storage 'Pxb2': Pxb2: error fetching datastores - 500 read timeout"

Another problem I have: when I copy an HDD image from SMB to a Proxmox server that has more than 50% of its RAM in use, other machines on the same Proxmox server frequently experience data loss. This happens on all Proxmox servers with NVMe drives, as well as on all Dell servers with RAID controllers and SSDs.

What's going on there? Are there any solutions to these problems?
 
Hi,

Did you try reducing the max-workers count in the performance setting to 8 or 4 [1]? That may help with the VMs freezing.
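For reference, on the PVE side this is a node-wide setting in /etc/vzdump.conf; a minimal sketch (the value 4 is just an example, tune it for your hardware):

```
# /etc/vzdump.conf -- node-wide vzdump defaults
# reduce the number of parallel backup I/O workers to lower load on the guest
performance: max-workers=4
```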

The HTTP 500 errors sound more like a network problem, or they may be related to your PBS setup. Could you post part of the syslog from the PBS side?
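On a standard PBS install the logs are in the systemd journal; something like the following should pull the relevant window (timestamps are placeholders, adjust to when a backup failed):

```shell
# logs from the PBS API/proxy daemon around the time of a failed backup
journalctl -u proxmox-backup-proxy.service --since "2023-04-17 10:00" --until "2023-04-17 11:00"

# or capture everything from the journal in a recent window to a file
journalctl --since "-2 hours" > pbs-syslog.txt
```

These commands run against a live PBS host, so there is nothing to assert offline.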

Another problem I have: when I copy an HDD image from SMB to a Proxmox server that has more than 50% of its RAM in use, other machines on the same Proxmox server frequently experience data loss. This happens on all Proxmox servers with NVMe drives, as well as on all Dell servers with RAID controllers and SSDs.
That sounds odd. Can you possibly provide a clearer picture of your storage situation here?

Since you have a subscription, you could also open a ticket in our support portal, provided your subscription level isn't just "Community" [2].

[1]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#vzdump_configuration
[2]: https://my.proxmox.com/en
 
Thank you for your reply. I changed max-workers from 64 to 4.


Here is my log:

Apr 17 10:19:49 bx2 proxmox-backup-proxy[432738]: Upload size: 272629760 (77%)
Apr 17 10:19:49 bx2 proxmox-backup-proxy[432738]: Duplicates: 19+3 (26%)
Apr 17 10:19:49 bx2 proxmox-backup-proxy[432738]: Compression: 21%
Apr 17 10:19:49 bx2 proxmox-backup-proxy[432738]: successfully closed fixed index 1
Apr 17 10:19:49 bx2 proxmox-backup-proxy[432738]: add blob "/backup/vm/238/2023-04-17T08:19:12Z/index.json.blob" (379 bytes, comp: 379)
Apr 17 10:19:49 bx2 proxmox-backup-proxy[432738]: syncing filesystem
Apr 17 10:19:54 bx2 proxmox-backup-proxy[432738]: successfully finished backup
Apr 17 10:19:54 bx2 proxmox-backup-proxy[432738]: backup finished successfully
Apr 17 10:19:54 bx2 proxmox-backup-proxy[432738]: TASK OK
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: retention options: --keep-daily 25
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: Starting prune on datastore 'Backup', root namespace group "vm/238"
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-16T09:49:30Z remove
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: removing backup snapshot "/backup/vm/238/2023-03-16T09:49:30Z"
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-17T10:35:33Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-18T08:55:07Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-19T10:01:43Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-21T13:36:51Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-22T10:22:33Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-23T08:34:52Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-24T09:42:30Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-25T09:41:35Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-27T12:32:38Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-28T09:10:35Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-29T08:32:13Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-30T09:27:22Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-03-31T09:16:58Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-01T08:13:18Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-03T12:14:38Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-04T08:41:30Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-05T08:30:07Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-06T09:00:16Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-07T09:14:27Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-08T07:43:03Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-10T12:19:52Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-14T19:59:58Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-15T06:35:35Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-16T10:13:06Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: vm/238/2023-04-17T08:19:12Z keep
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: TASK OK
Apr 17 10:19:55 bx2 proxmox-backup-proxy[432738]: Upload backup log to datastore 'Backup', root namespace vm/238/2023-04-17T08:19:12Z/client.log.blob
Apr 17 10:19:59 bx2 proxmox-backup-proxy[432738]: starting new backup on datastore 'Backup': "vm/249/2023-04-17T08:19:36Z"
Apr 17 10:19:59 bx2 proxmox-backup-proxy[432738]: download 'index.json.blob' from previous backup.
Apr 17 10:19:59 bx2 proxmox-backup-proxy[432738]: register chunks in 'drive-virtio0.img.fidx' from previous backup.
Apr 17 10:19:59 bx2 proxmox-backup-proxy[432738]: download 'drive-virtio0.img.fidx' from previous backup.
Apr 17 10:19:59 bx2 proxmox-backup-proxy[432738]: created new fixed index 1 ("vm/249/2023-04-17T08:19:36Z/drive-virtio0.img.fidx")
Apr 17 10:19:59 bx2 proxmox-backup-proxy[432738]: add blob "/backup/vm/249/2023-04-17T08:19:36Z/qemu-server.conf.blob" (543 bytes, comp: 543)
Apr 17 10:21:02 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 81 to 82
Apr 17 10:21:02 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_01] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 70 to 74
Apr 17 10:21:02 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_04] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 84 to 71
Apr 17 10:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_05] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 77 to 78
Apr 17 10:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_06] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 79 to 80
Apr 17 10:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_07] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 80 to 81
Apr 17 10:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_07] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 57 to 56
Apr 17 10:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_07] [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 43 to 44
Apr 17 10:27:58 bx2 proxmox-backup-proxy[432738]: write rrd data back to disk
Apr 17 10:27:58 bx2 proxmox-backup-proxy[432738]: starting rrd data sync
Apr 17 10:27:58 bx2 proxmox-backup-proxy[432738]: rrd journal successfully committed (25 files in 0.166 seconds)
Apr 17 10:51:02 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_01] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 74 to 76
Apr 17 10:51:02 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_02] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 83 to 84
Apr 17 10:51:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_03] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 82 to 83
Apr 17 10:51:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_04] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 71 to 74
Apr 17 10:51:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_05] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 78 to 79
Apr 17 10:57:59 bx2 proxmox-backup-proxy[432738]: write rrd data back to disk
Apr 17 10:57:59 bx2 proxmox-backup-proxy[432738]: starting rrd data sync
Apr 17 10:58:00 bx2 proxmox-backup-proxy[432738]: rrd journal successfully committed (25 files in 0.747 seconds)
Apr 17 11:17:01 bx2 CRON[743562]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 17 11:21:02 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_01] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 76 to 77
Apr 17 11:21:02 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_02] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 84 to 68
Apr 17 11:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_04] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 74 to 76
Apr 17 11:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_05] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 79 to 80
Apr 17 11:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_06] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 80 to 81
Apr 17 11:21:03 bx2 smartd[638]: Device: /dev/bus/2 [megaraid_disk_07] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 81 to 82
Apr 17 11:28:01 bx2 proxmox-backup-proxy[432738]: write rrd data back to disk
Apr 17 11:28:01 bx2 proxmox-backup-proxy[432738]: starting rrd data sync
Apr 17 11:28:01 bx2 proxmox-backup-proxy[432738]: rrd journal successfully committed (25 files in 0.356 seconds)
Apr 17 11:44:39 bx2 systemd[1]: Created slice User Slice of UID 0.
Apr 17 11:44:39 bx2 systemd[1]: Starting User Runtime Directory /run/user/0...
Apr 17 11:44:40 bx2 systemd[1]: Finished User Runtime Directory /run/user/0.
Apr 17 11:44:40 bx2 systemd[1]: Starting User Manager for UID 0...
Apr 17 11:44:42 bx2 systemd[743580]: Queued start job for default target Main User Target.
Apr 17 11:44:42 bx2 systemd[743580]: Created slice User Application Slice.
Apr 17 11:44:42 bx2 systemd[743580]: Reached target Paths.
Apr 17 11:44:42 bx2 systemd[743580]: Reached target Timers.
Apr 17 11:44:42 bx2 systemd[743580]: Listening on GnuPG network certificate management daemon.
Apr 17 11:44:42 bx2 systemd[743580]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Apr 17 11:44:42 bx2 systemd[743580]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Apr 17 11:44:42 bx2 systemd[743580]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Apr 17 11:44:42 bx2 systemd[743580]: Listening on GnuPG cryptographic agent and passphrase cache.
Apr 17 11:44:42 bx2 systemd[743580]: Reached target Sockets.
Apr 17 11:44:42 bx2 systemd[743580]: Reached target Basic System.
Apr 17 11:44:42 bx2 systemd[743580]: Reached target Main User Target.
Apr 17 11:44:42 bx2 systemd[743580]: Startup finished in 1.639s.
Apr 17 11:44:42 bx2 systemd[1]: Started User Manager for UID 0.
Apr 17 11:44:42 bx2 systemd[1]: Started Session 4514 of user root.
 
If a Proxmox Backup Server is set up from scratch, it works well for about 30 days, and then the problems suddenly start.
Is it rotating rust?

If so, this anecdote from my own experience may help you find the cause: I initially used slow hard disks, on the theory that speed does not matter as long as data integrity does. And in the beginning it worked fine: slow, but without any failure symptoms. Once PBS held a few TB of actual content, simply listing the contents of the datastore became too slow and produced timeout errors. At first a second attempt would succeed, probably because the content listing was now partially cached. Later on it was simply unusable.

So my impression is that this is expected behavior. The official recommendation "Use only SSDs..." exists for a reason: https://pbs.proxmox.com/docs/installation.html#recommended-server-system-requirements

Good luck.
 
Hello everyone.
Although I set max-workers to 4, the servers hung again last night.
We have two systems with 100 TByte of storage each, about 50% full. With HDDs, of course, because we cannot fit that many SSDs in the server ;-)
 
Hello everyone.
Although I set max-workers to 4, the servers hung again last night.
We have two systems with 100 TByte of storage each, about 50% full. With HDDs, of course, because we cannot fit that many SSDs in the server ;-)
Can't, or don't want to? There are single SSDs with 100TB capacity, so you could even fit 400TB of hot-swappable SSD-only storage in a 1U case ;)

At the very least, you need a pair of SSDs to store the metadata. Then listing the datastore contents or running a GC is no longer a problem.
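If the datastore lives on a ZFS pool, one way to do this is a mirrored "special" vdev, which moves pool metadata (and optionally small blocks) onto the SSDs. A sketch, where the pool name 'tank', the dataset 'tank/datastore', and the device names are hypothetical; note that a special vdev only benefits data written after it is added:

```shell
# add a mirrored SSD special vdev to the existing pool (hypothetical devices)
zpool add tank special mirror /dev/sdx /dev/sdy

# optionally also keep small blocks (<= 64K) on the SSDs for the datastore dataset
zfs set special_small_blocks=64K tank/datastore

# verify the pool layout now shows the special vdev
zpool status tank
```

These commands modify a live pool, so test them on a non-production system first.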
 