Backup Error for ONE VM only - ERROR: Backup of VM 100 failed - job failed with err -125 - Operation canceled

Dec 23, 2022
Hi,

I need some assistance identifying the root cause of a backup job that fails with: ERROR: Backup of VM 100 failed - job failed with err -125 - Operation canceled

PREVIOUS ISSUE IN DECEMBER AND HOW IT WAS RESOLVED

In December, I also started having problems with a backup job, but with a different error code: job failed with err -5 - Input/output error
It was resolved by reinstalling Proxmox from scratch on the server. That fixed the issue for a few weeks, then it came back. I suspended the backup job and, more recently, the VM itself, as it was decommissioned. From then until last Thursday, everything was running smoothly.

DETAILS

Please find some technical information in the attached file:
- Physical server info
- Basic Proxmox version and setup
- pveversion -v
- journalctl -xe (pertaining to the last occurrence)
- qm config 100 (affected VM)
- smartctl -a for both devices (sda and sdb)
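In case it helps anyone reproduce the same report, here is a minimal sketch of how the items above could be gathered in one go. It assumes the script runs as root on the Proxmox node; the VM ID 100 and the device names sda and sdb come from this post, while the `collect` helper and the `--no-pager` flag on journalctl are my own additions for illustration.

```python
import shutil
import subprocess

# Commands matching the attached items. VM ID 100 and sda/sdb are taken from
# this post; the list itself is only an illustrative assumption.
DIAGNOSTIC_COMMANDS = [
    ["pveversion", "-v"],
    ["journalctl", "-xe", "--no-pager"],
    ["qm", "config", "100"],
    ["smartctl", "-a", "/dev/sda"],
    ["smartctl", "-a", "/dev/sdb"],
]


def collect(commands=DIAGNOSTIC_COMMANDS):
    """Run each command and return {command line: output or a skip note}."""
    report = {}
    for cmd in commands:
        label = " ".join(cmd)
        # Skip tools that are not installed instead of raising an error.
        if shutil.which(cmd[0]) is None:
            report[label] = "skipped: command not found"
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Keep stderr as well, since failures are part of the diagnosis.
        report[label] = result.stdout + result.stderr
    return report
```

Calling `collect()` and writing each entry to a text file gives roughly the same content as the attachment.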

Let me know if you need further information to assist in troubleshooting.

Regards,
 

Updating the case here. I opened a ticket with Dell to confirm whether or not it was hardware-related. No issue was found with the hardware or the drives.
Thus, I'd appreciate input from anyone who faced the same issue AND identified the root cause. :)
Also, I updated Proxmox to the latest version and restarted the node. I assume the BTRFS CSUM errors will not be fixed, unfortunately.
Last time this issue occurred, as mentioned, I reinstalled completely and restored my VMs on it. I'm hoping I won't have to start from scratch again.
What bugs me, though, is not only having to reinstall, but also the loss of confidence in the system and its integrity, and not knowing whether I will face a more serious issue in the future.
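Regarding the BTRFS CSUM errors mentioned above, this is a rough sketch of how the per-device error counters could be read and compared before and after a change. It relies on `btrfs device stats`, whose `corruption_errs` counter tracks checksum mismatches. The mount point `/` is an assumption on my side (adjust it to wherever the BTRFS filesystem is mounted), and the function names are made up for this example.

```python
import subprocess


def parse_device_stats(text):
    """Parse `btrfs device stats` output into {(device, counter): value}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "[/dev/sdX].counter_name   <number>"
        if len(parts) != 2 or "]." not in parts[0] or not parts[1].isdigit():
            continue
        device, counter = parts[0].split("].", 1)
        stats[(device.lstrip("["), counter)] = int(parts[1])
    return stats


def nonzero_counters(stats):
    """Keep only the counters that recorded at least one error."""
    return {key: value for key, value in stats.items() if value != 0}


def read_device_stats(mount_point="/"):
    """Run `btrfs device stats` on the given mount point (assumed: "/")."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        capture_output=True, text=True, check=True,
    )
    return parse_device_stats(result.stdout)
```

Running `nonzero_counters(read_device_stats())` as root lists only the counters that have recorded errors, which makes it easy to see whether the numbers keep growing after the update.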
 
I was eventually told by Proxmox that BTRFS is still experimental.
Has anyone had a similar issue and found a way to resolve it, apart from rebuilding the node from scratch?