Proxmox Freezing/Crashing During Backup on Dell PowerEdge R820

meisolated

New Member
Jul 20, 2023
Hello Proxmox Community,

I'm currently running a Proxmox VE setup on a Dell PowerEdge R820 with the following specifications:

  • CPU: 4x Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz
  • RAID Controller: PERC H710 Adapter
  • Memory: 64GB RAM
  • Storage:
    • 3x Dell Enterprise-grade SAS 900GB hard disks
    • 1x Samsung EVO SSD (1TB) where Proxmox is currently installed
I've installed Proxmox VE 8.0.4 and am running around 15 instances (2 VMs and 13 LXC containers). Out of these, 10 are running all the time.

In addition, I have another non-consumer grade PC where I've installed the Proxmox Backup Server.

Here's the challenge I'm facing: whenever I initiate backups, my Proxmox server tends to freeze or even crash. Here's what I've observed and attempted so far:

  1. I initially suspected the issue might be with my PERC H710 RAID card. It seems to heat up significantly during operations.
  2. Initially, Proxmox was installed on the SAS drives, and they were configured in RAID 1. Suspecting the RAID configuration might be a factor, I changed it to RAID 0. Unfortunately, the problem persisted.
  3. Thinking the SAS drives might be the issue, I transitioned to installing Proxmox on a separate Samsung EVO 1TB SATA SSD. Yet, the freezing/crashing issue remained.
  4. As a workaround, I tried to split the backup jobs into two separate time slots, hoping to reduce the load or any potential I/O bottlenecks. This didn't resolve the issue either.
I'm reaching out to the community to see if anyone has encountered a similar problem or can provide insights into potential solutions. I've considered several factors, from hardware to I/O, but so far, I haven't been able to pinpoint or rectify the root cause.

Any help or suggestions would be deeply appreciated. Thank you in advance!

Best regards,
 
Are you getting any errors in the logs? Either on the PVE server or the backup server? You'll want to post those here.

Hopefully you have put your SAS drives back to RAID 1 (with a spare perhaps?). Also, I would highly recommend adding another SSD and using RAID 1 for Proxmox, especially if you rely on these VMs at all. SSDs do die. It's rare, but I've had it happen.

I'm troubleshooting an issue related to backups myself which is how I found this post.
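
Since the host freezes and has to be reset, the interesting messages are usually in the previous boot's journal rather than the current one. A minimal sketch, assuming the journal is persistent (so older boots are still available) and using a made-up helper name, `freeze_hints`. The pattern list is only a guess at messages that commonly show up around a hang, not an exhaustive set:

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that often accompany a hang.
# The patterns are an assumption, not an exhaustive list.
freeze_hints() {
    grep -Ei 'blocked for more than|I/O error|Hardware Error|Out of memory|megaraid'
}

# Usage on the PVE host (uncomment to run):
#   journalctl --list-boots                        # find the boot that froze
#   journalctl -k -b -1 --no-pager | freeze_hints  # kernel log of the previous boot
```

If nothing matches, the unfiltered tail of `journalctl -b -1` is still worth posting, since the last lines before the freeze can hint at what was running.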
 
The logs don't really show anything useful about it, apart from something like "backup job failed".
And yes, the 3 main drives are running in RAID 1 now, and 2 other drives are in RAID 0 for CCTV storage (no backup jobs run against them).
I will add another SSD and use RAID 1.
I guess it's a hardware issue rather than a software one, because it happens randomly.

Thanks
 
It just freezes, no reboot or shutdown.
Have you tried running Memtest86+ for a few rounds or preferably overnight?

I would also run something like sysbench --threads="$(nproc)" --time=0 cpu run and watch CPU temps (say with ipmitool sensor). (apt update && apt install sysbench ipmitool)
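
One way to keep an eye on temperatures while sysbench runs in another terminal is to pull the hottest reading out of `ipmitool sensor` every few seconds. A rough sketch, assuming the BMC reports temperatures with the unit `degrees C` in the third pipe-separated column (the usual ipmitool layout, though sensor names vary by machine). `max_temp` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: read `ipmitool sensor` output on stdin and print the
# hottest temperature sensor as "<value> <name>".
max_temp() {
    awk -F'|' '
        # Only temperature rows with a numeric reading (skips "na").
        $3 ~ /degrees C/ && $2 ~ /^ *-?[0-9]+(\.[0-9]+)? *$/ {
            if (!seen || $2 + 0 > max) { max = $2 + 0; name = $1; seen = 1 }
        }
        END {
            gsub(/^ +| +$/, "", name)   # trim padding around the sensor name
            if (seen) print max, name
        }'
}

# Usage on the host while the stress test runs (uncomment to run):
#   while sleep 5; do ipmitool sensor | max_temp; done
```

Watching a single number makes it easier to spot a slow climb than scanning the full sensor table each time.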

If that succeeds you could also try running fio to stress the RAID card / disks. The goal being to trigger a freeze outside of Proxmox.
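
As an illustration of the fio step, the following writes a small job file that does mixed random I/O with the page cache bypassed (`direct=1`), so the load actually reaches the controller and disks. The sizes, runtime, queue depth, and the `write_fio_job` helper are assumptions picked for the sketch. Point it at a scratch file on the array under test, never at a raw device holding data:

```shell
#!/bin/sh
# Hypothetical helper: write a fio job file to $1 that targets scratch file $2.
write_fio_job() {
    cat > "$1" <<EOF
[raid-stress]
filename=$2
size=8G
rw=randrw
bs=64k
ioengine=libaio
direct=1
iodepth=32
numjobs=4
time_based
runtime=1800
group_reporting
EOF
}

# Usage (uncomment to run; both paths are only examples):
#   write_fio_job /tmp/raid-stress.fio /var/lib/vz/fio-scratch.bin
#   fio /tmp/raid-stress.fio
```

Running this from a console session (or over IPMI) rather than the web UI makes it easier to tell whether the whole box locks up or just the GUI.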

Make sure the RAID card has the latest firmware installed too.
 
Memtest passed; 2 rounds took around 14 hours. I tried fio as well with different settings, and I had already stress-tested the CPU earlier. Temps seem fine and don't even approach the warning thresholds. I also updated the RAID firmware a week ago.
I really can't work out where the problem is. I think I'm going to upgrade the hardware over time; maybe that will help. For now I will set backups to run weekly and will also try backing up each VM 10 minutes apart.
 
