Proxmox Freezing/Crashing During Backup on Dell PowerEdge R820

meisolated

New Member
Jul 20, 2023
Hello Proxmox Community,

I'm currently running a Proxmox VE setup on a Dell PowerEdge R820 with the following specifications:

  • CPU: 4x Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz
  • RAID Controller: PERC H710 Adapter
  • Memory: 64GB RAM
  • Storage:
    • 3x Dell Enterprise-grade SAS 900GB hard disks
    • 1x Samsung EVO SSD (1TB) where Proxmox is currently installed
I've installed Proxmox VE 8.0.4 and am running around 15 instances (2 VMs and 13 LXC containers). Out of these, 10 are running all the time.

In addition, I have another non-consumer grade PC where I've installed the Proxmox Backup Server.

Here's the challenge I'm facing: whenever I initiate backups, my Proxmox server tends to freeze or even crash. Here's what I've observed and attempted so far:

  1. I initially suspected the issue might be with my PERC H710 RAID card. It seems to heat up significantly during operations.
  2. Initially, Proxmox was installed on the SAS drives, and they were configured in RAID 1. Suspecting the RAID configuration might be a factor, I changed it to RAID 0. Unfortunately, the problem persisted.
  3. Thinking the SAS drives might be the issue, I transitioned to installing Proxmox on a separate Samsung EVO 1TB SATA SSD. Yet, the freezing/crashing issue remained.
  4. As a workaround, I tried to split the backup jobs into two separate time slots, hoping to reduce the load or any potential I/O bottlenecks. This didn't resolve the issue either.
I'm reaching out to the community to see if anyone has encountered a similar problem or can provide insights into potential solutions. I've considered several factors, from hardware to I/O, but so far, I haven't been able to pinpoint or rectify the root cause.

Any help or suggestions would be deeply appreciated. Thank you in advance!

Best regards,
 
Are you getting any errors in the logs? Either on the PVE server or the backup server? You'll want to post those here.

Hopefully you have put your SAS drives back to RAID 1 (with a spare perhaps?). Also, I would highly recommend adding another SSD and using RAID 1 for Proxmox, especially if you rely on these VMs at all. SSDs do die. It's rare, but I've had it happen.

I'm troubleshooting an issue related to backups myself which is how I found this post.
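
For anyone wanting to check the logs after a freeze, here is a minimal sketch. It assumes the host keeps a persistent systemd journal, so the boot before the hard reset is still readable. The `journal_cmd` helper and the `RUN` switch are made up for illustration and are not part of any Proxmox tooling.

```shell
#!/bin/sh
# Sketch: show warnings and errors logged during an earlier boot.
# Assumption: the systemd journal is persistent (kept under /var/log/journal).

# Build the journalctl command for a boot offset (0 = current, -1 = previous).
journal_cmd() {
    boot="${1:-0}"
    printf 'journalctl -b %s -p warning --no-pager\n' "$boot"
}

cmd="$(journal_cmd -1)"

# By default only print the command; set RUN=1 to execute it on the host.
if [ "${RUN:-0}" = "1" ]; then
    $cmd | tail -n 200
else
    echo "$cmd"
fi
```

The last lines before the freeze are usually the interesting ones; the same command can be run on the backup server too.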
 
The logs don't really show anything useful, apart from something like "backup job failed".
And yes, the 3 main drives are running in RAID 1 now, and 2 other drives are in RAID 0 for CCTV storage (I'm not running any backup jobs for them).
I will add another SSD and use RAID 1.
I guess it's a hardware issue rather than a software one, because it happens randomly.

Thanks
 
It just freezes, no reboot or shutdown.
Have you tried running Memtest86+ for a few rounds or preferably overnight?

I would also run something like sysbench --threads="$(nproc)" --time=0 cpu run and watch CPU temps (say with ipmitool sensor). (apt update && apt install sysbench ipmitool)

If that succeeds you could also try running fio to stress the RAID card / disks. The goal being to trigger a freeze outside of Proxmox.

Make sure the RAID card has the latest firmware installed too.
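
A rough sketch of what such a fio run could look like. The scratch file path, size, and runtime are assumptions, and `fio_cmd` is just a hypothetical helper that prints the command so it can be reviewed before running. Point the file at a scratch location on the array under test, not at a raw device that holds data.

```shell
#!/bin/sh
# Sketch: build a mixed random read/write fio job to load the RAID card and disks.
# Assumptions: scratch file path, 4G size, and runtime are illustrative values.

fio_cmd() {
    file="$1"
    runtime="${2:-600}"   # seconds
    printf 'fio --name=raid-stress --filename=%s --size=4G --rw=randrw --bs=4k --ioengine=libaio --iodepth=32 --numjobs=4 --direct=1 --time_based --runtime=%s --group_reporting\n' "$file" "$runtime"
}

# Print the command for review; paste it into a shell on the host to run it.
fio_cmd /var/tmp/fio-stress.bin 600

# In a second terminal, temperatures can be polled with ipmitool, e.g.:
#   watch -n 5 ipmitool sdr type Temperature
```

Running it from a plain Debian live system on the same hardware would help separate a hardware fault from anything Proxmox-specific.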
 
Memtest passed; 2 rounds took around 14 hours. I tried fio as well with different settings, and I had already stress-tested the CPU. Temps seem fine and don't even reach the warning thresholds. I also updated the RAID firmware a week ago.
I really can't work out where the problem is. I think I'm going to upgrade the hardware over time; maybe that will help. For now I will set backups to run weekly and will also try staggering the backup of each VM by 10 minutes.
 
Were you able to find a fix for this issue? We just set up a five-node cluster with Ceph on five Dell R660 nodes, all new hardware, and on four separate occasions the servers froze. The only way to bring them back online was to reboot the physical hosts.
 
Backups to a slow Proxmox Backup Server can slow down the source VM to the point of a freeze or crash.
The fleecing feature is a workaround for this.
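
As a minimal sketch, fleecing can be enabled per backup run on the vzdump command line. This assumes a PVE release that ships the feature (as far as I know 8.2 or newer) and that the guest is a VM, since fleecing does not apply to containers. The VMID `100` and the storage names `pbs` and `local-lvm` are placeholders, and `vzdump_fleecing_cmd` is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Sketch: build a vzdump call that uses a fast local storage as the fleecing area.
# Assumptions: VMID and storage IDs below are placeholders for illustration.

vzdump_fleecing_cmd() {
    vmid="$1"      # VM to back up
    target="$2"    # backup target storage (e.g. the PBS datastore entry)
    fleece="$3"    # fast local storage that holds the fleecing image
    printf 'vzdump %s --storage %s --fleecing enabled=1,storage=%s\n' "$vmid" "$target" "$fleece"
}

# Print the command for review before running it on the PVE host.
vzdump_fleecing_cmd 100 pbs local-lvm
```

The same option can be set on a scheduled backup job instead of being passed by hand.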
 
Thank you for the quick reply, Gabriel. I'm actually backing up to a Dell server with all SSDs in it, over a 25Gb NIC. It is the physical servers that freeze, not the VMs. The fleecing option works well if you have slow hard drives on the PVE servers, but we have 50 new Micron 5400 Pro SSDs.
 
No, I never found a fix. I tried everything, and the sad part is that after a few months, my server's RAID card died. Since it was my home server, I didn't really bother to fix it. Instead, I started using a non-Dell server, and the backups work fine on it. One thing I didn't try is running without hardware RAID; I don't think that would have helped, but it's something I didn't experiment with. The only solution I could see was to run some third-party backup software and manage everything manually.
 
We have about 400 VMs on the cluster, so it would be hard to manage the backups manually. Thank you for the quick reply. I have a ticket open with Proxmox for this issue; if we find a solution, I will let you know.
 
Alright, thanks. And yeah, it would be really hard to do that manually.
 
So, I found this thread (https://bugzilla.kernel.org/show_bug.cgi?id=199727) about how using the VirtIO SCSI single controller with IO thread enabled and Async IO set to threads fixes the freezing issue for VMs. It does not say anything about the host, but I figured I would try it. I changed all of my VMs to the VirtIO SCSI single controller, enabled IO thread, and set Async IO to threads on the virtual disks. I then ran some high-IO tests on my 5-node cluster and was not able to get any of the physical hosts to freeze.
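
For reference, a minimal sketch of how the same settings could be applied from the CLI instead of the GUI. The VMID `100` and the disk spec are placeholders (the real spec comes from `qm config <vmid>`), and `disk_with_iothread` is a hypothetical helper. The script only prints the `qm set` commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: switch a VM to the VirtIO SCSI single controller and set
# iothread=1 / aio=threads on one of its disks.
# Assumptions: VMID and disk spec below are placeholders for illustration.

# Append iothread=1 and aio=threads to a disk spec, dropping any existing
# values for those two keys so they are not set twice.
disk_with_iothread() {
    printf '%s\n' "$1" \
        | sed -E 's/,(iothread|aio)=[^,]*//g' \
        | sed 's/$/,iothread=1,aio=threads/'
}

vmid=100
spec='local-lvm:vm-100-disk-0,size=32G'   # copy the real value from `qm config`

# Print the commands for review before running them on the PVE host.
echo "qm set $vmid --scsihw virtio-scsi-single"
echo "qm set $vmid --scsi0 $(disk_with_iothread "$spec")"
```

The VM typically needs a full stop and start (not just a reboot from inside the guest) before the new controller and disk options take effect.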
 
