PBS is locking up

kransom

Member
Aug 22, 2023
My Proxmox Backup Server is locking up frequently, leaving no SSH or console access. Only a hard reboot restores communication with the server. I believe it is a kernel panic or some sort of hardware failure. I have tried to view the logs, but it seems nothing is captured at the time of failure. I also tried setting up iDRAC on my server, but it is not logging those events either. I am wondering whether my setup was prone to failing from the beginning, and I would appreciate any insight on it.

My server is a Dell PowerEdge R330. My external RAID controller is a Nexsan E48, connected via two Fibre Channel cards and configured using multipath. After configuring a 42-disk RAID 6 array on the controller, I created a partition map on the multipath device using gdisk, then created an ext4 file system on that partition. I also created a directory to serve as the mount point for the datastore. In the PBS GUI, I created a datastore with the backing path /mnt/ext_raid.
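For anyone trying to reproduce the layout, the steps above could be sketched roughly as follows. This is only an illustration, not the exact commands that were run: the device name /dev/mapper/mpatha and the "-part1" partition suffix are assumptions (multipath naming differs between systems), and sgdisk is used here as the scriptable counterpart of gdisk. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. DEV and PART are hypothetical names; check `multipath -ll`
# and /dev/mapper on your own system before running anything for real.
DEV="${DEV:-/dev/mapper/mpatha}"
PART="${PART:-${DEV}-part1}"
MNT="${MNT:-/mnt/ext_raid}"

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run multipath -ll                                # confirm both FC paths are up
run sgdisk --new=1:0:0 --typecode=1:8300 "$DEV"  # one GPT partition spanning the LUN
run mkfs.ext4 "$PART"                            # ext4 on that partition
run mkdir -p "$MNT"                              # mount point for the datastore
run mount "$PART" "$MNT"
```

The datastore itself is then created in the PBS GUI with /mnt/ext_raid as the backing path, as described above.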

I have 5 PVE clusters (multiple VMs and containers) and a few standalone hosts backing up to my PBS server. My server had never failed like this before adding the Nexsan, so I am assuming that is what's causing it to lock up. PBS was installed with ZFS.

proxmox-backup-manager versions
proxmox-backup-server 4.1.4-1 running version: 4.1.4
 
Hi, @kransom
If you have, or can get, a monitor (I mean a physical display) connected to the server, there may be some errors displayed on it when, for whatever reason, the system is no longer able to write anything to the log files.
 
pbs-error.jpg

Let me know if I need to provide any other information. I tried looking this up before and it looks like a kernel panic.
 
Quite possible. There may be more info above the visible area. Sometimes you can scroll back a few more pages with Shift+PgUp on the keyboard connected to the server (unless the display is also completely locked up).

There are ways of finding the cause of a panic from these messages, but I can't help off the top of my head, I'm sorry.
I think you can search online for the exact method, though.
 
You can try the pstore interface, a serial console, or netconsole to get the full log.
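As a rough sketch of what that could look like for netconsole and pstore: all addresses, the interface name and the MAC address below are placeholders, not values from this thread. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Every value here is a hypothetical placeholder.
SRC_IP="${SRC_IP:-192.0.2.10}"            # PBS host (sender)
SRC_IF="${SRC_IF:-eno1}"                  # NIC on the PBS host
DST_IP="${DST_IP:-192.0.2.20}"            # machine that collects the log
DST_MAC="${DST_MAC:-00:11:22:33:44:55}"   # MAC of the receiver (or gateway)
PORT="${PORT:-6666}"

# Build the module parameter: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$PORT" "$SRC_IP" "$SRC_IF" "$PORT" "$DST_IP" "$DST_MAC"
}

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start a UDP listener on the receiver first, for example: nc -u -l 6666
# (the exact flags vary between netcat variants).
run dmesg -n 8                                       # send all kernel messages to consoles
run modprobe netconsole "netconsole=$(netconsole_param)"

# After the next lockup and hard reboot, look for saved panic records.
# Some systemd setups move them from /sys/fs/pstore to /var/lib/systemd/pstore.
run ls -l /sys/fs/pstore /var/lib/systemd/pstore
```

For a serial console, the usual approach is to add something like console=tty0 console=ttyS0,115200n8 to the kernel command line and capture the output on the other end of the serial line; the port and speed depend on the hardware.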