ERROR PROXMOX DELL POWEREDGE R640

Jul 5, 2018
I have installed Proxmox on a Dell PowerEdge R640 server with the following configuration:
a RAID 1 of two 300 GB disks (the operating system is installed here), a RAID 0
of two 480 GB SSDs (a database runs here), and a 2 TB HDD.
The problem is that at least twice a day the server becomes
completely unresponsive: it does not answer on any of its network cards,
and the services installed on the virtual systems stop responding.

The outage lasts about 5 minutes, after which everything returns to normal.
In this case the errors began at 12:20:00 and ended at 12:22:52.
I checked the syslog, and at the moment the error occurs it shows
this:

Apr 03 12:20:00 SrvProxDellTDC systemd[1]: Starting Proxmox VE replication runner...
Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: [sdb] tag#1 task abort called for scmd(00000000e55e546f)
Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: [sdb] tag#1 CDB: Write(10) 2a 00 68 e0 2e 98 00 00 10 00
Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: task abort: FAILED scmd(00000000e55e546f)
Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: [sdb] tag#0 task abort called for scmd(00000000492cb9aa)
Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: [sdb] tag#0 CDB: Write(10) 2a 00 00 2e b6 18 00 00 08 00
Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: task abort: FAILED scmd(00000000492cb9aa)
Apr 03 12:22:08 SrvProxDellTDC kernel: sd 0:2:2:0: [sdc] tag#2 task abort called for scmd(0000000006e22167)
Apr 03 12:22:08 SrvProxDellTDC kernel: sd 0:2:2:0: [sdc] tag#2 CDB: Write(10) 2a 00 37 c9 a7 78 00 00 10 00
Apr 03 12:22:08 SrvProxDellTDC kernel: sd 0:2:2:0: task abort: FAILED scmd(0000000006e22167)
Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#4 task abort called for scmd(00000000ea1dcd05)
Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#4 CDB: Write(10) 2a 00 05 d8 aa b0 00 00 08 00
Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: task abort: FAILED scmd(00000000ea1dcd05)
Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#3 task abort called for scmd(000000004aff976f)
Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#3 CDB: Write(10) 2a 00 05 4f ea 08 00 00 18 00
Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: task abort: FAILED scmd(000000004aff976f)
Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:1:0: target reset called for scmd(00000000e55e546f)
Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:1:0: [sdb] tag#1 megasas: target reset FAILED!!
Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:2:0: target reset called for scmd(0000000006e22167)
Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:2:0: [sdc] tag#2 megasas: target reset FAILED!!
Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:0:0: target reset called for scmd(00000000ea1dcd05)
Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#4 megasas: target reset FAILED!!
Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#3 Controller reset is requested due to IO timeout
SCSI command pointer: (000000004aff976f) SCSI host state: 5 SCSI
Apr 03 12:22:52 SrvProxDellTDC kernel: IO request frame:

Apr 03 12:22:52 SrvProxDellTDC kernel: f1000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 35a00120
Apr 03 12:22:52 SrvProxDellTDC kernel: 00600002
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000020
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00003000
Apr 03 12:22:52 SrvProxDellTDC kernel:

Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 0000000a
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 01000000
Apr 03 12:22:52 SrvProxDellTDC kernel:

Apr 03 12:22:52 SrvProxDellTDC kernel: 4f05002a
Apr 03 12:22:52 SrvProxDellTDC kernel: 000008ea
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000018
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel:

Apr 03 12:22:52 SrvProxDellTDC kernel: 00140012
Apr 03 12:22:52 SrvProxDellTDC kernel: 0000000a
Apr 03 12:22:52 SrvProxDellTDC kernel: 054fea08
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000018
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00020300
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel:

Apr 03 12:22:52 SrvProxDellTDC kernel: 398d4000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000007
Apr 03 12:22:52 SrvProxDellTDC kernel: 00001000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 0febb000
Apr 03 12:22:52 SrvProxDellTDC kernel: 0000000c
Apr 03 12:22:52 SrvProxDellTDC kernel: 00001000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel:

Apr 03 12:22:52 SrvProxDellTDC kernel: f4f73000
Apr 03 12:22:52 SrvProxDellTDC kernel: 0000000b
Apr 03 12:22:52 SrvProxDellTDC kernel: 00001000
Apr 03 12:22:52 SrvProxDellTDC kernel: 40000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel:

Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel:

Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel: 00000000
Apr 03 12:22:52 SrvProxDellTDC kernel:
Apr 03 12:22:52 SrvProxDellTDC kernel: megaraid_sas 0000:17:00.0: [ 0]waiting for 5 commands to complete for scsi0
Apr 03 12:22:52 SrvProxDellTDC kernel: megaraid_sas 0000:17:00.0: [ 5]waiting for 5 commands to complete for scsi0
Apr 03 12:22:52 SrvProxDellTDC kernel: megaraid_sas 0000:17:00.0: [10]waiting for 5 commands to complete for scsi0
Apr 03 12:22:52 SrvProxDellTDC kernel: megaraid_sas 0000:17:00.0: [15]waiting for 5 commands to complete for scsi0
Apr 03 12:22:52 SrvProxDellTDC kernel: megaraid_sas 0000:17:00.0: [20]waiting for 5 commands to complete for scsi0
Apr 03 12:22:52 SrvProxDellTDC kernel: megaraid_sas 0000:17:00.0: [25]waiting for 5 commands to complete for scsi0
Apr 03 12:22:52 SrvProxDellTDC kernel: megaraid_sas 0000:17:00.0: [30]waiting for 5 commands to complete for scsi0
Apr 03 12:22:52 SrvProxDellTDC pve-firewall[1836]: firewall update time (220.054 seconds)
Apr 03 12:22:52 SrvProxDellTDC systemd[1]: Started Proxmox VE replication runner.
Apr 03 12:22:52 SrvProxDellTDC systemd[1]: Starting Proxmox VE replication runner...
Apr 03 12:22:52 SrvProxDellTDC pvestatd[1840]: status update time (220.424 seconds)
Apr 03 12:22:52 SrvProxDellTDC pmxcfs[1779]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/SrvProxDellTDC/HDD1: -1
Apr 03 12:22:52 SrvProxDellTDC pmxcfs[1779]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/SrvProxDellTDC/HDD1
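
To make the excerpt above easier to read, here is a minimal Python sketch that counts the "task abort called" events per device, decodes the Write(10) CDB bytes into a logical block address and a transfer length, and measures the time span of the excerpt. The function names (decode_write10, summarize, window_seconds) and the regular expressions are my own and only illustrative. The sketch assumes the line layout shown in this log and that all lines fall on the same day. The byte offsets are the standard WRITE(10) layout: opcode 0x2a, LBA in bytes 2-5, transfer length in bytes 7-8.

```python
import re
from collections import Counter
from datetime import datetime

# A few lines copied verbatim from the syslog excerpt above.
SAMPLE = [
    "Apr 03 12:20:00 SrvProxDellTDC systemd[1]: Starting Proxmox VE replication runner...",
    "Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: [sdb] tag#1 task abort called for scmd(00000000e55e546f)",
    "Apr 03 12:22:06 SrvProxDellTDC kernel: sd 0:2:1:0: [sdb] tag#1 CDB: Write(10) 2a 00 68 e0 2e 98 00 00 10 00",
    "Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#3 task abort called for scmd(000000004aff976f)",
    "Apr 03 12:22:10 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#3 CDB: Write(10) 2a 00 05 4f ea 08 00 00 18 00",
    "Apr 03 12:22:52 SrvProxDellTDC kernel: sd 0:2:0:0: [sda] tag#4 megasas: target reset FAILED!!",
]

# Hypothetical patterns, written to match the layout of the lines above.
ABORT_RE = re.compile(r"\[(sd[a-z]+)\] tag#\d+ task abort called")
CDB_RE = re.compile(
    r"\[(sd[a-z]+)\] tag#\d+ CDB: Write\(10\) ((?:[0-9a-f]{2} ){9}[0-9a-f]{2})"
)


def decode_write10(cdb_hex):
    """Return (lba, blocks) from a 10-byte WRITE(10) CDB given as hex text."""
    cdb = bytes(int(x, 16) for x in cdb_hex.split())
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


def summarize(lines):
    """Count task aborts per device and decode every Write(10) CDB found."""
    aborts = Counter()
    writes = []
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[m.group(1)] += 1
        m = CDB_RE.search(line)
        if m:
            writes.append((m.group(1), *decode_write10(m.group(2))))
    return aborts, writes


def window_seconds(lines):
    """Seconds between the first and last line (assumes a single day)."""
    times = [datetime.strptime(line.split()[2], "%H:%M:%S") for line in lines]
    return int((times[-1] - times[0]).total_seconds())


if __name__ == "__main__":
    aborts, writes = summarize(SAMPLE)
    print("task aborts per device:", dict(aborts))
    for dev, lba, blocks in writes:
        print(f"{dev}: WRITE(10) lba=0x{lba:08x} blocks={blocks}")
    print("window:", window_seconds(SAMPLE), "seconds")
```

For the [sda] tag#3 command this gives LBA 0x054fea08 and a length of 0x18 blocks, the same values that appear as 054fea08 and 00000018 in the IO request frame dump. The excerpt itself spans 172 seconds, while pve-firewall and pvestatd report update times of about 220 seconds.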
 
Well, it seems like the RAID controller is not responding. Are there firmware updates available for the server, and especially for the RAID controller?