Hello,
Linux virtu01 4.4.59-1-pve #1 SMP PVE 4.4.59-87
I have a Dell R530 server with a PERC H730 Mini (integrated) with 1024 MB cache. For a few weeks now, one VM on the server (a CentOS 7 mail server) has been crashing sporadically. There are never any helpful logs in the VM: the log simply stops, and then you can see the VM booting. (attached vm-log.png)
But on the Proxmox server there seem to be some interesting logs. There is nothing from the OOM killer or similar, but there are SMART messages from smartd (see the log below).
Does this look like a problem with the RAID controller? Has anyone had this problem before?
Code:
May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 80 to 81
May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 54 to 55
May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 [megaraid_disk_01] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 81 to 82
May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 [megaraid_disk_01] [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 57 to 55
May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 [megaraid_disk_02] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 76 to 78
May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 [megaraid_disk_02] [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 52 to 54
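To make the changes easier to compare per disk, the smartd lines above could be summarized with a small script. This is only a minimal sketch: the regular expression is an assumption based purely on the log format shown in this post, and `parse_smartd_changes` is a hypothetical helper name, not part of smartmontools.

```python
import re

# Pattern for the smartd "attribute changed" lines shown above.
# Assumption: the format is inferred only from the log excerpt in this post.
LINE_RE = re.compile(
    r"\[(?P<disk>megaraid_disk_\d+)\].*?"
    r"Attribute: (?P<attr_id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_smartd_changes(lines):
    """Return (disk, attribute name, old value, new value) per matching line."""
    changes = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            changes.append((m["disk"], m["name"], int(m["old"]), int(m["new"])))
    return changes


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = [
        "May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 "
        "[megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 "
        "Raw_Read_Error_Rate changed from 80 to 81",
        "May 16 08:14:29 srv-virtu01 smartd[2314]: Device: /dev/bus/0 "
        "[megaraid_disk_01] [SAT], SMART Usage Attribute: 195 "
        "Hardware_ECC_Recovered changed from 57 to 55",
    ]
    for disk, name, old, new in parse_smartd_changes(sample):
        # Print the signed difference so increases and decreases stand out.
        print(f"{disk} {name}: {old} -> {new} ({new - old:+d})")
```

Feeding it the full syslog instead of the two sample lines would show whether the values keep drifting on all three disks or only on one.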
Thanks for your help.