I am fairly new to Proxmox and virtualization in general, and I have had some trouble getting Proxmox to run reliably.
Over the past week or two, my Proxmox host has been rebooting on its own and the whole system has been crashing. I narrowed the issue down to poor power delivery, because the system was not on a battery backup.
During one of the crashes, the system froze completely (I couldn't even type commands at the host's own console), and I had to hard-restart it.
Now only TrueNAS is crashing, and I'm seeing some hardware errors in syslog.
Oct 14 21:53:47 proxmox kernel: mce: [Hardware Error]: Machine check events logged
Oct 14 21:53:47 proxmox kernel: [Hardware Error]: Uncorrected, software restartable error.
Oct 14 21:53:47 proxmox kernel: [Hardware Error]: CPU:16 (19:21:2) MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|-|Poison|-]: 0xbc00080001010135
Oct 14 21:53:47 proxmox kernel: [Hardware Error]: Error Addr: 0x0000000338e45e80
Oct 14 21:53:47 proxmox kernel: [Hardware Error]: IPID: 0x001000b000000000
Oct 14 21:53:47 proxmox kernel: [Hardware Error]: Load Store Unit Ext. Error Code: 1, An ECC error or L2 poison was detected on a data cache read by a load.
Oct 14 21:53:47 proxmox kernel: [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
Oct 14 21:53:47 proxmox kernel: mce: Uncorrected hardware memory error in user-access at 338e45e80
Oct 14 21:53:47 proxmox kernel: Memory failure: 0x338e45: recovery action for unsplit thp: Ignored
Oct 14 21:53:47 proxmox kernel: mce: Memory error not recovered
Oct 14 21:53:47 proxmox kernel: sda: sda1 sda2
Oct 14 21:53:47 proxmox kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Oct 14 21:53:47 proxmox kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Oct 14 21:53:47 proxmox kernel: sdc: sdc1 sdc2
Oct 14 21:53:47 proxmox systemd[1]: 100.scope: Deactivated successfully.
Oct 14 21:53:47 proxmox systemd[1]: 100.scope: Consumed 43min 40.429s CPU time
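The flag list in the MC0_STATUS line can be rebuilt from the raw status value 0xbc00080001010135. Below is a minimal Python sketch. It assumes the bit layout of AMD's MCA_STATUS register as I read it from the Linux kernel's AMD MCE decoder (drivers/edac/mce_amd.c) for family 17h and newer CPUs. The "19:21:2" in the log is family 19h, model 21h, stepping 2. The helper names are hypothetical, and the bit numbers should be checked against AMD's documentation.

```python
from typing import List


def flag(status: int, bit: int) -> bool:
    # True if the given bit of the 64-bit MCA status value is set.
    return bool((status >> bit) & 1)


def status_fields(status: int) -> List[str]:
    """Rebuild the bracketed flag list printed for AMD family 17h+ (SMCA) CPUs.

    Assumption: the TCC column is printed (MCAX enabled), which matches the
    ten columns in the syslog line above.
    """
    fields = [
        "Over" if flag(status, 62) else "-",    # overflow: an earlier error was lost
        "UE" if flag(status, 61) else ("-" if flag(status, 44) else "CE"),
        "MiscV" if flag(status, 59) else "-",   # MCA_MISC register holds valid data
        "AddrV" if flag(status, 58) else "-",   # MCA_ADDR register holds valid data
        "PCC" if flag(status, 57) else "-",     # processor context corrupt
        "TCC" if flag(status, 55) else "-",     # task context corrupt
        "SyndV" if flag(status, 53) else "-",   # syndrome register valid
    ]
    ecc = (status >> 45) & 0x3                  # bits 46:45, only printed when non-zero
    if ecc:
        fields.append("CECC" if ecc == 2 else "UECC")
    fields += [
        "Deferred" if flag(status, 44) else "-",
        "Poison" if flag(status, 43) else "-",  # data was marked poisoned
        "Scrub" if flag(status, 40) else "-",
    ]
    return fields


def extended_error_code(status: int) -> int:
    # Bits 21:16 hold the extended error code ("Ext. Error Code" in the log).
    return (status >> 16) & 0x3F


if __name__ == "__main__":
    status = 0xBC00080001010135  # MC0_STATUS value copied from the syslog above
    print("valid:", flag(status, 63))
    print("MC0_STATUS[" + "|".join(status_fields(status)) + "]")
    print("extended error code:", extended_error_code(status))
```

Decoded this way, the value gives the same [-|UE|MiscV|AddrV|-|-|-|-|Poison|-] list and extended error code 1 that the kernel printed. This confirms the entry is a machine-check error reported by the CPU hardware itself: uncorrected, on poisoned data, seen by the load-store unit.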
Now, I assume this is from my forced restart and that some system files got corrupted. Does Proxmox have a feature like Windows' SFC /scannow for checking system integrity?
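Proxmox VE is built on Debian. The closest Debian tool to SFC /scannow that I know of is dpkg --verify, which compares installed files against the md5sums recorded when each package was installed. Unlike SFC, it only checks and does not repair. The minimal Python sketch below wraps it. It assumes dpkg's default rpm-style output: 9 check characters, a space, "c" for config files or a space, another space, then the path. The helper names are hypothetical.

```python
import subprocess
import sys
from typing import List, NamedTuple


class VerifyResult(NamedTuple):
    status: str     # 9 check characters, e.g. "??5??????", or "missing  "
    conffile: bool  # True when dpkg marks the path with 'c' (a config file)
    path: str       # may carry a trailing " (error)" note for unreadable files


def parse_verify_line(line: str) -> VerifyResult:
    # Assumed layout: 9 check chars, space, attribute char, space, path.
    status = line[:9]
    attr = line[10:11]
    path = line[12:].rstrip("\n")
    return VerifyResult(status, attr == "c", path)


def is_suspicious(result: VerifyResult) -> bool:
    # Config files are expected to differ once you edit them, so skip those.
    if result.conffile:
        return False
    # Report files that are missing or whose md5sum check failed ('5' in column 3).
    return result.status.startswith("missing") or result.status[2:3] == "5"


def verify_packages(packages: List[str]) -> List[VerifyResult]:
    # With no package names, dpkg --verify checks every installed package.
    # dpkg may exit non-zero when it reports problems, so the return code is
    # deliberately not treated as an error here.
    proc = subprocess.run(
        ["dpkg", "--verify", *packages], capture_output=True, text=True
    )
    return [parse_verify_line(line) for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        results = verify_packages(sys.argv[1:])
    except FileNotFoundError:
        print("dpkg not found; run this on the Proxmox (Debian) host")
    else:
        for r in results:
            if is_suspicious(r):
                print(r.status, r.path)
```

Run it as root on the host, optionally passing package names to limit the check. If a package shows damaged files, reinstalling it with apt install --reinstall followed by the package name puts its files back.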
If my assumption is wrong, which is quite possible, what else can I do to clear this error and keep TrueNAS from crashing?
I can provide any other details needed. Thanks in advance for the help.