For a while now I have been having intermittent crashes.
Like these:
(although it says it reboots, it does NOT)
It happened a lot:
My current setup:
Intel NUC8i5BEH with 16 GB RAM and a 500 GB Samsung 980 PRO SSD
pve-manager/6.4-6/be2fa32c (running kernel: 5.4.114-1-pve)
BIOS (NUC) and firmware (SSD) are both up to date
What I experience(d):
Before this I had VERY frequent crashes with pve 6.4.5 and the version before it (I forgot the number).
With the previous SSD firmware I had: https://forum.proxmox.com/threads/d..._find_block-failed-error-5.87344/#post-389506 (this no longer happens).
What I tried:
- updating firmware
- updating pve
- this: https://forum.proxmox.com/threads/r...mox-ve-6-1-auf-ex62-nvme-hetzner.63597/page-3
But it keeps crashing now and then. I must admit that every time I tried something it got better, but it still crashes every 2-4 days (!)...
What can I do about this?
Here is the error written out (might be typos):
Code:
mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 10: be2000c00002010b
mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff870f62ec> {mutex_spin_on_owner+0x6c/0xa0}
mce: [Hardware Error]: TSC 154e1c68e3c2e ADDR dedb80001d0f36 MISC 66119c0
mce: [Hardware Error]: PROCESSOR 0:806ea TIME 1621405756 SOCKET 0 APIC 0 microcode e0
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check
Shutting down cpus with NMI
Kernel offset: 0x6000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 30 seconds..
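In case it helps anyone reading the log: below is a minimal sketch that splits the status value from the "Bank 10" line into its flag bits. It assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (VAL at bit 63 down to PCC at bit 57, MCA error code in the low 16 bits). The names `FLAG_BITS` and `decode_mci_status` are made up for this illustration, and it is not a replacement for running the log through mcelog.

```python
# Assumed layout: architectural flag bits of IA32_MCi_STATUS (Intel SDM).
FLAG_BITS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this bank
    "MISCV": 59,  # the MISC value in the log is valid
    "ADDRV": 58,  # the ADDR value in the log is valid
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine check status value into flags and code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    # Bits 31:16 carry a model-specific code, bits 15:0 the MCA error code.
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    fields["mca_error_code"] = status & 0xFFFF
    return fields


if __name__ == "__main__":
    # Status value copied from the "Bank 10" line of the log above.
    decoded = decode_mci_status(0xBE2000C00002010B)
    for name, value in decoded.items():
        shown = hex(value) if not isinstance(value, bool) else value
        print(f"{name:>20}: {shown}")
```

For the value in my log the UC and PCC flags come out set, which matches the "Processor context corrupt" line; the meaning of the two code fields still needs mcelog or the Intel documentation.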