Random Restarts

ryanauhl

Member
Jun 18, 2018
I've had some random restarts over the past couple of weeks that I can't seem to narrow down. The only thing consistent across them is the set of errors the kernel writes to the syslog when it reboots (included below).

Four of the five guests run a Linux flavor with virtio drivers and the host CPU flag. The other guest is a Windows 10 VM with virtio drivers, the Opteron_G3 flag, and PCI passthrough of a video card. I'm running an Epyc processor, but I can't get the Windows guest to boot with the host flag or a newer CPU flag.

Any ideas out there? The errors are vague, and I haven't been able to turn up anything by searching online.

Code:
Aug 01 08:24:23 kvm kernel: BERT: Error records from previous boot:
Aug 01 08:24:23 kvm kernel: [Hardware Error]: event severity: info
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 0, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 00200800 00000000  .......... .....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff9  ...]............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 1, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000003 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 03200800 00000000  .......... .....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: 855c2dcb 0101ffff 00000000 00000000  .-\.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 2, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 0000000b 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 0b200800 00000000  .......... .....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: c09c7545 0101ffff 00000000 00000000  Eu..............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 3, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000010 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 10200800 00000000  .......... .....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 4, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000011 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 11200800 00000000  .......... .....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: c04beceb 0101ffff 00000000 00000000  ..K.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 5, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 0000001b 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 1b200800 00000000  .......... .....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: c04beceb 0101ffff 00000000 00000000  ..K.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 6, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000020 00000000  ........ .......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 20200800 00000000  ..........  ....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 7, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000023 00000000  ........#.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 23200800 00000000  .......... #....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: c04beceb 0101ffff 00000000 00000000  ..K.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 8, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000028 00000000  ........(.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 28200800 00000000  .......... (....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: c04beceb 0101ffff 00000000 00000000  ..K.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 9, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000030 00000000  ........0.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 30200800 00000000  .......... 0....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: 853f6468 0001ffff 00000000 00000000  hd?.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 10, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 00000030 00000000  ........0.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 30200800 00000000  .......... 0....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: 48ab7f57 4f6cdc34 b5b0d3a7 1443a7b0  W..H4.lO......C.
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 00980027 00000000  ........'.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000458 00000000 00000000  ..P.X...........
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000016 00000000 0000080b faa00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000035 00000007  ........5.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 5d000000 00000000 00000000 d0140ff7  ...]............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000002 0001002e  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  Error 11, type: fatal
Aug 01 08:24:23 kvm kernel: [Hardware Error]:  fru_text: ProcessorError
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section type: unknown, dc3ea0b0-a144-4797-b95b-53fa242b6e1d
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   section length: 0xd0
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000000: 00000007 00000000 0000003b 00000000  ........;.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000010: 00800f12 00000000 3b200800 00000000  .......... ;....
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000020: 76d8320b 00000000 178bfbff 00000000  .2.v............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000030: a55701f5 43dee3ef 9b2472ac 2cad3f57  ..W....C.r$.W?.,
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000040: 00000001 00000000 0602001f 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000050: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000060: 00000000 00000000 00000000 00000000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000070: 00500002 00000414 00000000 00000000  ..P.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000080: 00000005 00000000 00000108 bea00000  ................
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   00000090: c04beceb 0101ffff 00000000 00000000  ..K.............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000a0: 00000000 00000000 00000031 00000003  ........1.......
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000b0: 4d000000 00000000 00000000 d0140ff6  ...M............
Aug 01 08:24:23 kvm kernel: [Hardware Error]:   000000c0: 00000000 00000000 00000000 000500b0  ................
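
A dump this long is hard to compare by eye. The following is a minimal Python sketch, not an official decoder, that splits the log into its numbered error records and counts how many records share the same value at a given byte offset. The function names `parse_bert_records` and `group_by_word` are made up for illustration, and the sketch assumes the line layout shown above (a record header line followed by rows of four 32-bit hex words). It does not interpret what any word means.

```python
import re
from collections import Counter

# Record header lines, e.g. "[Hardware Error]:  Error 3, type: fatal"
RECORD_RE = re.compile(r"\[Hardware Error\]:\s+Error (\d+), type: (\w+)")
# Hex dump rows, e.g. "[Hardware Error]:   00000010: 00800f12 00000000 ..."
DUMP_RE = re.compile(r"\[Hardware Error\]:\s+([0-9a-f]{8}):((?: [0-9a-f]{8}){4})")


def parse_bert_records(lines):
    """Return one dict per error record: index, type, and words keyed by byte offset."""
    records = []
    for line in lines:
        header = RECORD_RE.search(line)
        if header:
            records.append({
                "index": int(header.group(1)),
                "type": header.group(2),
                "words": {},
            })
            continue
        dump = DUMP_RE.search(line)
        if dump and records:
            row_offset = int(dump.group(1), 16)
            # Each row holds four 32-bit words, 4 bytes apart.
            for i, word in enumerate(dump.group(2).split()):
                records[-1]["words"][row_offset + 4 * i] = word
    return records


def group_by_word(records, offset):
    """Count how many records carry each value of the 32-bit word at `offset`."""
    return Counter(r["words"].get(offset) for r in records)
```

Calling `parse_bert_records` on the saved log lines and then `group_by_word(records, 0x48)` groups the records by the third word of the `00000040` row, which is one of the places where the records above differ from each other. Comparing groups across several crashes may show whether the same pattern recurs each time.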
 
We are now running into this same error and had a full-on crash last night.

We're using an AMD Epyc CPU as well, a number of KVM guests and a 2-node Corosync cluster.

Did you find anything?
 
We've run into this problem in the last couple of months, with 3 full crashes since January.

We're using AMD EPYC CPUs (76** series) in a 4-node cluster. The crashes are random every time; they never happen at a particular time of day or follow any pattern we can see.

Judging by the errors (same as above), I would assume a hardware fault is causing it. Has anyone found the cause yet?
 
Has anyone found any information from outside the Proxmox community? I would think this is a Linux kernel issue, or could it be Proxmox-specific? If it were a general kernel issue, I would expect many more people to be reporting it online.
 
The other reports I've found are largely about SuperMicro machines. I've seen people having issues with both AMD and Intel processors, but the number 1 result in my searches for the error was the Proxmox forum.
 
We are also on SuperMicro, specifically an H11SSL-i motherboard. We will test an updated BIOS; I think we're one revision behind.
 
We have seen such an issue on a system with Debian 10 and KVM (not running Proxmox).
The mainboard is a Supermicro H11DSi-NT (revision 1) with BIOS version 1.0c.

We think the new BIOS version 1.3 might fix it, as it includes an updated AGESA.

As the reboots happen randomly, we cannot say for sure right now whether or not the BIOS update fixes the issue.

You can find updated information on this in the following German wiki article:
https://www.thomas-krenn.com/de/wiki/Random_Reboots_AMD_EPYC_Server
 
Has anyone figured out more here? I already had IOMMU and SR-IOV enabled and am on BIOS 2.1, but I still get errors identical to these. Epyc, H11 series SuperMicro board, KVM guests.

It's really starting to frustrate me.
 
Our problems disappeared. My guess is that either the BIOS updates or the kernel updates fixed them. Are you on the latest for both?
 
Yep, unfortunately!

I've opened a support case with Supermicro; hopefully they have some ideas.
 
Didn't want to leave anyone hanging that might encounter this via Google:

It was my LSI HBA firmware. I noticed that the reboots tended to happen when IO activity was heavy, so I started looking in that direction. It turns out my LSI card came with firmware from around 2012. I moved to "PH20.00.07.00-IT" and the reboots went away.
 
