Search results

  1.

    AMD EPYC based systems rebooting

    Tried that. No success.
  2.

    AMD EPYC based systems rebooting

    I changed the PCI-E SFP+ card to a Supermicro AOC-STGN-I2S. No change. But now I can see the error in dmesg: BERT: Error records from previous boot: [Hardware Error]: event severity: info [Hardware Error]: Error 0, type: fatal [Hardware Error]: fru_text: DIMM# Sourced [Hardware Error]...
  3.

    AMD EPYC based systems rebooting

    No. Update, with some changes: this time it did not reboot. Instead, it froze! Shortly before that, I can see these lines in the logs: Oct 19 08:46:18 xyz kernel: [671031.235982] clocksource: timekeeping watchdog on CPU4: hpet retried 2 times before success Oct 19 08:46:18 xyz kernel...
  4.

    AMD EPYC based systems rebooting

    pve-manager/7.0-11/63d82f4e (running kernel: 5.11.22-5-pve) I checked the RAM 4 weeks ago with a full memtest run. No errors. I also ran a 48h CPU stress test with no issues. I have two identical nodes.
  5.

    AMD EPYC based systems rebooting

    I've been having the same issue with my nodes (Supermicro M11SDV-8C-LN4F, AMD EPYC 3251). I did what they described in this wiki entry (https://www.thomas-krenn.com/de/wiki/Random_Reboots_AMD_EPYC_Server) and disabled C-states. I also updated the firmware of my Intel X710-DA2 (2 x SFP) from...