Constant restart

brnogu

Member
Dec 13, 2020
Hi everyone!

A few months ago I installed Proxmox for the first time, and I have to say it's my favourite hypervisor :)

Ever since I installed it, I have been digging through these forums for an answer to why it keeps restarting at random. Sometimes it restarts after 2 days, other times it lasts 5 days (the longest so far). First I read that it could be an incompatibility with the BIOS, so I updated both Proxmox and the BIOS to the latest versions. That didn't solve the issue.

Then I ran a memtest and no errors were detected. :(

Finally, I read about issues with the ZFS ARC size, which shouldn't be larger than the RAM left available once all the machines are accounted for. I decreased it and still no change...
The power supply has no issues; I had this PC in use for another purpose and it was fine. After checking, I can't find any logs that explain this, just normal operation and then suddenly the server is booting.
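
For anyone following along, here is a rough sketch of the arithmetic behind an ARC limit. It assumes the limit is set through the `zfs_arc_max` module parameter, which takes a value in bytes and is usually placed in /etc/modprobe.d/zfs.conf. The 4 GiB figure is only an illustrative assumption, not the value used on this server, and `arc_max_option` is a made-up helper name.

```python
# Sketch only: turn a GiB figure into the byte value zfs_arc_max expects.
# The helper name and the 4 GiB example are assumptions for illustration.

GIB = 1024 ** 3  # bytes in one GiB


def arc_max_option(limit_gib: int) -> str:
    """Return the modprobe option line for an ARC limit given in GiB."""
    if limit_gib <= 0:
        raise ValueError("ARC limit must be a positive number of GiB")
    return f"options zfs zfs_arc_max={limit_gib * GIB}"


# Example: a 4 GiB cap (hypothetical value).
print(arc_max_option(4))
```

The resulting line would go into /etc/modprobe.d/zfs.conf; when the root filesystem is on ZFS, the initramfs typically has to be refreshed and the host rebooted before the new limit applies.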

I'm getting a bit desperate and not sure what to do next :( The truth is that it keeps happening. I'm using an MSI motherboard

Can I please ask for your help? :)

Thanks in advance

 
Check /var/log/kern.log or the rotated version /var/log/kern.log.1 for log entries made at the time of the reboot.
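
If scrolling through the file by hand is tedious, something like the following could pull out the last few entries before each boot. This is only a sketch: it assumes a fresh boot can be recognised by a kernel line with the `0.000000` timestamp and the text "Linux version", and the function name `lines_before_boots` is made up for this example.

```python
import re
from collections import deque

# Assumption: a kernel line stamped [0.000000] containing "Linux version"
# marks the start of a new boot.
BOOT_MARKER = re.compile(r"kernel: \[\s*0\.000000\] Linux version")


def lines_before_boots(lines, context=5):
    """For each boot marker, return the `context` lines logged just before it."""
    recent = deque(maxlen=context)
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER.search(line):
            result.append(list(recent))
            recent.clear()
        recent.append(line)
    return result


# Small in-memory demo; with a real file, pass an open file object instead,
# e.g. open("/var/log/kern.log", errors="replace").
sample = [
    "Dec 23 03:07:32 universe kernel: [280299.373160] usb 1-9: reset low-speed USB device number 2 using xhci_hcd",
    "Dec 24 20:30:04 universe kernel: [    0.000000] Linux version 5.4.73-1-pve (build@pve)",
]
for before in lines_before_boots(sample):
    print("--- last entries before a boot ---")
    print("\n".join(before) or "(nothing logged before this boot)")
```

If the lines before the boot marker are hours old and look routine, the kernel most likely had no chance to log anything before the reset.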
 
Hi @thenickdude

Sorry for the delay. I was waiting for it to happen again, but it is still up (it will probably happen in 1 or 2 days).
The last time I checked kern.log, it only started after the machine rebooted. I tried to look in the rotated version, but everything in it was older than the day of the restart.

Thanks!
 
Hi @thenickdude

It happened yesterday, on the 24th of Dec around 20:30.

Below is the kern.log. I can see that there are no errors and then it suddenly boots. I have no idea what it can be :(

Dec 20 01:51:23 universe kernel: [16525.656100] hrtimer: interrupt took 28052 ns
Dec 23 03:07:24 universe kernel: [280290.676963] usb 1-9: reset low-speed USB device number 2 using xhci_hcd
Dec 23 03:07:24 universe kernel: [280291.063035] kvm: vcpu 0: requested 122070 ns lapic timer period limited to 200000 ns
Dec 23 03:07:32 universe kernel: [280299.373160] usb 1-9: reset low-speed USB device number 2 using xhci_hcd
Dec 24 20:30:04 universe kernel: [ 0.000000] Linux version 5.4.73-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) $
Dec 24 20:30:04 universe kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.73-1-pve root=/dev/mapper/pve-roo$
Dec 24 20:30:04 universe kernel: [ 0.000000] KERNEL supported cpus:
Dec 24 20:30:04 universe kernel: [ 0.000000] Intel GenuineIntel
Dec 24 20:30:04 universe kernel: [ 0.000000] AMD AuthenticAMD
Dec 24 20:30:04 universe kernel: [ 0.000000] Hygon HygonGenuine
Dec 24 20:30:04 universe kernel: [ 0.000000] Centaur CentaurHauls
Dec 24 20:30:04 universe kernel: [ 0.000000] zhaoxin Shanghai
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
Dec 24 20:30:04 universe kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'com$
Dec 24 20:30:04 universe kernel: [ 0.000000] BIOS-provided physical RAM map:
...
 
It's unfortunate that there are no log messages to use as clues. At least you can rule out running out of RAM, since that would log errors from the OOM killer (out-of-memory killer).
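
To double-check the OOM theory across a whole log, a small filter like this could help. It is a sketch that assumes the kernel's usual wording, "invoked oom-killer" and "Out of memory:", and the function name `find_oom_lines` is invented for the example.

```python
import re

# Assumption: OOM events show up with the kernel's usual phrases.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory:", re.IGNORECASE)


def find_oom_lines(lines):
    """Return every log line that looks like an OOM-killer message."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


# Demo on two lines from the log posted above; neither mentions the OOM killer.
sample = [
    "Dec 20 01:51:23 universe kernel: [16525.656100] hrtimer: interrupt took 28052 ns",
    "Dec 23 03:07:24 universe kernel: [280291.063035] kvm: vcpu 0: requested 122070 ns lapic timer period limited to 200000 ns",
]
hits = find_oom_lines(sample)
print(f"{len(hits)} OOM-related line(s) found")
```

With a real log, pass an open file object for kern.log or syslog instead of the sample list. An empty result supports ruling out memory exhaustion, though it cannot prove it if the machine reset before anything was written.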
 
Hi everyone!

I have installed a new disk, set it up as LVM, and destroyed the ZFS pool. Now I don't have any ZFS pool, but it keeps restarting randomly.

I'm really out of ideas... Does anyone know what this could be?

Thanks
 
I have also noticed something.
In the /var/log/syslog file, at the point where the system hangs, I can see:
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
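
For reference, `^@` is how many editors and pagers display a NUL byte (0x00). A quick way to locate such runs in a log is sketched below; the 16-byte threshold and the name `find_nul_runs` are assumptions made for this example.

```python
import re

NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(data: bytes, min_len: int = 16):
    """Return (offset, length) for each run of at least `min_len` NUL bytes."""
    runs = []
    for match in NUL_RUN.finditer(data):
        length = match.end() - match.start()
        if length >= min_len:
            runs.append((match.start(), length))
    return runs


# In-memory demo: a block of NUL bytes sitting between two ordinary log lines.
demo = b"last normal line\n" + b"\x00" * 64 + b"first line after boot\n"
for offset, length in find_nul_runs(demo):
    print(f"NUL run at byte offset {offset}, {length} bytes long")
```

For a real file, read it in binary mode, for example `open("/var/log/syslog", "rb").read()`, and compare the position of each run with the timestamps of the surrounding lines.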

 
Unfortunately not :(
The workaround I found to protect the VMs from corruption during these sudden restarts was to use cron to restart the server every day at 5 AM.

Thanks!
 
It would be really great to have a permanent solution.

I have already tried disabling some power-saving features in the BIOS, as suggested. I also tried updating the server to the most recent kernel
and removed all the ZFS pools. The issue persisted.

I have no idea what it can be at this point :(
 
Hi @Stefan_R

I'm sorry to bother you. I've seen you on another thread. I have spent lots and lots of hours looking for a solution to this. I also tried changing some BIOS settings, but the issue persists :(

I was wondering if you have any suggestions.

Thanks!
 
It's usually not that appreciated to just ping people on unrelated topics, especially in an older thread where you've already received answers. We try to monitor the forum to the best of our ability anyway.

Now, since I'm here already: the "^@^@^@^@" lines are usually a strong indicator of a hardware error. Swap out parts until it works; it's probably best to start with RAM.
 