Proxmox host frequently rebooting

f.somenzi

Hi everyone,
last month we installed Proxmox on a server at work, and we are running a CentOS guest on it.
We noticed that the host reboots frequently. Each of the log lines below marks a boot: the kernel timestamp restarts from zero.

Sep 19 00:48:28 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 20 00:40:35 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 20 05:17:11 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 20 06:28:09 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 20 12:23:11 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 20 16:37:17 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 20 22:37:47 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 21 00:08:15 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 21 06:52:21 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 21 16:34:54 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 22 20:51:59 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 23 01:51:13 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 23 14:41:07 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 23 19:59:22 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 23 22:22:09 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 24 01:09:18 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 24 15:40:01 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 25 17:24:40 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 25 23:21:32 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 26 16:35:54 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
Sep 27 13:05:17 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100) ()
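
Boot markers like the ones above can be pulled out of a log with a simple filter. This is only a sketch: the function name `list_boots` is made up for illustration, and it assumes the classic syslog layout shown above (`Mon DD HH:MM:SS host kernel: [ 0.000000] Linux version ...`).

```shell
# list_boots: print the syslog timestamp of every kernel boot banner read from stdin.
# Hypothetical helper; it only looks for the "[ 0.000000] Linux version" line
# that the kernel logs once at the start of each boot.
list_boots() {
    grep -E 'kernel: \[ *0\.000000\] Linux version' | awk '{ print $1, $2, $3 }'
}

# Example with one boot banner and one unrelated kernel line
printf '%s\n' \
    'Sep 19 00:48:28 pve kernel: [ 0.000000] Linux version 5.4.106-1-pve (build@pve)' \
    'Sep 27 16:07:40 pve kernel: [ 46.523873] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:01:00.0' \
    | list_boots
```

On the host this would be run as `list_boots < /var/log/messages`. Comparing the result with `last -x reboot shutdown` can help tell clean shutdowns from abrupt resets, since a boot that has no shutdown record before it was probably not a clean restart.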

There are no physical reboots of the server, and we didn't notice any power surges.

What I did notice is that /var/log/messages reports some errors:
Sep 27 16:07:40 pve kernel: [ 46.523873] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:01:00.0
Sep 27 16:08:35 pve kernel: [ 102.181274] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Sep 27 16:08:48 pve kernel: [ 115.154151] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Sep 27 16:10:20 pve kernel: [ 206.449160] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Sep 27 16:10:20 pve kernel: [ 206.449344] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:01:00.0
Sep 27 16:12:48 pve kernel: [ 354.496021] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:01:00.0
Sep 27 16:14:03 pve kernel: [ 430.163636] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Sep 27 16:41:49 pve kernel: [ 2095.547897] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:01:00.0

Searching online, I found advice to fix this by adding pcie_aspm=off to the GRUB bootloader configuration. I made the change and updated the bootloader (proxmox-boot-tool refresh, is that right?), but the errors persist. These errors are probably not related to the sudden reboots anyway.

Can you help me?
 
Please provide a complete syslog (/var/log/syslog) containing at least 30 minutes before the reboot happened, until about 30 minutes after the reboot is done.
 
Hello, the syslog is attached. The last reboot happened today at 13:10. The syslog I provided covers 12:39 to 13:41.
Thank you in advance.
 

Attachment: syslog.txt (144.3 KB)
Code:
Sep 29 13:08:26 pve kernel: [49414.139512] tg3 0000:01:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Sep 29 13:08:26 pve kernel: [49414.139514] tg3 0000:01:00.0: AER:   device [14e4:1657] error status/mask=00001000/00002000
Sep 29 13:08:26 pve kernel: [49414.139516] tg3 0000:01:00.0: AER:    [12] Timeout
This could be related to the error.
Do you use BIOS or UEFI? If you use UEFI, do you use ZFS for root (/)? If so, then the tool is the right way to update it.
Otherwise you have to do it via the update-grub command, see https://pve.proxmox.com/pve-docs-6/pve-admin-guide.html#sysboot (3.11.4 and 3.11.6) for more information.
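
For reading the `error status/mask=00001000/00002000` line in the AER output above, the hex words can be broken into bit positions. The helper name `aer_bits` is hypothetical. The status word has only bit 12 set, which matches the `[12] Timeout` line (the Replay Timer Timeout bit of the PCIe correctable error status register). The mask word has only bit 13 set, the Advisory Non-Fatal Error bit.

```shell
# aer_bits: print the bit positions that are set in a hexadecimal AER status or mask word.
# Hypothetical helper for lines such as "error status/mask=00001000/00002000".
aer_bits() {
    value=$(( 0x$1 ))
    bit=0
    while [ "$bit" -lt 32 ]; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "$bit"
        fi
        bit=$(( bit + 1 ))
    done
}

aer_bits 00001000   # status word from the log, prints 12
aer_bits 00002000   # mask word from the log, prints 13
```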
 
I am not sure. I think UEFI, but I must check. Is there a way to find this out?
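
One common way to check from a running Linux system, sketched here with a made-up function name (`firmware_mode`): the kernel creates the directory `/sys/firmware/efi` only when it was booted through UEFI.

```shell
# firmware_mode: report how the running system was booted.
# /sys/firmware/efi exists only on a UEFI boot.
# The optional argument (an alternative sysfs root) is an illustration-only
# convenience so the check can be tried against a test directory.
firmware_mode() {
    sysfs_root=${1:-/sys}
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

firmware_mode
```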

I can tell you that this is the current grub configuration: (/etc/default/grub)
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"

When I tried to apply the changes with update-grub, a message told me to run proxmox-boot-tool, so I ran it.

At the moment, however, the issue persists.
 
See section 3.11.3 of the documentation.
What's the output if you run the efibootmgr -v command?
 
root@pve:~# efibootmgr -v
BootCurrent: 0015
Timeout: 1 seconds
BootOrder: 0015,0014,0004,0005,0006,0007,000D,000E,000F,0010,0011,0003,0009,001B,001C,0002,0001,0000,000C,000B,000A,0008,0012,0013
Boot0000* Windows Boot Manager VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)WINDOWS.........x...B.C.D.O.B.J.E.C.T.=.{.9.d.e.a.8.6.2.c.-.5.c.d.d.-.4.e.7.0.-.a.c.c.1.-.f.3.2.b.3.4.4.d.4.7.9.5.}...1...............
Boot0001* proxmox VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot0002* Linux Boot Manager VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot0003* UEFI: IP4 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(3ca82aeba98c,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0004* UEFI: IP4 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x1)/MAC(3ca82aeba98d,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0005* UEFI: IP4 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x2)/MAC(3ca82aeba98e,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0006* UEFI: IP4 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x3)/MAC(3ca82aeba98f,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0007* UEFI: IP4 Intel(R) Ethernet Connection (H) I219-LM PciRoot(0x0)/Pci(0x1f,0x6)/MAC(94188209514c,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0008* Linux Boot Manager VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot0009* UEFI: Built-in EFI Shell VenMedia(5023b95c-db26-429b-a648-bd47664c8012)..BO
Boot000A* Linux Boot Manager VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot000B* Linux Boot Manager VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot000C* CentOS VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot000D* UEFI: IP6 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x0)/MAC(3ca82aeba98c,0)/IPv6([::]:<->[::]:,0,0)..BO
Boot000E* UEFI: IP6 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x1)/MAC(3ca82aeba98d,0)/IPv6([::]:<->[::]:,0,0)..BO
Boot000F* UEFI: IP6 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x2)/MAC(3ca82aeba98e,0)/IPv6([::]:<->[::]:,0,0)..BO
Boot0010* UEFI: IP6 HP Ethernet 1Gb 4-port 331T Adapter - NIC PciRoot(0x0)/Pci(0x1d,0x0)/Pci(0x0,0x3)/MAC(3ca82aeba98f,0)/IPv6([::]:<->[::]:,0,0)..BO
Boot0011* UEFI: IP6 Intel(R) Ethernet Connection (H) I219-LM PciRoot(0x0)/Pci(0x1f,0x6)/MAC(94188209514c,0)/IPv6([::]:<->[::]:,0,0)..BO
Boot0012* Linux Boot Manager VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot0013* Linux Boot Manager VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot0014* Linux Boot Manager HD(2,GPT,4ac8c549-d04e-4b8d-a771-091535a292b6,0x800,0x100000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
Boot0015* Linux Boot Manager HD(2,GPT,d41d929e-30cf-4a59-8f35-b79e1bb51799,0x800,0x100000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
Boot001B* UEFI OS HD(2,GPT,4ac8c549-d04e-4b8d-a771-091535a292b6,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot001C* UEFI OS HD(2,GPT,d41d929e-30cf-4a59-8f35-b79e1bb51799,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
 
Code:
Boot0014* Linux Boot Manager     HD(2,GPT,4ac8c549-d04e-4b8d-a771-091535a292b6,0x800,0x100000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
Boot0015* Linux Boot Manager     HD(2,GPT,d41d929e-30cf-4a59-8f35-b79e1bb51799,0x800,0x100000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
This means you're using systemd-boot.
Please follow the systemd-boot part of 3.11.6 (https://pve.proxmox.com/pve-docs-6/pve-admin-guide.html#sysboot).
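
The same conclusion can be reached mechanically by looking up the entry that `BootCurrent` points to. This is a sketch only: `current_boot_entry` is a hypothetical helper that relies on nothing more than the `BootCurrent: XXXX` and `BootXXXX* ...` lines shown in the output above.

```shell
# current_boot_entry: read `efibootmgr -v` output on stdin and print the
# entry whose number matches BootCurrent. Hypothetical helper.
current_boot_entry() {
    awk '
        /^BootCurrent:/ { current = $2 }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ { entries[substr($0, 5, 4)] = $0 }
        END { if (current in entries) print entries[current] }
    '
}

# Example with three lines taken from the output above
printf '%s\n' \
    'BootCurrent: 0015' \
    'Boot0001* proxmox VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)' \
    'Boot0015* Linux Boot Manager HD(2,GPT,d41d929e-30cf-4a59-8f35-b79e1bb51799,0x800,0x100000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)' \
    | current_boot_entry
```

On the host, `efibootmgr -v | current_boot_entry` prints the active entry. If its path ends in `\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI`, the machine booted through systemd-boot.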
 
Thank you for the information.
I still have some doubts, though, as I am not very experienced with configuring Linux and kernels.
Does this mean that I should add the option "pcie_aspm=off" directly to the /etc/kernel/cmdline file, and then run proxmox-boot-tool refresh?
Is a reboot needed afterwards?
 
Yes, those are the steps.
A reboot is necessary afterwards.
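
As a rough sketch of those steps: `add_kernel_param` below is a hypothetical helper, not a Proxmox tool. It assumes GNU sed (the `-i` option), a parameter that contains no `|` or `&` characters, and a command-line file that consists of a single line. The sample content in the demo is an assumption modelled on the root options quoted earlier in the thread.

```shell
# add_kernel_param FILE PARAM: append PARAM to the one-line kernel command
# line in FILE unless it is already present. Hypothetical helper.
add_kernel_param() {
    file=$1
    param=$2
    # -w matches whole words only, -F treats the parameter as a fixed string
    if grep -qwF -- "$param" "$file"; then
        return 0
    fi
    # keep everything on one line: append to the end of line 1
    sed -i "1s|\$| $param|" "$file"
}

# Demo on a temporary copy (sample content is an assumption, not the real file)
demo=$(mktemp)
echo 'root=ZFS=rpool/ROOT/pve-1 boot=zfs' > "$demo"
add_kernel_param "$demo" 'pcie_aspm=off'
cat "$demo"
rm -f "$demo"
```

On the host the sequence would be `add_kernel_param /etc/kernel/cmdline pcie_aspm=off`, then `proxmox-boot-tool refresh`, then a reboot. Afterwards, `cat /proc/cmdline` shows whether the running kernel actually received the option.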
 
Hi, nothing changed: the system still reboots frequently. I made the changes this Wednesday, 6th October, at 11:41 AM and rebooted manually. After that, Proxmox rebooted 4 times: once yesterday, and 3 times within 4 hours last night (04:41, 07:18, 08:21).
pcie_aspm=off is working, since I no longer get the previous error messages, but something else must be causing the reboots.
I attach the new syslog file, hoping you can help me find and solve the problem.
 

Attachment: syslog.txt (481.6 KB)
There's nothing in the syslog. Have you checked the iLO for any information?
Your BIOS is from 2017, consider updating it to the latest version.
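
The installed BIOS version and date can be read from the running system before and after an update. This is a minimal sketch with a made-up function name (`bios_info`). It reads the DMI fields that Linux exposes under `/sys/class/dmi/id`; `dmidecode -s bios-release-date` reports the same date.

```shell
# bios_info: print BIOS vendor, version and release date from the kernel's DMI view.
# The optional argument (an alternative directory) is an illustration-only
# convenience so the function can be tried against test files.
bios_info() {
    dmi_dir=${1:-/sys/class/dmi/id}
    for field in bios_vendor bios_version bios_date; do
        if [ -r "$dmi_dir/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dmi_dir/$field")"
        fi
    done
}

bios_info
```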
 
I looked through the updates available for our server (HPE ProLiant ML10 Gen9).

I think this is the one I need. As mentioned before, I'm a newbie and need to be guided step by step:

https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX_4ab55f4f209f43308e0e508d39

Do you think I also need any of the other updates available on the HPE website?
https://support.hpe.com/connect/s/p...sAndSoftware&driversAndSoftwareFilter=8000029
 
