PVE 8.0.4 Random reboot

d_b_r

New Member
Nov 5, 2023
Hello,

I have a PVE Host with this configuration:
16 x Intel(R) Xeon(R) E-2388G CPU @ 3.20GHz (1 Socket)
64 GB RAM
OVH baremetal server

PVE 8.0.4 with Linux 6.2.16-19-pve

The server restarts randomly but frequently: sometimes it reaches 20+ days of uptime, sometimes it restarts after only 5 hours.

Here is the syslog from just before a reboot:
Code:
2023-11-04T00:00:03.432670+01:00 proxmox pvescheduler[255929]: INFO: starting new backup job: vzdump 104 --notes-template '{{guestname}}' --quiet 1 --mailto mymail@mail.com --mailnotification always --mode snapshot --node proxmox --compress zstd --storage pbs-it1
2023-11-04T00:00:03.433156+01:00 proxmox pvescheduler[255929]: INFO: Starting Backup of VM 104 (lxc)
2023-11-04T00:00:06.788556+01:00 proxmox systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
2023-11-04T00:00:06.789921+01:00 proxmox systemd[1]: Starting logrotate.service - Rotate log files...
2023-11-04T00:00:06.793093+01:00 proxmox systemd[1]: dpkg-db-backup.service: Deactivated successfully.
2023-11-04T00:00:06.793163+01:00 proxmox systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
2023-11-04T00:00:06.809291+01:00 proxmox systemd[1]: Reloading pveproxy.service - PVE API Proxy Server...
2023-11-04T00:00:07.277712+01:00 proxmox pveproxy[255985]: send HUP to 2782
2023-11-04T00:00:07.277930+01:00 proxmox pveproxy[2782]: received signal HUP
2023-11-04T00:00:07.277982+01:00 proxmox pveproxy[2782]: server closing
2023-11-04T00:00:07.278007+01:00 proxmox pveproxy[2782]: server shutdown (restart)
2023-11-04T00:00:07.286229+01:00 proxmox systemd[1]: Reloaded pveproxy.service - PVE API Proxy Server.
2023-11-04T00:00:07.344526+01:00 proxmox systemd[1]: Reloading spiceproxy.service - PVE SPICE Proxy Server...
2023-11-04T00:00:07.564291+01:00 proxmox spiceproxy[256412]: send HUP to 2789
2023-11-04T00:00:07.564408+01:00 proxmox spiceproxy[2789]: received signal HUP
2023-11-04T00:00:07.564436+01:00 proxmox spiceproxy[2789]: server closing
2023-11-04T00:00:07.564473+01:00 proxmox spiceproxy[2789]: server shutdown (restart)
2023-11-04T00:00:07.567855+01:00 proxmox systemd[1]: Reloaded spiceproxy.service - PVE SPICE Proxy Server.
2023-11-04T00:00:07.584781+01:00 proxmox pvefw-logger[387346]: received terminate request (signal)
2023-11-04T00:00:07.584846+01:00 proxmox pvefw-logger[387346]: stopping pvefw logger
2023-11-04T00:00:07.584871+01:00 proxmox systemd[1]: Stopping pvefw-logger.service - Proxmox VE firewall logger...
2023-11-04T00:00:07.766110+01:00 proxmox spiceproxy[2789]: restarting server
2023-11-04T00:00:07.766217+01:00 proxmox spiceproxy[2789]: starting 1 worker(s)
2023-11-04T00:00:07.766983+01:00 proxmox spiceproxy[2789]: worker 256760 started
2023-11-04T00:00:07.810123+01:00 proxmox pveproxy[2782]: restarting server
2023-11-04T00:00:07.810224+01:00 proxmox pveproxy[2782]: starting 3 worker(s)
2023-11-04T00:00:07.811934+01:00 proxmox pveproxy[2782]: worker 256761 started
2023-11-04T00:00:07.813331+01:00 proxmox pveproxy[2782]: worker 256762 started
2023-11-04T00:00:07.814865+01:00 proxmox pveproxy[2782]: worker 256763 started
2023-11-04T00:00:07.835414+01:00 proxmox systemd[1]: pvefw-logger.service: Deactivated successfully.
2023-11-04T00:00:07.835490+01:00 proxmox systemd[1]: Stopped pvefw-logger.service - Proxmox VE firewall logger.
2023-11-04T00:00:07.835517+01:00 proxmox systemd[1]: pvefw-logger.service: Consumed 5.222s CPU time.
2023-11-04T00:00:07.888438+01:00 proxmox systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall logger...
2023-11-04T00:00:07.890117+01:00 proxmox pvefw-logger[256766]: starting pvefw logger
2023-11-04T00:00:07.890197+01:00 proxmox systemd[1]: Started pvefw-logger.service - Proxmox VE firewall logger.
2023-11-04T00:00:07.891630+01:00 proxmox systemd[1]: logrotate.service: Deactivated successfully.
2023-11-04T00:00:07.891780+01:00 proxmox systemd[1]: Finished logrotate.service - Rotate log files.
2023-11-04T00:00:12.584201+01:00 proxmox kernel: [830406.028129] CIFS: __readahead_batch() returned 11/16
2023-11-04T00:00:12.767550+01:00 proxmox spiceproxy[387340]: worker exit
2023-11-04T00:00:12.774603+01:00 proxmox spiceproxy[2789]: worker 387340 finished
2023-11-04T00:00:12.815362+01:00 proxmox pveproxy[387342]: worker exit
2023-11-04T00:00:12.815561+01:00 proxmox pveproxy[387341]: worker exit
2023-11-04T00:00:12.815675+01:00 proxmox pveproxy[387343]: worker exit
2023-11-04T00:00:12.832305+01:00 proxmox pveproxy[2782]: worker 387341 finished
2023-11-04T00:00:12.832391+01:00 proxmox pveproxy[2782]: worker 387343 finished
2023-11-04T00:00:12.835390+01:00 proxmox pveproxy[2782]: worker 387342 finished
2023-11-04T00:00:24.764526+01:00 proxmox pvescheduler[255929]: INFO: Finished Backup of VM 104 (00:00:21)
2023-11-04T00:00:24.810543+01:00 proxmox pvescheduler[255929]: INFO: Backup job finished successfully

And here is the second crash, a few hours later:
(I manually rebooted the server on Nov 05 at 12:34:11.)

Code:
Nov 04 15:00:00 proxmox pvescheduler[570061]: INFO: Starting Backup of VM 104 (lxc)
Nov 04 15:00:20 proxmox pvescheduler[570061]: INFO: Finished Backup of VM 104 (00:00:20)
Nov 04 15:00:20 proxmox pvescheduler[570061]: INFO: Backup job finished successfully
Nov 04 15:00:20 proxmox postfix/pickup[547338]: 5C027407F7: uid=0 from=<root>
Nov 04 15:00:20 proxmox postfix/cleanup[571062]: 5C027407F7: message-id=<20231104140020.5C027407F7@proxmox.mail.com>
Nov 04 15:00:20 proxmox postfix/qmgr[2823]: 5C027407F7: from=<root@proxmox.mail.com>, size=6826, nrcpt=1 (queue active)
Nov 04 15:00:20 proxmox postfix/smtp[571064]: connect to smtp.mail.com[2a00:1450:400c:c03::6c]:587: Network is unreachable
Nov 04 15:00:32 proxmox postfix/smtp[571064]: 5C027407F7: to=<mymail@mail.com>, relay=smtp.mail.com[66.102.1.108]:587, delay=12, delays=0.01/0.01/0.66/11, dsn=2.0.0, status=sent (250 2.0.0 OK  1699106432 k17-20020a5d6e91000000b0032d9382e6e0sm4455235wrz.45 - gsmtp)
Nov 04 15:00:32 proxmox postfix/qmgr[2823]: 5C027407F7: removed
Nov 04 15:17:01 proxmox CRON[615211]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 04 15:17:01 proxmox CRON[615212]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 04 15:17:01 proxmox CRON[615211]: pam_unix(cron:session): session closed for user root
-- Boot e9debad94ead4394b6b24c98333796b5 --
Nov 05 12:34:11 proxmox kernel: Linux version 6.2.16-19-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-19 (2023-10-24T12:07Z) ()
Nov 05 12:34:11 proxmox kernel: Command line: initrd=\EFI\proxmox\6.2.16-19-pve\initrd.img-6.2.16-19-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
Nov 05 12:34:11 proxmox kernel: KERNEL supported cpus:
Nov 05 12:34:11 proxmox kernel:   Intel GenuineIntel
Nov 05 12:34:11 proxmox kernel:   AMD AuthenticAMD
Nov 05 12:34:11 proxmox kernel:   Hygon HygonGenuine
Nov 05 12:34:11 proxmox kernel:   Centaur CentaurHauls
Nov 05 12:34:11 proxmox kernel:   zhaoxin   Shanghai
Nov 05 12:34:11 proxmox kernel: BIOS-provided physical RAM map:
Nov 05 12:34:11 proxmox kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000005dfff] usable
Nov 05 12:34:11 proxmox kernel: BIOS-e820: [mem 0x000000000005e000-0x000000000005efff] reserved
Nov 05 12:34:11 proxmox kernel: BIOS-e820: [mem 0x000000000005f000-0x000000000009ffff] usable
Nov 05 12:34:11 proxmox kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
Nov 05 12:34:11 proxmox kernel: BIOS-e820: [mem 0x0000000000100000-0x000000005c240fff] usable
Nov 05 12:34:11 proxmox kernel: BIOS-e820: [mem 0x000000005c241000-0x000000005c241fff] reserved


I can provide more logs if needed.

Thanks for your help.
 
Reverted to the older Linux 5.15.108-1-pve kernel, still on PVE 8.0.4:

Code:
proxmox-ve: 8.0.2 (running kernel: 5.15.108-1-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-3
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-7
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1

One hour so far without a restart.
 
I had the same thing, with similarly odd logs, on completely different hardware (Intel Tiger Lake), but in a cluster. I thought it was related to syncing issues, then the issue resolved itself without much explanation. The only thing I could find in the logs before the reboots was OOM kills for a VM, but those happened well before the actual reboot. Is there anything more interesting further back before the reboot occurs?
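For anyone wanting to search a long log for this kind of earlier warning sign, here is a minimal Python sketch. It assumes the log of the previous boot has been saved as plain text (for example from `journalctl -b -1`). The function name `scan_log` and the two patterns are illustrative assumptions, not an exhaustive list, and the exact kernel wording can differ between versions.

```python
import re

# Illustrative patterns only (assumptions): kernel message wording varies
# between versions, so extend or adjust these for your own system.
SUSPICIOUS = {
    "oom": re.compile(r"invoked oom-killer|Out of memory: Killed process", re.I),
    "mce": re.compile(r"Machine check|mce:|Hardware Error", re.I),
}


def scan_log(lines):
    """Return (line_number, category, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for category, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                hits.append((number, category, line.rstrip()))
                break  # report each line once, under the first matching category
    return hits
```

Calling `scan_log(open("prev-boot.log"))` on such a saved file (the file name is hypothetical) would list candidate lines to read in context. An empty result does not rule out a hardware problem, since an abrupt reset may leave nothing in the log at all.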
 
I don't have anything beyond these logs, but I can provide the kernel boot logs. For information, I run VMs and an LXC container with Docker in it. In my case, I don't have a cluster. Which version are you running?
 
You mean you lost all the logs beyond what you included? I meant the post-mortem log of the previous boot, of which you only posted the last half hour. I know it sounds unlikely, but is there anything suspicious in the log from the moment the machine boots up? The shenanigans can start long before they lead to the reboot.
 
The detailed boot logs are attached.
 

Sorry for the late reply, I was trying to dig up my own syslogs from when it was happening. And yes, I forgot to mention in my answer that this was PVE 8.0.4 as well.

The log you attached does not start from the beginning. I was wondering about anything that could indicate instability, e.g. from hardware. I do not know whether you are clipping the logs, but the way the -- Boot -- marker appears so suddenly suggests the machine is rebooting abruptly, not because of a kernel panic or similar. So hardware?
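The reasoning above (a boot marker with no shutdown messages before it points to an abrupt reset) can be sketched as a small Python check over a saved journal export. This is only a heuristic under stated assumptions: the function name `classify_boots`, the 20-line window, and the shutdown phrases are my own choices, and the wording of shutdown messages may differ between systemd versions.

```python
import re

# Marker that journalctl prints between boots, as seen in the log above.
BOOT_MARKER = re.compile(r"^-- Boot [0-9a-f]+ --$")
# Assumed signs of an orderly shutdown; adjust to what your system logs.
CLEAN_SHUTDOWN = re.compile(r"Journal stopped|systemd-shutdown|Shutting down", re.I)


def classify_boots(lines, window=20):
    """For each boot marker, check the preceding `window` lines for any
    sign of an orderly shutdown and record the last entry before the marker."""
    lines = [line.rstrip("\n") for line in lines]
    results = []
    for index, line in enumerate(lines):
        if BOOT_MARKER.match(line.strip()):
            before = lines[max(0, index - window):index]
            clean = any(CLEAN_SHUTDOWN.search(prev) for prev in before)
            results.append({
                "boot_line": index + 1,  # 1-based line number of the marker
                "clean": clean,
                "last_entry": before[-1] if before else "",
            })
    return results
```

Run against the excerpt posted earlier in this thread, the last entry before the marker is the 15:17:01 cron line and no shutdown phrase is found, which is consistent with an abrupt stop. Note that this cannot tell a crash or power loss apart from a manual hard reset, since both leave no shutdown trace.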
 
