Unwanted reboot every few weeks

Sep 5, 2019
Switzerland, Berne
Hello everyone,

I run several Proxmox hosts, and one of them restarts every few weeks without any action on my part. Does anyone have an idea what could be causing these reboots? The syslog looks normal to me, but maybe I am missing something:

Code:
May 03 22:45:40 Host-24 postfix/smtp[1105472]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 22:45:40 Host-24 postfix/smtp[1105472]: F1C0C10094D: to=<administrator@domain.com>, relay=none, delay=6899, delays=6839/0.14/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 03 22:53:54 Host-24 pveproxy[1043877]: worker exit
May 03 22:53:54 Host-24 pveproxy[3650]: worker 1043877 finished
May 03 22:53:54 Host-24 pveproxy[3650]: starting 1 worker(s)
May 03 22:53:54 Host-24 pveproxy[3650]: worker 1122502 started
May 03 22:54:40 Host-24 postfix/qmgr[2786424]: 04057100DF1: from=<root@Host-24.domain.com>, size=1634, nrcpt=1 (queue active)
May 03 22:55:10 Host-24 postfix/smtp[1124064]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 03 22:55:40 Host-24 postfix/smtp[1124064]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 22:55:40 Host-24 postfix/smtp[1124064]: 04057100DF1: to=<administrator@domain.com>, relay=none, delay=330501, delays=330441/0.15/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 03 23:00:49 Host-24 pvedaemon[988739]: <root@pam> successful auth for user 'root@pam'
May 03 23:15:25 Host-24 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
May 03 23:15:25 Host-24 kernel: ata1.00: configured for UDMA/133
May 03 23:17:01 Host-24 CRON[1165677]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 03 23:17:01 Host-24 CRON[1165678]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 03 23:17:01 Host-24 CRON[1165677]: pam_unix(cron:session): session closed for user root
May 03 23:29:40 Host-24 postfix/qmgr[2786424]: 46C2B100FC0: from=<root@Host-24.domain.com>, size=28858, nrcpt=1 (queue active)
May 03 23:30:10 Host-24 postfix/smtp[1189530]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 03 23:30:40 Host-24 postfix/smtp[1189530]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 23:30:40 Host-24 postfix/smtp[1189530]: 46C2B100FC0: to=<administrator@domain.com>, relay=none, delay=280007, delays=279947/0.15/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 03 23:40:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 63
May 03 23:40:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 37
May 03 23:54:40 Host-24 postfix/qmgr[2786424]: F1C0C10094D: from=<root@Host-24.domain.com>, size=2792, nrcpt=1 (queue active)
May 03 23:55:10 Host-24 postfix/smtp[1237113]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 03 23:55:40 Host-24 postfix/smtp[1237113]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 23:55:41 Host-24 postfix/smtp[1237113]: F1C0C10094D: to=<administrator@domain.com>, relay=none, delay=11100, delays=11040/0.16/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 00:00:19 Host-24 systemd[1]: Starting Rotate log files...
May 04 00:00:19 Host-24 systemd[1]: Starting Daily man-db regeneration...
May 04 00:00:19 Host-24 systemd[1]: Reloading PVE API Proxy Server.
May 04 00:00:19 Host-24 systemd[1]: man-db.service: Succeeded.
May 04 00:00:19 Host-24 systemd[1]: Finished Daily man-db regeneration.
May 04 00:00:20 Host-24 pveproxy[1248052]: send HUP to 3650
May 04 00:00:20 Host-24 pveproxy[3650]: received signal HUP
May 04 00:00:20 Host-24 pveproxy[3650]: server closing
May 04 00:00:20 Host-24 pveproxy[3650]: server shutdown (restart)
May 04 00:00:20 Host-24 systemd[1]: Reloaded PVE API Proxy Server.
May 04 00:00:20 Host-24 systemd[1]: Reloading PVE SPICE Proxy Server.
May 04 00:00:21 Host-24 spiceproxy[1248062]: send HUP to 3656
May 04 00:00:21 Host-24 spiceproxy[3656]: received signal HUP
May 04 00:00:21 Host-24 spiceproxy[3656]: server closing
May 04 00:00:21 Host-24 spiceproxy[3656]: server shutdown (restart)
May 04 00:00:21 Host-24 systemd[1]: Reloaded PVE SPICE Proxy Server.
May 04 00:00:21 Host-24 systemd[1]: Stopping Proxmox VE firewall logger...
May 04 00:00:21 Host-24 pvefw-logger[2727474]: received terminate request (signal)
May 04 00:00:21 Host-24 pvefw-logger[2727474]: stopping pvefw logger
May 04 00:00:21 Host-24 systemd[1]: pvefw-logger.service: Succeeded.
May 04 00:00:21 Host-24 systemd[1]: Stopped Proxmox VE firewall logger.
May 04 00:00:21 Host-24 systemd[1]: pvefw-logger.service: Consumed 5.198s CPU time.
May 04 00:00:21 Host-24 spiceproxy[3656]: restarting server
May 04 00:00:21 Host-24 spiceproxy[3656]: starting 1 worker(s)
May 04 00:00:21 Host-24 spiceproxy[3656]: worker 1248078 started
May 04 00:00:21 Host-24 systemd[1]: Starting Proxmox VE firewall logger...
May 04 00:00:21 Host-24 pvefw-logger[1248080]: starting pvefw logger
May 04 00:00:21 Host-24 systemd[1]: Started Proxmox VE firewall logger.
May 04 00:00:21 Host-24 systemd[1]: logrotate.service: Succeeded.
May 04 00:00:21 Host-24 systemd[1]: Finished Rotate log files.
May 04 00:00:21 Host-24 pveproxy[3650]: restarting server
May 04 00:00:21 Host-24 pveproxy[3650]: starting 3 worker(s)
May 04 00:00:21 Host-24 pveproxy[3650]: worker 1248088 started
May 04 00:00:21 Host-24 pveproxy[3650]: worker 1248089 started
May 04 00:00:21 Host-24 pveproxy[3650]: worker 1248090 started
May 04 00:00:26 Host-24 spiceproxy[2727480]: worker exit
May 04 00:00:26 Host-24 spiceproxy[3656]: worker 2727480 finished
May 04 00:00:26 Host-24 pveproxy[1122502]: worker exit
May 04 00:00:26 Host-24 pveproxy[1098342]: worker exit
May 04 00:00:26 Host-24 pveproxy[1040429]: worker exit
May 04 00:00:26 Host-24 pveproxy[3650]: worker 1098342 finished
May 04 00:00:26 Host-24 pveproxy[3650]: worker 1040429 finished
May 04 00:00:26 Host-24 pveproxy[3650]: worker 1122502 finished
May 04 00:04:40 Host-24 postfix/qmgr[2786424]: 04057100DF1: from=<root@Host-24.domain.com>, size=1634, nrcpt=1 (queue active)
May 04 00:05:10 Host-24 postfix/smtp[1256492]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 00:05:40 Host-24 postfix/smtp[1256492]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 00:05:40 Host-24 postfix/smtp[1256492]: 04057100DF1: to=<administrator@domain.com>, relay=none, delay=334702, delays=334641/0.17/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 00:10:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 63 to 62
May 04 00:10:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 37 to 38
May 04 00:17:01 Host-24 CRON[1280738]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 04 00:17:01 Host-24 CRON[1280739]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 04 00:17:01 Host-24 CRON[1280738]: pam_unix(cron:session): session closed for user root
May 04 00:24:01 Host-24 CRON[1294311]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 04 00:24:01 Host-24 CRON[1294312]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/trim ]; then /usr/lib/zfs-linux/trim; fi)
May 04 00:24:01 Host-24 CRON[1294311]: pam_unix(cron:session): session closed for user root
May 04 00:39:40 Host-24 postfix/qmgr[2786424]: 46C2B100FC0: from=<root@Host-24.domain.com>, size=28858, nrcpt=1 (queue active)
May 04 00:40:10 Host-24 postfix/smtp[1325019]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 00:40:40 Host-24 postfix/smtp[1325019]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 00:40:40 Host-24 postfix/smtp[1325019]: 46C2B100FC0: to=<administrator@domain.com>, relay=none, delay=284208, delays=284147/0.16/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 01:04:40 Host-24 postfix/qmgr[2786424]: F1C0C10094D: from=<root@Host-24.domain.com>, size=2792, nrcpt=1 (queue active)
May 04 01:05:10 Host-24 postfix/smtp[1374911]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 01:05:40 Host-24 postfix/smtp[1374911]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 01:05:40 Host-24 postfix/smtp[1374911]: F1C0C10094D: to=<administrator@domain.com>, relay=none, delay=15299, delays=15239/0.18/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 01:14:40 Host-24 postfix/qmgr[2786424]: 04057100DF1: from=<root@Host-24.domain.com>, size=1634, nrcpt=1 (queue active)
May 04 01:15:10 Host-24 postfix/smtp[1395447]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 01:15:40 Host-24 postfix/smtp[1395447]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 01:15:40 Host-24 postfix/smtp[1395447]: 04057100DF1: to=<administrator@domain.com>, relay=none, delay=338901, delays=338841/0.19/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 01:17:01 Host-24 CRON[1400318]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 04 01:17:01 Host-24 CRON[1400319]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 04 01:17:01 Host-24 CRON[1400318]: pam_unix(cron:session): session closed for user root
-- Reboot --
May 04 01:50:56 Host-24 kernel: Linux version 5.15.104-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.104-2 (2023-04-12T11:23Z) ()
May 04 01:50:56 Host-24 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.104-1-pve root=/dev/mapper/pve-root ro quiet
May 04 01:50:56 Host-24 kernel: KERNEL supported cpus:
May 04 01:50:56 Host-24 kernel:   Intel GenuineIntel
May 04 01:50:56 Host-24 kernel:   AMD AuthenticAMD
May 04 01:50:56 Host-24 kernel:   Hygon HygonGenuine
May 04 01:50:56 Host-24 kernel:   Centaur CentaurHauls
May 04 01:50:56 Host-24 kernel:   zhaoxin   Shanghai

Could this be a soft reboot? In any case, I see no sign of a classic hardware error or a watchdog reset.
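One way to back up the "no hardware error or watchdog" impression is to search the kernel messages of the previous boot for typical error reports. The helper below is hypothetical (`hw_error_lines` is not a Proxmox tool), and its patterns are heuristics chosen for illustration, so expect both misses and false positives:

```shell
# Hypothetical helper: read journal lines on stdin and print those that look
# like hardware-error or lockup reports. The patterns are assumptions:
# machine-check "[Hardware Error]", EDAC errors, watchdog/lockup reports,
# hung tasks and block I/O errors. Extend them for your hardware.
hw_error_lines() {
  grep -Ei 'hardware error|edac.*error|watchdog: bug|soft lockup|hard lockup|blocked for more than|i/o error' || {
    echo "no hardware-error or watchdog lines found"
    return 1
  }
}

# Usage: kernel messages (-k) of the previous boot (-b -1).
# journalctl -k -b -1 | hw_error_lines
```

An empty result does not prove the hardware is fine: a sudden power loss or a hard freeze usually leaves nothing in the local journal at all.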

Code:
# uname -a
# Linux Host-24 5.15.104-1-pve #1 SMP PVE 5.15.104-2 (2023-04-12T11:23Z) x86_64 GNU/Linux

Thanks in advance for your input. The host only runs 100 Windows workstations, so a reboot at night does no real harm. But I would like to understand what happened here: during the day, this would mean a loss of several hundred man-hours and a corresponding financial loss for the company.


Best regards
Josh
 
There is no graceful shutdown of the VMs, so it looks like a hard reboot caused by a power interruption (does your server restart automatically when power is restored?) or by another hardware issue that prevents Proxmox from logging anything. There is about half an hour between the last log entry and the restart. Is this because the server takes that long to POST, or because someone has to go there and press the power button?
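The length of that silent window can be measured from the journal itself. A minimal sketch, assuming journalctl's default short output (`Mon DD HH:MM:SS host ...`) and a gap shorter than 24 hours; `reboot_gap` is a hypothetical name:

```shell
# Hypothetical helper: read journalctl short-format output on stdin and report
# the time between the last entry before the first "-- Reboot --" marker and
# the first entry after it. Assumes the gap is under 24 hours, so a negative
# difference is treated as a wrap past midnight. Only the first marker counts.
reboot_gap() {
  awk '
    # Convert "HH:MM:SS" to seconds since midnight.
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /^-- Reboot --/ { seen = 1; next }
    $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      if (!seen) { before = $3 }            # keep overwriting until the marker
      else if (after == "") { after = $3 }  # first entry of the new boot
    }
    END {
      if (before == "" || after == "") { print "no reboot marker found"; exit 1 }
      gap = secs(after) - secs(before)
      if (gap < 0) gap += 86400
      printf "last entry %s, first entry after boot %s, gap %d s (%dm %02ds)\n", before, after, gap, int(gap / 60), gap % 60
    }'
}

# Usage (the year 2023 is an assumption; the log above only shows "May 04"):
# journalctl --since "2023-05-04 00:00" --until "2023-05-04 02:00" | reboot_gap
```

For the log posted above, the last entry before the reboot is the hourly cron job at 01:17:01 and the kernel starts at 01:50:56, a gap of 2035 s (33 min 55 s). Even with the up to 3 minutes of POST time reported later in the thread, roughly 31 minutes stay unexplained.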
Proxmox is trying to e-mail administrator@domain.com (maybe about hardware problems?), but the connection to mail2.domain.com times out. I suggest finding out why e-mail delivery fails and what the e-mails are about.
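To see what those stuck notifications actually say, the queue IDs can be pulled from the log and the queued messages printed with Postfix's `postcat -q`. A sketch; `deferred_ids` is a hypothetical helper that assumes the `Mon DD HH:MM:SS host postfix/smtp[pid]: QUEUEID: ...` line layout seen above:

```shell
# Hypothetical helper: print each Postfix queue ID that postfix/smtp reported
# as "status=deferred", once, in the order first seen. In journalctl's short
# format the fields are: month, day, time, host, process, queue ID.
deferred_ids() {
  awk '/postfix\/smtp\[/ && /status=deferred/ {
    id = $6
    sub(/:$/, "", id)                 # drop the colon after the queue ID
    if (!(id in seen)) { seen[id] = 1; print id }
  }'
}

# Usage (root): show the Subject line of every deferred message still queued.
# journalctl -t postfix/smtp | deferred_ids | while read -r id; do
#   postcat -q "$id" | grep -m1 '^Subject:'
# done
```

IDs of messages that have since been delivered or expired are no longer in the queue, so `postcat` will report an error for them; `postqueue -p` lists what is still queued.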
 
Hi leesteken, thanks for the quick reply.
Until now, I assumed that the message "-- Reboot --" only appears if no system freeze or similar event happened.
My guess is that the hypervisor still tried to shut down all machines via ACPI in the time window without syslog entries and only then rebooted. Am I wrong here? If need be, I can simulate this by unplugging a test system.
On a restart, the server needs at most 3 minutes for POST (Gigabyte MZ72-HB0 motherboard, dual AMD EPYC, ...).
I am aware of the mail issue; I have not finished that configuration yet due to lack of time. It will be fixed.
I can definitely rule out physical access to the hypervisor by third parties during the night.
 
Until now, I assumed that the message "-- Reboot --" only appears if no system freeze or similar event happened.
"-- Reboot --" is shown by the log viewer because it detects a discontinuity between logs (a new boot); it is not actually written to the log before the reboot.
My guess is that the hypervisor still tried to shut down all machines via ACPI in the time window without syslog entries and only then rebooted. Am I wrong here? If need be, I can simulate this by unplugging a test system.
I do think you're wrong. I don't see any sign in the log you showed that Proxmox tried to shut down any machine. This does not look like a software-controlled shutdown and restart.
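A quick way to check for an orderly shutdown is to scan the end of the previous boot for the messages systemd and Proxmox usually write while stopping. The patterns in this hypothetical `shutdown_markers` helper are assumptions; comparing them with the log of a deliberate, known-good reboot on the same host is the safest way to confirm what to look for:

```shell
# Hypothetical helper: print lines from stdin that typically appear during an
# orderly shutdown or reboot, or report that none were found (exit status 1).
# Patterns (assumptions): systemd shutdown targets, the PVE guest stop unit,
# systemd-shutdown and journald stop messages, and an ACPI power-key press.
shutdown_markers() {
  grep -E 'Reached target (Shutdown|Final Step|Reboot)|Stopping PVE guests|systemd-shutdown|Journal stopped|Power key pressed' || {
    echo "no orderly-shutdown markers found"
    return 1
  }
}

# Usage: the last 300 lines of the previous boot.
# journalctl -b -1 -n 300 | shutdown_markers
```

For the log posted above this would report no markers: the last entry before `-- Reboot --` is an ordinary hourly cron job.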
On a restart, the server needs at most 3 minutes for POST (Gigabyte MZ72-HB0 motherboard, dual AMD EPYC, ...).
Does your server power on automatically when power is restored? It looks like it was off for more than half an hour. Maybe it took someone half an hour after an alarm on a dashboard somewhere to go to the machine and power it back on?
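Whether the board powers itself back on after an outage, and whether the BMC saw a power event, can usually be queried over IPMI. A sketch assuming `ipmitool` is installed and the board's BMC is reachable in-band; `power_summary` is a hypothetical helper:

```shell
# Hypothetical helper: from `ipmitool chassis status` output on stdin, keep
# only the AC power-restore policy and the last power event the BMC recorded.
power_summary() {
  grep -E '^(Power Restore Policy|Last Power Event)'
}

# Usage (root, assumes ipmitool and an in-band reachable BMC):
# ipmitool chassis status | power_summary
# The BMC's System Event Log may also record power loss, resets or watchdog events:
# ipmitool sel elist | tail -n 20
```

If the policy reads `always-off`, a power cut would have left the server off until someone switched it on; with `always-on` or `previous`, a half-hour outage alone could explain the gap.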
I am aware of the mail issue; I have not finished that configuration yet due to lack of time. It will be fixed.
I can definitely rule out physical access to the hypervisor by third parties during the night.
The server was off, frozen, hanging or stuck (without writing any logs), or the drives were not working, for half an hour. If nobody pressed the power button, then either some automated mechanism hard-rebooted the system after half an hour, or the power was out for half an hour. Maybe set up remote logging to another system to get more information next time?
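A minimal remote-logging sketch, assuming rsyslog is installed on the host and a log server (the name `loghost.example` below is a placeholder) accepts syslog over TCP on port 514. The hypothetical `remote_log_conf` helper just prints a one-line rsyslog forwarding rule in the classic `@@host:port` syntax:

```shell
# Hypothetical helper: print an rsyslog drop-in that forwards all facilities
# and priorities (*.*) to a remote host. "@@" selects TCP; a single "@" would
# use UDP. The port defaults to 514.
remote_log_conf() {
  local host="$1" port="${2:-514}"
  printf '# Forward all messages to a central log host over TCP\n'
  printf '*.* @@%s:%s\n' "$host" "$port"
}

# Usage (root), with a placeholder host name:
# remote_log_conf loghost.example > /etc/rsyslog.d/90-remote.conf
# systemctl restart rsyslog
```

Keep in mind that rsyslog runs in user space: if the machine hard-freezes or loses power, the last seconds may still be lost. Kernel messages can additionally be sent over the network with netconsole.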