Hello everyone,
I run several Proxmox hosts, and one of them restarts every few weeks without any action on my part. Maybe someone has an idea what is causing the reboots. The syslog looks normal to me, but maybe I am missing something here:
Code:
May 03 22:45:40 Host-24 postfix/smtp[1105472]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 22:45:40 Host-24 postfix/smtp[1105472]: F1C0C10094D: to=<administrator@domain.com>, relay=none, delay=6899, delays=6839/0.14/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 03 22:53:54 Host-24 pveproxy[1043877]: worker exit
May 03 22:53:54 Host-24 pveproxy[3650]: worker 1043877 finished
May 03 22:53:54 Host-24 pveproxy[3650]: starting 1 worker(s)
May 03 22:53:54 Host-24 pveproxy[3650]: worker 1122502 started
May 03 22:54:40 Host-24 postfix/qmgr[2786424]: 04057100DF1: from=<root@Host-24.domain.com>, size=1634, nrcpt=1 (queue active)
May 03 22:55:10 Host-24 postfix/smtp[1124064]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 03 22:55:40 Host-24 postfix/smtp[1124064]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 22:55:40 Host-24 postfix/smtp[1124064]: 04057100DF1: to=<administrator@domain.com>, relay=none, delay=330501, delays=330441/0.15/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 03 23:00:49 Host-24 pvedaemon[988739]: <root@pam> successful auth for user 'root@pam'
May 03 23:15:25 Host-24 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
May 03 23:15:25 Host-24 kernel: ata1.00: configured for UDMA/133
May 03 23:17:01 Host-24 CRON[1165677]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 03 23:17:01 Host-24 CRON[1165678]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 03 23:17:01 Host-24 CRON[1165677]: pam_unix(cron:session): session closed for user root
May 03 23:29:40 Host-24 postfix/qmgr[2786424]: 46C2B100FC0: from=<root@Host-24.domain.com>, size=28858, nrcpt=1 (queue active)
May 03 23:30:10 Host-24 postfix/smtp[1189530]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 03 23:30:40 Host-24 postfix/smtp[1189530]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 23:30:40 Host-24 postfix/smtp[1189530]: 46C2B100FC0: to=<administrator@domain.com>, relay=none, delay=280007, delays=279947/0.15/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 03 23:40:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 63
May 03 23:40:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 37
May 03 23:54:40 Host-24 postfix/qmgr[2786424]: F1C0C10094D: from=<root@Host-24.domain.com>, size=2792, nrcpt=1 (queue active)
May 03 23:55:10 Host-24 postfix/smtp[1237113]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 03 23:55:40 Host-24 postfix/smtp[1237113]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 03 23:55:41 Host-24 postfix/smtp[1237113]: F1C0C10094D: to=<administrator@domain.com>, relay=none, delay=11100, delays=11040/0.16/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 00:00:19 Host-24 systemd[1]: Starting Rotate log files...
May 04 00:00:19 Host-24 systemd[1]: Starting Daily man-db regeneration...
May 04 00:00:19 Host-24 systemd[1]: Reloading PVE API Proxy Server.
May 04 00:00:19 Host-24 systemd[1]: man-db.service: Succeeded.
May 04 00:00:19 Host-24 systemd[1]: Finished Daily man-db regeneration.
May 04 00:00:20 Host-24 pveproxy[1248052]: send HUP to 3650
May 04 00:00:20 Host-24 pveproxy[3650]: received signal HUP
May 04 00:00:20 Host-24 pveproxy[3650]: server closing
May 04 00:00:20 Host-24 pveproxy[3650]: server shutdown (restart)
May 04 00:00:20 Host-24 systemd[1]: Reloaded PVE API Proxy Server.
May 04 00:00:20 Host-24 systemd[1]: Reloading PVE SPICE Proxy Server.
May 04 00:00:21 Host-24 spiceproxy[1248062]: send HUP to 3656
May 04 00:00:21 Host-24 spiceproxy[3656]: received signal HUP
May 04 00:00:21 Host-24 spiceproxy[3656]: server closing
May 04 00:00:21 Host-24 spiceproxy[3656]: server shutdown (restart)
May 04 00:00:21 Host-24 systemd[1]: Reloaded PVE SPICE Proxy Server.
May 04 00:00:21 Host-24 systemd[1]: Stopping Proxmox VE firewall logger...
May 04 00:00:21 Host-24 pvefw-logger[2727474]: received terminate request (signal)
May 04 00:00:21 Host-24 pvefw-logger[2727474]: stopping pvefw logger
May 04 00:00:21 Host-24 systemd[1]: pvefw-logger.service: Succeeded.
May 04 00:00:21 Host-24 systemd[1]: Stopped Proxmox VE firewall logger.
May 04 00:00:21 Host-24 systemd[1]: pvefw-logger.service: Consumed 5.198s CPU time.
May 04 00:00:21 Host-24 spiceproxy[3656]: restarting server
May 04 00:00:21 Host-24 spiceproxy[3656]: starting 1 worker(s)
May 04 00:00:21 Host-24 spiceproxy[3656]: worker 1248078 started
May 04 00:00:21 Host-24 systemd[1]: Starting Proxmox VE firewall logger...
May 04 00:00:21 Host-24 pvefw-logger[1248080]: starting pvefw logger
May 04 00:00:21 Host-24 systemd[1]: Started Proxmox VE firewall logger.
May 04 00:00:21 Host-24 systemd[1]: logrotate.service: Succeeded.
May 04 00:00:21 Host-24 systemd[1]: Finished Rotate log files.
May 04 00:00:21 Host-24 pveproxy[3650]: restarting server
May 04 00:00:21 Host-24 pveproxy[3650]: starting 3 worker(s)
May 04 00:00:21 Host-24 pveproxy[3650]: worker 1248088 started
May 04 00:00:21 Host-24 pveproxy[3650]: worker 1248089 started
May 04 00:00:21 Host-24 pveproxy[3650]: worker 1248090 started
May 04 00:00:26 Host-24 spiceproxy[2727480]: worker exit
May 04 00:00:26 Host-24 spiceproxy[3656]: worker 2727480 finished
May 04 00:00:26 Host-24 pveproxy[1122502]: worker exit
May 04 00:00:26 Host-24 pveproxy[1098342]: worker exit
May 04 00:00:26 Host-24 pveproxy[1040429]: worker exit
May 04 00:00:26 Host-24 pveproxy[3650]: worker 1098342 finished
May 04 00:00:26 Host-24 pveproxy[3650]: worker 1040429 finished
May 04 00:00:26 Host-24 pveproxy[3650]: worker 1122502 finished
May 04 00:04:40 Host-24 postfix/qmgr[2786424]: 04057100DF1: from=<root@Host-24.domain.com>, size=1634, nrcpt=1 (queue active)
May 04 00:05:10 Host-24 postfix/smtp[1256492]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 00:05:40 Host-24 postfix/smtp[1256492]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 00:05:40 Host-24 postfix/smtp[1256492]: 04057100DF1: to=<administrator@domain.com>, relay=none, delay=334702, delays=334641/0.17/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 00:10:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 63 to 62
May 04 00:10:47 Host-24 smartd[3268]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 37 to 38
May 04 00:17:01 Host-24 CRON[1280738]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 04 00:17:01 Host-24 CRON[1280739]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 04 00:17:01 Host-24 CRON[1280738]: pam_unix(cron:session): session closed for user root
May 04 00:24:01 Host-24 CRON[1294311]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 04 00:24:01 Host-24 CRON[1294312]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/trim ]; then /usr/lib/zfs-linux/trim; fi)
May 04 00:24:01 Host-24 CRON[1294311]: pam_unix(cron:session): session closed for user root
May 04 00:39:40 Host-24 postfix/qmgr[2786424]: 46C2B100FC0: from=<root@Host-24.domain.com>, size=28858, nrcpt=1 (queue active)
May 04 00:40:10 Host-24 postfix/smtp[1325019]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 00:40:40 Host-24 postfix/smtp[1325019]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 00:40:40 Host-24 postfix/smtp[1325019]: 46C2B100FC0: to=<administrator@domain.com>, relay=none, delay=284208, delays=284147/0.16/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 01:04:40 Host-24 postfix/qmgr[2786424]: F1C0C10094D: from=<root@Host-24.domain.com>, size=2792, nrcpt=1 (queue active)
May 04 01:05:10 Host-24 postfix/smtp[1374911]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 01:05:40 Host-24 postfix/smtp[1374911]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 01:05:40 Host-24 postfix/smtp[1374911]: F1C0C10094D: to=<administrator@domain.com>, relay=none, delay=15299, delays=15239/0.18/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 01:14:40 Host-24 postfix/qmgr[2786424]: 04057100DF1: from=<root@Host-24.domain.com>, size=1634, nrcpt=1 (queue active)
May 04 01:15:10 Host-24 postfix/smtp[1395447]: connect to mail.domain.com[yyy.yyy.yyy.yyy]:25: Connection timed out
May 04 01:15:40 Host-24 postfix/smtp[1395447]: connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out
May 04 01:15:40 Host-24 postfix/smtp[1395447]: 04057100DF1: to=<administrator@domain.com>, relay=none, delay=338901, delays=338841/0.19/60/0, dsn=4.4.1, status=deferred (connect to mail2.domain.com[xxx.xxx.xxx.xxx]:25: Connection timed out)
May 04 01:17:01 Host-24 CRON[1400318]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 04 01:17:01 Host-24 CRON[1400319]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 04 01:17:01 Host-24 CRON[1400318]: pam_unix(cron:session): session closed for user root
-- Reboot --
May 04 01:50:56 Host-24 kernel: Linux version 5.15.104-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.104-2 (2023-04-12T11:23Z) ()
May 04 01:50:56 Host-24 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.104-1-pve root=/dev/mapper/pve-root ro quiet
May 04 01:50:56 Host-24 kernel: KERNEL supported cpus:
May 04 01:50:56 Host-24 kernel: Intel GenuineIntel
May 04 01:50:56 Host-24 kernel: AMD AuthenticAMD
May 04 01:50:56 Host-24 kernel: Hygon HygonGenuine
May 04 01:50:56 Host-24 kernel: Centaur CentaurHauls
May 04 01:50:56 Host-24 kernel: zhaoxin Shanghai
Could this be a soft reboot? In any case, the log shows no classic hardware error or watchdog message.
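One way to test the soft-reboot guess is to look at the last lines logged before each `-- Reboot --` marker: an orderly reboot normally leaves systemd shutdown messages there, while an abrupt reset leaves none. The following is only a minimal sketch. The function names are made up for illustration, and the marker strings are assumptions about what systemd typically logs during a clean shutdown; they may differ between versions.

```python
# Assumed markers of an orderly shutdown; adjust to what your systemd version logs.
CLEAN_SHUTDOWN_MARKERS = (
    "systemd-shutdown",
    "Journal stopped",
    "Reached target Shutdown",
    "Shutting down",
)


def lines_before_reboot(journal_text, count=20):
    """Return, for each '-- Reboot --' marker, up to `count` lines preceding it."""
    lines = journal_text.splitlines()
    tails = []
    for index, line in enumerate(lines):
        if line.strip() == "-- Reboot --":
            tails.append(lines[max(0, index - count):index])
    return tails


def looks_clean(tail):
    """True if any line in the tail contains one of the assumed shutdown markers."""
    return any(marker in line for line in tail for marker in CLEAN_SHUTDOWN_MARKERS)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe saved journal text into the script on stdin.
    for number, tail in enumerate(lines_before_reboot(sys.stdin.read()), start=1):
        verdict = "clean shutdown messages found" if looks_clean(tail) else "no shutdown messages"
        print(f"reboot {number}: {verdict}")
```

If the tail before the marker contains only routine entries such as the cron and postfix lines above, that points toward an abrupt reset rather than an orderly reboot, although it does not prove it. The last lines of the previous boot can also be viewed directly with `journalctl -b -1 -n 50`.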
Code:
# uname -a
# Linux Host-24 5.15.104-1-pve #1 SMP PVE 5.15.104-2 (2023-04-12T11:23Z) x86_64 GNU/Linux
Thanks in advance for your input. The host only runs 100 Windows workstations, which a nightly reboot won't hurt. But I would like to understand what happened here: during the day, a reboot like this would cost several hundred man-hours and cause a corresponding financial loss for the company.
Best regards
Josh