Proxmox reboots

lumen

New Member
Mar 2, 2016
5
0
1
40
I have a problem. But I not use zfs. My server rebooted. And a see then my virutal machine normaly shutdown.
I see the time server shutdown, ,y server normaly shutdown virutal machine:
daemon.log
Mar 2 00:28:49 vm3 zed[1662]: Exiting
Mar 2 00:28:49 vm3 zabbix-agent[22995]: zabbix_agentd stopping...done.
Mar 2 00:28:49 vm3 postfix[22996]: Stopping Postfix Mail Transport Agent: postfix.
Mar 2 00:28:50 vm3 hwclock[22986]: hwclock from util-linux 2.25.2
Mar 2 00:28:50 vm3 hwclock[22986]: Using the /dev interface to the clock.
Mar 2 00:28:50 vm3 hwclock[22986]: Last drift adjustment done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Last calibration done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Hardware clock is on UTC time
Mar 2 00:28:50 vm3 hwclock[22986]: Assuming hardware clock is kept in UTC time.
Mar 2 00:28:50 vm3 hwclock[22986]: Waiting for clock tick...
Mar 2 00:28:50 vm3 hwclock[22986]: ...got clock tick
Mar 2 00:28:50 vm3 hwclock[22986]: Time read from Hardware Clock: 2016/03/01 20:28:50
Mar 2 00:28:50 vm3 hwclock[22986]: Hw clock time : 2016/03/01 20:28:50 = 1456864130 seconds since 1969
Mar 2 00:28:50 vm3 hwclock[22986]: 1456864130.500000 is close enough to 1456864130.500000 (0.000000 < 0.001000)
Mar 2 00:28:50 vm3 hwclock[22986]: Set RTC to 1456864130 (1456864130 + 0; refsystime = 1456864130.000000)
Mar 2 00:28:50 vm3 hwclock[22986]: Setting Hardware Clock to 20:28:50 = 1456864130 seconds since 1969
Mar 2 00:28:50 vm3 hwclock[22986]: ioctl(RTC_SET_TIME) was successful.
Mar 2 00:28:50 vm3 hwclock[22986]: Clock drifted 0.0 seconds in the past 2446055 seconds in spite of a drift factor of 0.000106 seconds/day.
Mar 2 00:28:50 vm3 hwclock[22986]: Adjusting drift factor by 0.000036 seconds/day
Mar 2 00:28:51 vm3 pve-manager[23108]: shutdown VM 140006: UPID:vm3:00005A44:0E9429CD:56D5FB83:qmshutdown:140006:root@pam:
Mar 2 00:28:51 vm3 pve-manager[23110]: shutdown VM 140005: UPID:vm3:00005A46:0E9429CE:56D5FB83:qmshutdown:140005:root@pam:
Mar 2 00:28:51 vm3 pve-manager[23112]: shutdown VM 140004: UPID:vm3:00005A48:0E9429D3:56D5FB83:qmshutdown:140004:root@pam:
Mar 2 00:28:51 vm3 pve-manager[23113]: shutdown VM 303: UPID:vm3:00005A49:0E9429D3:56D5FB83:qmshutdown:303:root@pam:
Mar 2 00:28:51 vm3 pve-manager[23116]: shutdown VM 302: UPID:vm3:00005A4C:0E9429D5:56D5FB83:qmshutdown:302:root@pam:
Mar 2 00:28:51 vm3 hwclock[23072]: hwclock from util-linux 2.25.2
Mar 2 00:28:51 vm3 hwclock[23072]: Using the /dev interface to the clock.
Mar 2 00:28:51 vm3 hwclock[23072]: Last drift adjustment done at 1456864130 seconds after 1969
Mar 2 00:28:51 vm3 hwclock[23072]: Last calibration done at 1456864130 seconds after 1969
Mar 2 00:28:51 vm3 hwclock[23072]: Hardware clock is on UTC time
Mar 2 00:28:51 vm3 hwclock[23072]: Assuming hardware clock is kept in UTC time.
Mar 2 00:28:51 vm3 hwclock[23072]: Waiting for clock tick...
Mar 2 00:28:51 vm3 hwclock[23072]: ...got clock tick
Mar 2 00:28:51 vm3 hwclock[23072]: Time read from Hardware Clock: 2016/03/01 20:28:51
Mar 2 00:28:51 vm3 hwclock[23072]: Hw clock time : 2016/03/01 20:28:51 = 1456864131 seconds since 1969
Mar 2 00:28:51 vm3 hwclock[23072]: 1456864131.500000 is close enough to 1456864131.500000 (0.000000 < 0.001000)
Mar 2 00:28:51 vm3 hwclock[23072]: Set RTC to 1456864131 (1456864131 + 0; refsystime = 1456864131.000000)
Mar 2 00:28:51 vm3 hwclock[23072]: Setting Hardware Clock to 20:28:51 = 1456864131 seconds since 1969
Mar 2 00:28:51 vm3 hwclock[23072]: ioctl(RTC_SET_TIME) was successful.
Mar 2 00:28:51 vm3 hwclock[23072]: Not adjusting drift factor because it has been less than a day since the last calibration.
Mar 2 00:28:52 vm3 hwclock[23121]: hwclock from util-linux 2.25.2
Mar 2 00:28:52 vm3 hwclock[23121]: Using the /dev interface to the clock.
Mar 2 00:28:52 vm3 hwclock[23121]: Last drift adjustment done at 1456864131 seconds after 1969
Mar 2 00:28:52 vm3 hwclock[23121]: Last calibration done at 1456864131 seconds after 1969
Mar 2 00:28:52 vm3 hwclock[23121]: Hardware clock is on UTC time
Mar 2 00:28:52 vm3 hwclock[23121]: Assuming hardware clock is kept in UTC time.
Mar 2 00:28:52 vm3 hwclock[23121]: Waiting for clock tick...
Mar 2 00:28:52 vm3 hwclock[23121]: ...got clock tick
Mar 2 00:28:52 vm3 hwclock[23121]: Time read from Hardware Clock: 2016/03/01 20:28:52
Mar 2 00:28:52 vm3 hwclock[23121]: Hw clock time : 2016/03/01 20:28:52 = 1456864132 seconds since 1969
Mar 2 00:28:52 vm3 hwclock[23121]: 1456864132.500000 is close enough to 1456864132.500000 (0.000000 < 0.001000)
Mar 2 00:28:52 vm3 hwclock[23121]: Set RTC to 1456864132 (1456864132 + 0; refsystime = 1456864132.000000)
Mar 2 00:28:52 vm3 hwclock[23121]: Setting Hardware Clock to 20:28:52 = 1456864132 seconds since 1969
Mar 2 00:28:52 vm3 hwclock[23121]: ioctl(RTC_SET_TIME) was successful.
Mar 2 00:28:52 vm3 hwclock[23121]: Not adjusting drift factor because it has been less than a day since the last calibration.
Mar 2 00:28:53 vm3 hwclock[23153]: hwclock from util-linux 2.25.2
Mar 2 00:28:53 vm3 hwclock[23153]: Using the /dev interface to the clock.
Mar 2 00:28:53 vm3 hwclock[23153]: Last drift adjustment done at 1456864132 seconds after 1969
Mar 2 00:28:53 vm3 hwclock[23153]: Last calibration done at 1456864132 seconds after 1969
Mar 2 00:28:53 vm3 hwclock[23153]: Hardware clock is on UTC time
Mar 2 00:28:53 vm3 hwclock[23153]: Assuming hardware clock is kept in UTC time.
Mar 2 00:28:53 vm3 hwclock[23153]: Waiting for clock tick...
Mar 2 00:28:53 vm3 hwclock[23153]: ...got clock tick
Mar 2 00:28:53 vm3 hwclock[23153]: Time read from Hardware Clock: 2016/03/01 20:28:53
Mar 2 00:28:53 vm3 hwclock[23153]: Hw clock time : 2016/03/01 20:28:53 = 1456864133 seconds since 1969
Mar 2 00:28:53 vm3 hwclock[23153]: 1456864133.500000 is close enough to 1456864133.500000 (0.000000 < 0.001000)
Mar 2 00:28:53 vm3 hwclock[23153]: Set RTC to 1456864133 (1456864133 + 0; refsystime = 1456864133.000000)
Mar 2 00:28:53 vm3 hwclock[23153]: Setting Hardware Clock to 20:28:53 = 1456864133 seconds since 1969
Mar 2 00:28:53 vm3 hwclock[23153]: ioctl(RTC_SET_TIME) was successful.
Mar 2 00:28:53 vm3 hwclock[23153]: Not adjusting drift factor because it has been less than a day since the last calibration.
Mar 2 00:28:54 vm3 hwclock[23184]: hwclock from util-linux 2.25.2
Mar 2 00:28:54 vm3 hwclock[23184]: Using the /dev interface to the clock.
Mar 2 00:28:54 vm3 hwclock[23184]: Last drift adjustment done at 1456864133 seconds after 1969
Mar 2 00:28:54 vm3 hwclock[23184]: Last calibration done at 1456864133 seconds after 1969
Mar 2 00:28:54 vm3 hwclock[23184]: Hardware clock is on UTC time
Mar 2 00:28:54 vm3 hwclock[23184]: Assuming hardware clock is kept in UTC time.
Mar 2 00:28:54 vm3 hwclock[23184]: Waiting for clock tick...
Mar 2 00:28:54 vm3 hwclock[23184]: ...got clock tick
Mar 2 00:28:54 vm3 hwclock[23184]: Time read from Hardware Clock: 2016/03/01 20:28:54
Mar 2 00:28:54 vm3 hwclock[23184]: Hw clock time : 2016/03/01 20:28:54 = 1456864134 seconds since 1969
Mar 2 00:28:54 vm3 hwclock[23184]: 1456864134.500000 is close enough to 1456864134.500000 (0.000000 < 0.001000)
Mar 2 00:28:54 vm3 hwclock[23184]: Set RTC to 1456864134 (1456864134 + 0; refsystime = 1456864134.000000)
Mar 2 00:28:54 vm3 hwclock[23184]: Setting Hardware Clock to 20:28:54 = 1456864134 seconds since 1969
Mar 2 00:28:54 vm3 hwclock[23184]: ioctl(RTC_SET_TIME) was successful.
Mar 2 00:28:54 vm3 hwclock[23184]: Not adjusting drift factor because it has been less than a day since the last calibration.

auth.log
Mar 2 00:34:48 vm3 systemd-logind[1668]: Watching system buttons on /dev/input/event1 (Power Button)
Mar 2 00:34:48 vm3 systemd-logind[1668]: Watching system buttons on /dev/input/event0 (Power Button)
Mar 2 00:34:48 vm3 systemd-logind[1668]: New seat seat0.


Please help me!!
 
And a see then my virutal machine normaly shutdown.

That means that (someone) triggers a reboot/shutdown as the services get cleanly shutdown (no watchdog).

daemon.log
Mar 2 00:28:49 vm3 zed[1662]: Exiting
Mar 2 00:28:49 vm3 zabbix-agent[22995]: zabbix_agentd stopping...done.
Mar 2 00:28:49 vm3 postfix[22996]: Stopping Postfix Mail Transport Agent: postfix.
Mar 2 00:28:50 vm3 hwclock[22986]: hwclock from util-linux 2.25.2
Mar 2 00:28:50 vm3 hwclock[22986]: Using the /dev interface to the clock.
Mar 2 00:28:50 vm3 hwclock[22986]: Last drift adjustment done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Last calibration done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Hardware clock is on UTC time
Mar 2 00:28:50 vm3 hwclock[22986]: Assuming hardware clock is kept in UTC time.
Mar 2 00:28:50 vm3 hwclock[22986]: Waiting for clock tick...
Mar 2 00:28:50 vm3 hwclock[22986]: ...got clock tick

At the beginning from the posted log the shutdown/reboot has already started. What happens previous to that?
 
I do not see suspicious activities. But only auth.log
kern.log
Mar 1 00:00:01 vm3 kernel: [2357849.095678] Process accounting resumed
Mar 2 00:00:01 vm3 kernel: [2444251.438185] Process accounting resumed
Mar 2 00:28:51 vm3 pve-manager[23075]: <root@pam> starting task UPID:vm3:00005A43:0E9429CB:56D5FB83:stopall::root@pam:
..
auth.log
Mar 2 00:17:01 vm3 CRON[22320]: pam_unix(cron:session): session closed for user root
Mar 2 00:28:49 vm3 login[1749]: pam_unix(login:auth): check pass; user unknown
Mar 2 00:28:49 vm3 login[1749]: pam_unix(login:auth): authentication failure; logname=LOGIN uid=0 euid=0 tty=/dev/tty1 ruser= rhost=
Mar 2 00:34:48 vm3 sshd[1662]: Server listening on 0.0.0.0 port 22.
Mar 2 00:34:48 vm3 sshd[1662]: Server listening on :: port 22.
Mar 2 00:34:48 vm3 systemd-logind[1668]: Watching system buttons on /dev/input/event1 (Power Button)
Mar 2 00:34:48 vm3 systemd-logind[1668]: Watching system buttons on /dev/input/event0 (Power Button)
Mar 2 00:34:48 vm3 systemd-logind[1668]: New seat seat0.
Mar 2 01:13:01 vm3 CRON[4329]: pam_unix(cron:session): session opened for user root by (uid=0)
Mar 2 01:13:01 vm3 CRON[4329]: pam_unix(cron:session): session closed for user root
Mar 2 01:17:01 vm3 CRON[4564]: pam_unix(cron:session): session opened for user root by (uid=0)
...
messages
Mar 2 00:00:01 vm3 kernel: [2444251.438185] Process accounting resumed
Mar 2 00:28:51 vm3 pve-manager[23075]: <root@pam> starting task UPID:vm3:00005A43:0E9429CB:56D5FB83:stopall::root@pam:
Mar 2 00:28:51 vm3 pve-manager[23107]: <root@pam> starting task UPID:vm3:00005A44:0E9429CD:56D5FB83:qmshutdown:140006:root@pam:
Mar 2 00:28:51 vm3 pve-manager[23107]: <root@pam> starting task UPID:vm3:00005A46:0E9429CE:56D5FB83:qmshutdown:140005:root@pam:

daemon.log
Mar 2 00:04:01 vm3 rrdcached[1692]: started new journal /var/lib/rrdcached/journal/rrd.journal.1456862641.930145
Mar 2 00:04:01 vm3 rrdcached[1692]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1456855441.930115
Mar 2 00:28:49 vm3 zed[1662]: Exiting
Mar 2 00:28:49 vm3 zabbix-agent[22995]: zabbix_agentd stopping...done.
Mar 2 00:28:49 vm3 postfix[22996]: Stopping Postfix Mail Transport Agent: postfix.
Mar 2 00:28:50 vm3 hwclock[22986]: hwclock from util-linux 2.25.2
Mar 2 00:28:50 vm3 hwclock[22986]: Using the /dev interface to the clock.
Mar 2 00:28:50 vm3 hwclock[22986]: Last drift adjustment done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Last calibration done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Hardware clock is on UTC time
Mar 2 00:28:50 vm3 hwclock[22986]: Assuming hardware clock is kept in UTC time.
Mar 2 00:28:50 vm3 hwclock[22986]: Waiting for clock tick...
Mar 2 00:28:50 vm3 hwclock[22986]: ...got clock tick
Mar 2 00:28:50 vm3 hwclock[22986]: Time read from Hardware Clock: 2016/03/01 20:28:50
Mar 2 00:28:50 vm3 hwclock[22986]: Hw clock time : 2016/03/01 20:28:50 = 1456864130 seconds since 1969
Mar 2 00:28:50 vm3 hwclock[22986]: 1456864130.500000 is close enough to 1456864130.500000 (0.000000 < 0.001000)
Mar 2 00:28:50 vm3 hwclock[22986]: Set RTC to 1456864130 (1456864130 + 0; refsystime = 1456864130.000000)
Mar 2 00:28:50 vm3 hwclock[22986]: Setting Hardware Clock to 20:28:50 = 1456864130 seconds since 1969
Mar 2 00:28:50 vm3 hwclock[22986]: ioctl(RTC_SET_TIME) was successful.
Mar 2 00:28:50 vm3 hwclock[22986]: Clock drifted 0.0 seconds in the past 2446055 seconds in spite of a drift factor of 0.000106 seconds/day.
Mar 2 00:28:50 vm3 hwclock[22986]: Adjusting drift factor by 0.000036 seconds/day

debug
Mar 2 00:04:01 vm3 rrdcached[1692]: started new journal /var/lib/rrdcached/journal/rrd.journal.1456862641.930145
Mar 2 00:04:01 vm3 rrdcached[1692]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1456855441.930115
Mar 2 00:34:48 vm3 rrdcached[1698]: started new journal /var/lib/rrdcached/journal/rrd.journal.1456864487.953783
Mar 2 00:34:48 vm3 kernel: [ 0.000000] DMI: HP ProLiant DL180 G6 , BIOS O20 07/01/2013
Mar 2 00:34:48 vm3 kernel: [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Mar 2 00:34:48 vm3 kernel: [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
Mar 2 00:34:48 vm3 kernel: [ 0.000000] MTRR default type: uncachable
Mar 2 00:34:48 vm3 kernel: [ 0.000000] MTRR fixed ranges enabled:

syslog.1
Mar 2 00:00:01 vm3 CRON[21304]: (root) CMD (invoke-rc.d atop _cron)
Mar 2 00:00:01 vm3 kernel: [2444251.438185] Process accounting resumed
Mar 2 00:04:01 vm3 rrdcached[1692]: flushing old values
Mar 2 00:04:01 vm3 rrdcached[1692]: rotating journals
Mar 2 00:04:01 vm3 rrdcached[1692]: started new journal /var/lib/rrdcached/journal/rrd.journal.1456862641.930145
Mar 2 00:04:01 vm3 rrdcached[1692]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1456855441.930115
Mar 2 00:06:59 vm3 systemd-timesyncd[904]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.025s/0.002s/+6ppm
Mar 2 00:13:01 vm3 CRON[22075]: (root) CMD (if test -x /usr/sbin/apticron; then /usr/sbin/apticron --cron; else true; fi)
Mar 2 00:17:01 vm3 CRON[22321]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 2 00:28:49 vm3 zed[1662]: Exiting
Mar 2 00:28:49 vm3 zabbix-agent[22995]: zabbix_agentd stopping...done.
Mar 2 00:28:49 vm3 postfix/master[1854]: terminating on signal 15
Mar 2 00:28:49 vm3 postfix[22996]: Stopping Postfix Mail Transport Agent: postfix.
Mar 2 00:28:50 vm3 hwclock[22986]: hwclock from util-linux 2.25.2
Mar 2 00:28:50 vm3 hwclock[22986]: Using the /dev interface to the clock.
Mar 2 00:28:50 vm3 hwclock[22986]: Last drift adjustment done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Last calibration done at 1454418075 seconds after 1969
Mar 2 00:28:50 vm3 hwclock[22986]: Hardware clock is on UTC time
Mar 2 00:28:50 vm3 hwclock[22986]: Assuming hardware clock is kept in UTC time.
 
Mar 2 00:28:49 vm3 login[1749]: pam_unix(login:auth): check pass; user unknown
Mar 2 00:28:49 vm3 login[1749]: pam_unix(login:auth): authentication failure; logname=LOGIN uid=0 euid=0 tty=/dev/tty1 ruser= rhost=

Hmm, that's the single strange thing I see, a bit suspicious. I'd guess that there tampered someone with the node.

Have a persistent journal enabled? (in other words, does the directory under /var/log/journal exists?)
If yes attach:
Code:
journalctl --since "2016-03-02" --until "2016-03-03" > strange-reboot.log
here.

Is there a IPMI (or something similar) on the node, something which could send an ACPI power down command?
 
root@vm3:~# ls /var/log/journal
ls: cannot access /var/log/journal: No such file or directory

I not configure ipmi...

I upload file with strange-reboot.log.
 

Attachments

root@vm3:~# ls /var/log/journal
ls: cannot access /var/log/journal: No such file or directory

Ok, without the persistent journal this shows only the current boot and not the reason for the reboot.

You can make it persistent with:
Code:
mkdir /var/log/journal

and if this happens again and was not only a hiccup form the node, post the
Code:
journalctl --since "2016-03-02" --until "2016-03-03" > strange-reboot.log
again, with adapted dates (they have the format "yyyy-mm-dd")

Currently I cannot really determine what caused this.

Did that only happen once or more often?
 
Tthat only happen once.

Hmm, strange. But then I'd wait a bit, possible hiccup. Setup persistent logging and if it happens again post it here then we can see what triggered the reboot.

For now, change root password and if possible restrict with the firewall who can login to the 22 ssh port (attention, look that you do not lock yourself out if this is a remote cluster!) to your IP range (this would need a persistent IP from your side).

If you have user who can login to the web interface check also that they do not have the permission tho powerof/reboot the node if you do not trust them.