power cycling

johndoe297

New Member
Jul 9, 2023
I've noticed an issue on Proxmox: my system shuts itself down, seemingly at random, and stays off until I manually boot it again. I'm not sure whether it's hardware or some kind of watchdog setting. I talked to someone and confirmed there is a watchdog on the machine, but I don't know what to do with it. Any help would be great.
 

Attachments

  • log 1.png
  • screen 2.png
  • screen 3.png
  • screen 4.png

Hi,
Is this a standalone node or part of a cluster? If it's a cluster, do you maybe have HA enabled but no quorum? Please also check the system logs for more information.
 
Standalone node. I don't believe HA is enabled unless it is by default, and I'm not sure what quorum is. I have checked the logs, at least at surface level, as I'm not an IT guy. One entry mentioned a short press of the power button, but no one touches that machine. Even with the help of GPT I haven't found any errors that stand out. The only pattern I can find is that my TrueNAS VM always shuts down first, and then there is another task to stop all CTs and VMs. So I don't know if there's some issue in that VM, but nothing major stands out in those logs either, at least to me, and admittedly I don't really know what I'm looking for.
 
[timestamp] [machine_name] systemd[1]: Failed to start zfs-import-scan.service - Import ZFS pools by d>
[timestamp] [machine_name] smartd[722]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) secto>
[timestamp] [machine_name] pvedaemon[1070]: authentication failure; rhost=::ffff:172.27.88.7 user=root>
[timestamp] [machine_name] smartd[722]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) secto>
[timestamp] [machine_name] pveproxy[75110]: got inotify poll request in wrong process - disabling inot>
[timestamp] [machine_name] pvedaemon[81135]: command '/usr/bin/termproxy 5900 --path /vms/104 --perm V>
[timestamp] [machine_name] pvedaemon[1071]: <root@pam> end task UPID:[machine_name]:00013CEF:0015F37F:64D>

[timestamp] [machine_name] postfix/smtp[427356]: connect to mx.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427355]: connect to mx.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427354]: connect to mx.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427356]: connect to mx2.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427354]: connect to mx2.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427355]: connect to mx2.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427356]: connect to mx3.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427354]: connect to mx3.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427355]: connect to mx3.zoho.com[IP_ADDRESS]:25: Connection timed out
[timestamp] [machine_name] postfix/smtp[427356]: B4EC9C0FB1: to=<user@domain.com>, relay=none, delay=242298, delays=242208/0.02/90/0, dsn=4.4.1, status=deferred (connect to mx3.zoho.com[IP_ADDRESS]:25: Connection timed out)
[timestamp] [machine_name] postfix/smtp[427354]: 9B0F9C0FAD: to=<user@domain.com>, relay=none, delay=253099, delays=253008/0.01/90/0, dsn=4.4.1, status=deferred (connect to mx3.zoho.com[IP_ADDRESS]:25: Connection timed out)
[timestamp] [machine_name] postfix/smtp[427355]: E5DA0C0FD3: to=<user@domain.com>, relay=none, delay=161298, delays=161208/0.01/90/0, dsn=4.4.1, status=deferred (connect to mx3.zoho.com[IP_ADDRESS]:25: Connection timed out)
[timestamp] [machine_name] smartd[741]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 133 to 130
[timestamp] [machine_name] CRON[439071]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
[timestamp] [machine_name] CRON[439072]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
[timestamp] [machine_name] CRON[439071]: pam_unix(cron:session): session closed for user root
[timestamp] [machine_name] smartd[741]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 38 to 39
[timestamp] [machine_name] systemd-logind[742]: Power key pressed short.
[timestamp] [machine_name] systemd-logind[742]: Powering off...
[timestamp] [machine_name] systemd-logind[742]: System is powering down.
[timestamp] [machine_name] systemd[1]: 101.scope: Deactivated successfully.
[timestamp] [machine_name] systemd[1]: Stopped 101.scope.
[timestamp] [machine_name] systemd[1]: 101.scope: Consumed 2h 31min 25.700s CPU time.
[timestamp] [machine_name] systemd[1]: 105.scope: Deactivated successfully.
[timestamp] [machine_name] systemd[1]: Stopped 105.scope.
[timestamp] [machine_name] systemd[1]: 105.scope: Consumed 1h 3min 59.883s CPU time.
[timestamp] [machine_name] systemd[1]: Removed slice qemu.slice - Slice /qemu.
[timestamp] [machine_name] systemd[1]: qemu.slice: Consumed 3h 38min 29.304s CPU time.
[timestamp] [machine_name] systemd[1]: Removed slice system-modprobe.slice - Slice /system/modprobe.
[timestamp] [machine_name] systemd[1]: Stopped target graphical.target - Graphical Interface.
[timestamp] [machine_name] systemd[1]: Stopped target multi-user.target - Multi-User System.
[timestamp] [machine_name] systemd[1]: Stopped target getty.target - Login Prompts.
[timestamp] [machine_name] systemd[1]: Stopped target rpc_pipefs.target.
[timestamp] [machine_name] systemd[1]: Stopped target rpcbind.target - RPC Port Mapper.
[timestamp] [machine_name] systemd[1]: Stopped target sound.target - Sound Card.
[timestamp] [machine_name] systemd[1]: Stopped target timers.target - Timer Units.
[timestamp] [machine_name] systemd[1]: apt-daily-upgrade.timer: Deactivated successfully.
[timestamp] [machine_name] systemd[1]: Stopped apt-daily-upgrade.timer - Daily apt upgrade and clean activities.
[timestamp] [machine_name] systemd[1]: apt-daily.timer: Deactivated successfully.
[timestamp] [machine_name] systemd[1]: Stopped apt-daily.timer - Daily apt download activities.
 
[timestamp] [machine_name] smartd[722]: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) secto>
You should check your disk.

[timestamp] [machine_name] systemd-logind[742]: Power key pressed short.
[timestamp] [machine_name] systemd-logind[742]: Powering off...
[timestamp] [machine_name] systemd-logind[742]: System is powering down.
Someone else posted a similar problem a few days ago. Maybe it really is a software bug?
https://forum.proxmox.com/threads/strange-incident-server-self-powered-off.131826/#post-580178
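For anyone who is unsure what to look for in a long journal, the two kinds of lines highlighted above can be picked out automatically. The following is only a minimal sketch: the patterns are copied from the log excerpts in this thread, while the labels and the function name `classify` are made up for illustration.

```python
import re

# Patterns based on the log lines quoted in this thread.
# The label names are illustrative, not any official terminology.
PATTERNS = {
    "power_key": re.compile(r"systemd-logind\[\d+\]: Power key pressed"),
    "pending_sectors": re.compile(
        r"smartd\[\d+\]: Device: \S+ .*?\d+ Currently unreadable \(pending\)"
    ),
}


def classify(lines):
    """Return (label, line) pairs for every line matching a known pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
                break  # one label per line is enough
    return hits
```

One way to use it is to save the journal of the boot that ended in the shutdown to a text file and pass the open file object to `classify`. Any iterable of lines works.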
 
I am not sure, but the fix I am currently trying out is to disable the power button in the OS.
Did this ever work for you - disabling the power button?

I'm facing this exact issue. It doesn't seem to be a temperature issue: CPU temperature sits in the high 30s to low 40s °C. I get the following:
Code:
Nov 27 21:22:27 server systemd-logind[851]: Power key pressed short.
Nov 27 21:22:27 server systemd-logind[851]: Powering off...
Nov 27 21:22:27 server systemd-logind[851]: System is powering down.
Sometimes it will reboot, but most of the time it does not.
Not sure where to look.
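The power-button workaround mentioned earlier in the thread is usually done by telling systemd-logind to ignore the power key. Below is a rough sketch of that, assuming a systemd-based host where logind reads drop-in files from a directory such as /etc/systemd/logind.conf.d. `HandlePowerKey` is a documented logind.conf option; the helper name and the file name are hypothetical.

```python
from pathlib import Path

# HandlePowerKey is a documented logind.conf option; "ignore" tells
# systemd-logind to take no action when a power-key event arrives.
DROP_IN_TEXT = "[Login]\nHandlePowerKey=ignore\n"


def write_power_key_drop_in(conf_dir, name="ignore-power-key.conf"):
    """Write the drop-in into conf_dir and return its path.

    The file name is arbitrary (hypothetical here); drop-ins are
    expected to carry a .conf suffix.
    """
    target = Path(conf_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(DROP_IN_TEXT)
    return target
```

Run as root with `conf_dir` set to the logind drop-in directory, the setting should take effect after systemd-logind is restarted or the host is rebooted. Note that this only stops the OS from acting on the event. It does not explain where the "Power key pressed short" events come from.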
 
