Random Power-Offs on PVE Node

ngbatnicdotmil

Oct 13, 2022
I am experiencing random power-offs on a PVE node, but the syslog doesn't show any details.
The node is powered via a UPS, and the other nodes on that UPS do not power off. The weird thing is that it is set in the BIOS to power on when power is applied.
Any suggestions on locating the cause?

Aug 31 02:34:02 PVE pmxcfs[2770]: [status] notice: received log
Aug 31 02:34:03 PVE pveproxy[1621240]: worker exit
Aug 31 02:34:03 PVE pveproxy[2940]: worker 1621240 finished
Aug 31 02:34:03 PVE pveproxy[2940]: starting 1 worker(s)
Aug 31 02:34:03 PVE pveproxy[2940]: worker 1636780 started
Aug 31 02:34:08 PVE pmxcfs[2770]: [status] notice: received log
Aug 31 02:34:32 PVE pveproxy[1636780]: Clearing outdated entries from certificate cache
Aug 31 02:36:28 PVE NetworkManager[2177]: <info> [1693434988.1008] device (wlp69s0): set-hw-addr: set MAC address to B2:6A:FC:69:92:96 (scanning)
Aug 31 02:39:01 PVE CRON[1637633]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 31 02:39:01 PVE CRON[1637634]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Aug 31 02:39:01 PVE CRON[1637633]: pam_unix(cron:session): session closed for user root
Aug 31 02:39:25 PVE systemd[1]: Starting Clean php session files...
Aug 31 02:39:25 PVE systemd[1]: phpsessionclean.service: Succeeded.
Aug 31 02:39:25 PVE systemd[1]: Finished Clean php session files.
Aug 31 02:42:44 PVE pveproxy[1632083]: Clearing outdated entries from certificate cache
Aug 31 02:43:23 PVE NetworkManager[2177]: <info> [1693435403.0045] device (wlp69s0): set-hw-addr: set MAC address to 16:B4:3E:FA:24:C3 (scanning)
Aug 31 02:43:33 PVE pveproxy[1627141]: Clearing outdated entries from certificate cache
Aug 31 02:47:32 PVE pvedaemon[1545133]: <root@pam> successful auth for user 'root@pam'
Aug 31 02:48:16 PVE smartd[2187]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 71 to 72
Aug 31 02:50:16 PVE NetworkManager[2177]: <info> [1693435816.1005] device (wlp69s0): set-hw-addr: set MAC address to 92:C0:A2:D2:A4:56 (scanning)
Aug 31 02:51:12 PVE pmxcfs[2770]: [status] notice: received log
Aug 31 02:51:15 PVE pmxcfs[2770]: [status] notice: received log
Aug 31 02:57:09 PVE NetworkManager[2177]: <info> [1693436229.1004] device (wlp69s0): set-hw-addr: set MAC address to 16:B5:37:69:41:BE (scanning)
Aug 31 03:02:32 PVE pvedaemon[1540201]: <root@pam> successful auth for user 'root@pam'
Aug 31 03:04:02 PVE NetworkManager[2177]: <info> [1693436642.0448] device (wlp69s0): set-hw-addr: set MAC address to 96:4D:ED:83:FF:41 (scanning)
Aug 31 03:04:34 PVE pveproxy[1636780]: Clearing outdated entries from certificate cache
Aug 31 03:09:01 PVE CRON[1642870]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 31 03:09:01 PVE CRON[1642871]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Aug 31 03:09:01 PVE CRON[1642870]: pam_unix(cron:session): session closed for user root
Aug 31 03:09:25 PVE systemd[1]: Starting Clean php session files...
Aug 31 03:09:25 PVE systemd[1]: phpsessionclean.service: Succeeded.
Aug 31 03:09:25 PVE systemd[1]: Finished Clean php session files.
Aug 31 03:10:01 PVE CRON[1643103]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 31 03:10:01 PVE CRON[1643104]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Aug 31 03:10:01 PVE CRON[1643103]: pam_unix(cron:session): session closed for user root
Aug 31 03:10:55 PVE NetworkManager[2177]: <info> [1693437055.1011] device (wlp69s0): set-hw-addr: set MAC address to 32:E3:C4:1F:22:3D (scanning)
Aug 31 03:12:51 PVE pveproxy[1632083]: Clearing outdated entries from certificate cache
Aug 31 03:13:38 PVE pveproxy[1627141]: Clearing outdated entries from certificate cache
-- Reboot --
Aug 31 11:11:38 PVE kernel: Linux version 5.15.108-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.108-1 (2023-06-17T09:41Z) ()
Aug 31 11:11:38 PVE kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.108-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on
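When the journal contains no shutdown messages, the most useful facts are usually the last entry before each -- Reboot -- marker and how long the node stayed down. A minimal Python sketch, assuming the journal was exported in journalctl's default short output format (which omits the year, so the year is passed in and all entries are assumed to fall within that year). The function names are hypothetical:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# journalctl's default "short" format starts each line with e.g. "Aug 31 03:13:38"
# (15 characters, no year), so the caller supplies the year.
TS_FORMAT = "%Y %b %d %H:%M:%S"
BOOT_MARKERS = ("-- Reboot --", "-- Boot ")  # marker text differs between systemd versions


def parse_ts(line: str, year: int) -> Optional[datetime]:
    """Return the timestamp at the start of a journal line, or None if there is none."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", TS_FORMAT)
    except ValueError:
        return None


def downtimes(text: str, year: int) -> List[Tuple[str, timedelta]]:
    """For every boot marker, return the last entry before it and the gap
    until the first timestamped entry after it."""
    results: List[Tuple[str, timedelta]] = []
    last_line: Optional[str] = None
    last_ts: Optional[datetime] = None
    pending: Optional[Tuple[str, datetime]] = None  # last entry seen before a marker
    for line in text.splitlines():
        if line.startswith(BOOT_MARKERS):
            if last_line is not None and last_ts is not None:
                pending = (last_line, last_ts)
            continue
        ts = parse_ts(line, year)
        if ts is None:
            continue  # headers, wrapped lines and other non-entry text
        if pending is not None:
            results.append((pending[0], ts - pending[1]))
            pending = None
        last_line, last_ts = line, ts
    return results
```

Run against the excerpt above with year=2023, it reports the 03:13:38 pveproxy entry as the last one before the marker and a gap of 7:58:00 until the kernel banner at 11:11:38.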
 
Hello,
Could you post the output of journalctl -b -1 after the node restarts from a power loss?
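journalctl -b -1 limits the output to the previous boot, so its last lines are what the node logged right before it went down. If only an exported text file is available (as with the attachments later in this thread), roughly the same slice can be taken by splitting on the boot markers. A sketch, assuming default short-format output; split_boots and tail_of_boot are hypothetical helpers, and the first chunk may be a partial boot if the export was trimmed:

```python
from typing import List

BOOT_MARKERS = ("-- Reboot --", "-- Boot ")  # marker text differs between systemd versions


def split_boots(text: str) -> List[List[str]]:
    """Split exported journal text into one list of lines per boot."""
    boots: List[List[str]] = [[]]
    for line in text.splitlines():
        if line.startswith(BOOT_MARKERS):
            boots.append([])  # a new boot starts after the marker
        elif line.startswith("-- "):
            continue  # other journalctl headers, e.g. "-- Journal begins at ..."
        elif line.strip():
            boots[-1].append(line)
    return boots


def tail_of_boot(text: str, offset: int, n: int = 20) -> List[str]:
    """Last n lines of one boot. Offset 0 is the newest boot and -1 the one
    before it, like the relative (non-positive) offsets of journalctl -b."""
    boots = split_boots(text)
    index = len(boots) - 1 + offset
    if offset > 0 or index < 0:
        raise IndexError(f"no boot at offset {offset}")
    return boots[index][-n:]
```

Only zero and negative offsets are handled here; journalctl itself also accepts positive numbers, which count from the oldest boot.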
 
A few months ago I was seeing a lot of "martian source" messages. Although they should not be relevant to the power-offs, I removed the LXC container that was generating them.
Now the very last messages before the power-off refer to a wireless interface that is not actually used; I disabled it after this reboot.
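To judge whether the wlp69s0 messages are anything more than background noise, it helps to look at their spacing. A small sketch, assuming default short-format journal lines (scan_intervals is a hypothetical name): fed the six set-hw-addr lines from the excerpt in the first post, it returns [415, 413, 413, 413, 413], i.e. one message roughly every seven minutes. A steady cadence like this points to routine scanning (NetworkManager assigns a random MAC address while scanning, which is what the "(scanning)" suffix indicates) rather than to an event leading up to the power-off.

```python
import re
from datetime import datetime
from typing import List

# Matches NetworkManager lines such as
# "... device (wlp69s0): set-hw-addr: set MAC address to ... (scanning)"
HW_ADDR = re.compile(r"device \((?P<dev>[^)]+)\): set-hw-addr")


def scan_intervals(text: str, device: str, year: int) -> List[int]:
    """Seconds between consecutive set-hw-addr messages for one device."""
    stamps = []
    for line in text.splitlines():
        match = HW_ADDR.search(line)
        if match and match.group("dev") == device:
            # short-format lines start with a 15-character "Mon DD HH:MM:SS" stamp
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    return [int((later - earlier).total_seconds()) for earlier, later in zip(stamps, stamps[1:])]
```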
 

Attachments

  • journal1.txt
    461.5 KB
Is there a reason you have NetworkManager up and running? Otherwise, I would recommend removing it.

Also, what does IPMI show (e.g. its event log)?
 
NetworkManager isn't used; I don't remember when, how, or why it got there. Thanks for the suggestion.
The motherboard has no out-of-band (OOB) management interface, so there is no IPMI.
 
NetworkManager is usually installed as a dependency of a desktop environment such as GNOME. It can interfere with PVE's networking and should therefore be disabled.
 
Another power-off happened recently (after about 20 days of uptime).
 

Attachments

  • journal.txt
    699.2 KB
Hello.

I cannot see a boot in your journal. What command did you use to create it?
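One quick way to tell whether an exported journal covers a boot at all is to count the boot separators and the kernel's "Linux version" banner, which is logged at the start of each boot (it is the first line after -- Reboot -- in the excerpt in the first post). A sketch, assuming short-format output; summarize_boots and BootSummary are hypothetical names. If both come back empty, the time range or filter used for the export most likely excluded the restart.

```python
from typing import List, NamedTuple

BOOT_MARKERS = ("-- Reboot --", "-- Boot ")  # marker text differs between systemd versions


class BootSummary(NamedTuple):
    markers: int              # number of boot separator lines
    kernel_starts: List[str]  # timestamps of kernel "Linux version" banners


def summarize_boots(text: str) -> BootSummary:
    """Report whether an exported journal actually spans a boot."""
    lines = text.splitlines()
    markers = sum(1 for line in lines if line.startswith(BOOT_MARKERS))
    starts = [line[:15] for line in lines if " kernel: Linux version " in line]
    return BootSummary(markers, starts)
```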

Do I understand correctly that by "poweroff" you mean the device suddenly loses power? Or is it more like a freeze?

Even though the other node does not shut down, I would not rule out the UPS straight away. Can you try swapping the outlets the devices are plugged into, or plugging the misbehaving node in directly without the UPS?

EDIT: Since this is a cluster, can you give me the output of the following command from both hosts?
journalctl --since '2023-09-13' > $(hostname)-report.txt
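With reports from both hosts, one way to use them is to take the time of PVE's last entry before the power-off and read what the other node logged around that moment (for example, cluster messages about losing contact with PVE, or anything hinting at a shared cause such as power). A sketch, assuming short-format exports like the ones produced by the command above; lines_around is a hypothetical helper and the 10-minute window is an arbitrary choice:

```python
from datetime import datetime, timedelta
from typing import List

TS_FORMAT = "%Y %b %d %H:%M:%S"


def lines_around(text: str, center: datetime, minutes: int = 10) -> List[str]:
    """Return short-format journal lines within +/- minutes of center.
    The year is taken from center, since short-format lines omit it."""
    start = center - timedelta(minutes=minutes)
    end = center + timedelta(minutes=minutes)
    selected = []
    for line in text.splitlines():
        try:
            ts = datetime.strptime(f"{center.year} {line[:15]}", TS_FORMAT)
        except ValueError:
            continue  # boot markers, headers and wrapped continuation lines
        if start <= ts <= end:
            selected.append(line)
    return selected
```

Applied to the healthy node's report with center set to PVE's last timestamp, this narrows hundreds of kilobytes of log down to the few minutes that matter.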
 
Hi Philipp,

I had trimmed the log to the last few days, as the full log was too large for the forum.

Your understanding is correct: the device suddenly loses power (and no one is around when it happens).

Please find the logs attached; PVE is the faulty node.

Thanks for the suggestion; I'll also try plugging it in directly.
 

Attachments

  • PVE-report.txt
    806.6 KB
  • PVE-S2-report.txt
    475.7 KB
