All VMs shut down after package update

cdsJerry

At 4:51 PM today my Task list says "Update package database". Then at 17:13 the task list shows VM100 Shutdown, VM101 Shutdown, etc. At 17:16 it shows "Stop all VMs and Containers".

What would cause Proxmox to shut down my VMs like that? I didn't issue any such commands. Would the package update have issued shutdown tasks like that, leaving all my VMs off?

PS: I've changed the password as a precaution; however, the old password was complex/secure. It's doubtful it had been cracked.
 
Did you enable unattended-upgrades on the host?
It's possible that an upgrade caused this, but not the update of the repository feeds.
Just browse "journalctl -r" back to 17:13 and you will find your cause.
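
For example, something like this (a rough sketch — the paths assume a stock Debian/PVE install, and the timestamps are just placeholders for your incident window):

Code:
# Is unattended-upgrades installed and enabled? (the file may not exist if never configured)
dpkg -l unattended-upgrades
cat /etc/apt/apt.conf.d/20auto-upgrades
systemctl status apt-daily-upgrade.timer

# Jump straight to the window in question instead of paging backwards
journalctl --since "2020-01-07 17:00" --until "2020-01-07 17:20"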
 
Did you enable unattended-upgrades on the host?
It's possible that an upgrade caused this, but not the update of the repository feeds.
Just browse "journalctl -r" back to 17:13 and you will find your cause.
I can only go back as far as 17:21. It looks like it may have been rebooted then?

Where can I check to see if unattended-upgrades is set?
I do see the Task list shows:
Jan 07 03:52 Update package database
Jan 06 04:51 Update package database
Jan 05 04:10 Update package database
So maybe it is doing updates. But that's not when the problems started. They started at 17:13.

Also, at 17:22:09 there's a Start all VMs and Containers task. I didn't create that one either.

Code:
Jan 07 17:22:14 pve systemd[1]: Reached target Host and Network Name Lookups.
Jan 07 17:22:14 pve systemd[1]: Starting Preprocess NFS configuration...
Jan 07 17:22:14 pve systemd[1]: /lib/systemd/system/rpc-statd.service:13: PIDFile= references path below legacy directory /var/run/, updating /var/
Jan 07 17:22:14 pve kernel: Key type id_legacy registered
Jan 07 17:22:14 pve kernel: Key type id_resolver registered
Jan 07 17:22:14 pve kernel: NFS: Registering the id_resolver key type
Jan 07 17:22:13 pve kernel: FS-Cache: Netfs 'nfs' registered for caching
Jan 07 17:22:13 pve kernel: FS-Cache: Loaded
Jan 07 17:22:09 pve systemd[1]: Startup finished in 27.032s (kernel) + 16.936s (userspace) = 43.969s.
Jan 07 17:22:09 pve systemd[1]: Started Update UTMP about System Runlevel Changes.
Jan 07 17:22:09 pve systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jan 07 17:22:09 pve systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jan 07 17:22:09 pve systemd[1]: Reached target Graphical Interface.
Jan 07 17:22:09 pve systemd[1]: Reached target Multi-User System.
Jan 07 17:22:09 pve systemd[1]: Started PVE guests.
Jan 07 17:22:09 pve pve-guests[1429]: <root@pam> end task UPID:pve:00000596:0000111B:5E150491:startall::root@pam: OK
Jan 07 17:22:09 pve pve-guests[1429]: <root@pam> starting task UPID:pve:00000596:0000111B:5E150491:startall::root@pam:
Jan 07 17:22:08 pve systemd[1]: Starting PVE guests...
Jan 07 17:22:08 pve systemd[1]: Started PVE Local HA Resource Manager Daemon.
Jan 07 17:22:08 pve pve-ha-lrm[1427]: status change startup => wait_for_agent_lock
Jan 07 17:22:08 pve pve-ha-lrm[1427]: starting server
Jan 07 17:22:07 pve systemd[1]: Started PVE SPICE Proxy Server.
Jan 07 17:22:07 pve spiceproxy[1425]: worker 1426 started
Jan 07 17:22:07 pve spiceproxy[1425]: starting 1 worker(s)
Jan 07 17:22:07 pve spiceproxy[1425]: starting server
Jan 07 17:22:07 pve systemd[1]: Starting PVE Local HA Resource Manager Daemon...
Jan 07 17:22:07 pve systemd[1]: Starting PVE SPICE Proxy Server...
Jan 07 17:22:07 pve systemd[1]: Started PVE API Proxy Server.
Jan 07 17:22:07 pve pveproxy[1416]: worker 1419 started
Jan 07 17:22:07 pve pveproxy[1416]: worker 1418 started
Jan 07 17:22:07 pve pveproxy[1416]: worker 1417 started
Jan 07 17:22:07 pve pveproxy[1416]: starting 3 worker(s)
Jan 07 17:22:07 pve pveproxy[1416]: starting server
Jan 07 17:22:05 pve systemd[1]: Started PVE Cluster Resource Manager Daemon.
Jan 07 17:22:05 pve pve-ha-crm[1415]: status change startup => wait_for_quorum
Jan 07 17:22:05 pve pve-ha-crm[1415]: starting server
Jan 07 17:22:04 pve systemd[1]: Starting PVE Cluster Resource Manager Daemon...
Jan 07 17:22:04 pve systemd[1]: Starting PVE API Proxy Server...
Jan 07 17:22:04 pve systemd[1]: Started PVE API Daemon.
Jan 07 17:22:04 pve pvedaemon[1407]: worker 1410 started
Jan 07 17:22:04 pve pvedaemon[1407]: worker 1409 started
Jan 07 17:22:04 pve pvedaemon[1407]: worker 1408 started
Jan 07 17:22:04 pve pvedaemon[1407]: starting 3 worker(s)
Jan 07 17:22:04 pve pvedaemon[1407]: starting server
Jan 07 17:22:03 pve systemd[1]: Started PVE Status Daemon.
Jan 07 17:22:03 pve systemd[1]: Started Proxmox VE replication runner.
Jan 07 17:22:03 pve systemd[1]: pvesr.service: Succeeded.
Jan 07 17:22:03 pve systemd[1]: Started Proxmox VE firewall.
Jan 07 17:22:03 pve pvestatd[1386]: starting server
Jan 07 17:22:03 pve pve-firewall[1382]: starting server
Jan 07 17:22:01 pve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready
Jan 07 17:22:01 pve kernel: vmbr1: port 1(eno2) entered forwarding state
Jan 07 17:22:01 pve kernel: vmbr1: port 1(eno2) entered blocking state
Jan 07 17:22:01 pve kernel: tg3 0000:01:00.1 eno2: EEE is disabled
Jan 07 17:22:01 pve kernel: tg3 0000:01:00.1 eno2: Flow control is off for TX and off for RX
Jan 07 17:22:01 pve kernel: tg3 0000:01:00.1 eno2: Link is up at 1000 Mbps, full duplex
Jan 07 17:22:01 pve cron[1376]: (CRON) INFO (Running @reboot jobs)
Jan 07 17:22:01 pve cron[1376]: (CRON) INFO (pidfile fd = 3)
Jan 07 17:22:01 pve systemd[1]: Starting Proxmox VE firewall...
 

hi,

"Update package database"

this just updates the database. it doesn't upgrade any packages.

Also, at 17:22:09 there's a Start all VMs and Containers task. I didn't create that one either.
that shows up during bootup (for starting VMs that are set to start on boot)

so both of these are completely normal and most likely have nothing to do with the cause of the reboot
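
as a side note, if you want to see which guests are flagged to start on boot, a quick sketch (assumes the standard /etc/pve config layout):

Code:
# VMs with start-on-boot set
grep -H 'onboot' /etc/pve/qemu-server/*.conf
# containers too
grep -H 'onboot' /etc/pve/lxc/*.conf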

What would cause Proxmox to shut down my VMs like that? I didn't issue any such commands.

check your syslog files in /var/log/
those usually go further than journalctl
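
for example, something like this (assumes Debian's default logrotate naming; adjust the date pattern — note the two spaces before a single-digit day):

Code:
# current and most recent rotation
grep -h 'Jan  7 17:1' /var/log/syslog /var/log/syslog.1
# older, compressed rotations
zgrep -h 'Jan  7 17:1' /var/log/syslog.*.gz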
 
check your syslog files in /var/log/
those usually go further than journalctl

I dug through the file. There are a lot of timeouts on the guest-ping, which makes sense; I can't get that service working correctly on a few VMs.

I didn't see anything odd at 13:06 that caught my eye, but later I see it doing all the shutdowns. I just don't see who or what did them. Here's a cleaned version of the log (I removed a lot of the lines that looked normal; otherwise it wouldn't let me post it).

Sorry for the slow response but I'm still trying to figure out how he got past my pfSense firewall plus my router firewall.

Code:
Jan  7 17:13:05 pve systemd[1]: 104.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 104.scope.
Jan  7 17:13:05 pve systemd[1]: systemd-rfkill.socket: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jan  7 17:13:05 pve systemd[1]: 100.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 100.scope.
Jan  7 17:13:05 pve systemd[1]: 105.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 105.scope.
Jan  7 17:13:05 pve systemd[1]: Stopping Session 347 of user root.
Jan  7 17:13:05 pve systemd[1]: Stopping LVM event activation on device 8:3...
Jan  7 17:13:05 pve systemd[1]: Stopping Availability of block devices...
Jan  7 17:13:05 pve systemd[1]: Stopping NFS status monitor for NFSv2/3 locking....
Jan  7 17:13:05 pve systemd[1]: lvm2-lvmpolld.socket: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Closed LVM2 poll daemon socket.
Jan  7 17:13:05 pve systemd[1]: Stopped target RPC Port Mapper.
Jan  7 17:13:05 pve systemd[1]: 101.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 101.scope.
Jan  7 17:13:05 pve systemd[1]: Removed slice qemu.slice.
Jan  7 17:13:05 pve systemd[1]: Stopped target Timers.
Jan  7 17:13:05 pve systemd[1]: apt-daily-upgrade.timer: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped Daily apt upgrade and clean activities.
Jan  7 17:13:05 pve systemd[1]: man-db.timer: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped Daily rotation of log files.
Jan  7 17:13:05 pve systemd[1]: Stopped Daily PVE download activities.
Jan  7 17:13:05 pve systemd[1]: systemd-tmpfiles-clean.timer: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopping Session 460 of user root.
Jan  7 17:13:05 pve systemd[1]: Unmounting RPC Pipe File System...
Jan  7 17:13:05 pve systemd[1]: Stopped target Graphical Interface.
Jan  7 17:13:05 pve smartd[843]: smartd received signal 15: Terminated

Jan  7 17:13:24 pve kernel: [1548454.569623] vmbr0: port 5(fwpr100p0) entered disabled state
Jan  7 17:13:24 pve kernel: [1548454.570050] device fwln100i0 left promiscuous mode
Jan  7 17:13:24 pve kernel: [1548454.570057] fwbr100i0: port 1(fwln100i0) entered disabled state
Jan  7 17:13:24 pve kernel: [1548454.613850] device fwpr100p0 left promiscuous mode
Jan  7 17:13:24 pve kernel: [1548454.613856] vmbr0: port 5(fwpr100p0) entered disabled state
Jan  7 17:13:24 pve kernel: [1548455.528436] fwbr100i1: port 2(tap100i1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.573488] fwbr100i1: port 1(fwln100i1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.573711] vmbr1: port 2(fwpr100p1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.574140] device fwln100i1 left promiscuous mode
Jan  7 17:13:25 pve kernel: [1548455.574144] fwbr100i1: port 1(fwln100i1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.601710] device fwpr100p1 left promiscuous mode
Jan  7 17:13:25 pve kernel: [1548455.601715] vmbr1: port 2(fwpr100p1) entered disabled state
Jan  7 17:13:26 pve pve-guests[22736]: end task UPID:pve:000058D6:093ABBE5:5E150274:qmshutdown:100:root@pam:
Jan  7 17:13:28 pve pvedaemon[11028]: VM 105 qmp command failed - VM 105 qmp command 'guest-ping' failed - got timeout
Jan  7 17:13:49 pve kernel: [1548480.543346] fwbr101i0: port 2(tap101i0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.572120] fwbr101i0: port 1(fwln101i0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.572208] vmbr0: port 3(fwpr101p0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.572603] device fwln101i0 left promiscuous mode
Jan  7 17:13:50 pve kernel: [1548480.572606] fwbr101i0: port 1(fwln101i0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.597497] device fwpr101p0 left promiscuous mode
Jan  7 17:13:50 pve kernel: [1548480.597499] vmbr0: port 3(fwpr101p0) entered disabled state
Jan  7 17:13:51 pve pve-guests[22736]: end task UPID:pve:000058D5:093ABBE4:5E150274:qmshutdown:101:root@pam:
Jan  7 17:16:08 pve kernel: [1548619.519244] vmbr0: port 4(tap104i0) entered disabled state
Jan  7 17:16:10 pve pve-guests[22736]: end task UPID:pve:000058D4:093ABBE3:5E150274:qmshutdown:104:root@pam:
Jan  7 17:16:19 pve pve-guests[22736]: end task UPID:pve:000058D3:093ABBE1:5E150274:qmshutdown:105:root@pam:
Jan  7 17:16:20 pve pve-guests[22736]: all VMs and CTs stopped
Jan  7 17:16:20 pve pve-guests[22730]: <root@pam> end task UPID:pve:000058D0:093ABBDD:5E150274:stopall::root@pam: OK
Jan  7 17:16:20 pve systemd[1]: pve-guests.service: Succeeded.
Jan  7 17:16:20 pve systemd[1]: Stopped PVE guests.
Jan  7 17:16:20 pve systemd[1]: Stopping PVE Local HA Resource Manager Daemon...
Jan  7 17:16:20 pve systemd[1]: Stopping PVE Status Daemon...
Jan  7 17:16:20 pve systemd[1]: Stopping PVE SPICE Proxy Server...
Jan  7 17:16:20 pve systemd[1]: Stopping Proxmox VE firewall...
Jan  7 17:16:20 pve spiceproxy[1272]: received signal TERM
Jan  7 17:16:20 pve spiceproxy[1272]: server closing
Jan  7 17:16:20 pve spiceproxy[1272]: server stopped
Jan  7 17:16:21 pve pvestatd[1233]: received signal TERM
Jan  7 17:16:21 pve pvestatd[1233]: server closing
Jan  7 17:16:21 pve pvestatd[1233]: server stopped
Jan  7 17:16:21 pve pve-firewall[1231]: received signal TERM
Jan  7 17:16:21 pve pve-firewall[1231]: server closing
Jan  7 17:16:21 pve pve-firewall[1231]: clear firewall rules
Jan  7 17:16:21 pve pve-ha-lrm[1274]: received signal TERM
Jan  7 17:16:21 pve pve-firewall[1231]: server stopped
Jan  7 17:16:21 pve systemd[1]: spiceproxy.service: Succeeded.
Jan  7 17:16:21 pve systemd[1]: Stopped PVE SPICE Proxy Server.
Jan  7 17:16:22 pve pve-ha-lrm[1274]: got shutdown request with shutdown policy 'conditional'
Jan  7 17:16:22 pve pve-ha-lrm[1274]: reboot LRM, stop and freeze all services

Jan  7 17:16:37 pve pveproxy[1265]: received signal TERM
Jan  7 17:16:37 pve pveproxy[1265]: server closing
 
Your log starts AFTER the cause; you need to attach some minutes of log from before your snippet.

Actually I don't think it was remotely triggered (which is possible).
Is the server located at a trusted location, or is it possible someone pressed CTRL+ALT+DEL, pressed the front power button, or issued an ACPI event through IPMI (iRMC, etc.)? Even a backup battery is able to trigger shutdowns via the serial port.

We actually use this in our large DCs to shut down all nodes in case all power sources fail and the backup batteries are running low. In that case, it either looks like someone pressed the power button or the system received an event on the serial interface.
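
If the machine has a BMC, its event log may record such a trigger. A rough way to check (assumes ipmitool is installed and the host actually has IPMI):

Code:
# recent entries from the BMC system event log (power/ACPI events show up here)
ipmitool sel list | tail -n 20
# current chassis power state and power-restore policy
ipmitool chassis status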
 
Your log starts AFTER the cause; you need to attach some minutes of log from before your snippet.

I tried to include more of the log, but it kept telling me my message was too long. I kept cutting it and cutting it. I didn't see anything that looked out of the ordinary before where the log started. There were no commands showing up that weren't there for hours.

The second event seemed more interesting. There were a lot more commands, mostly shutting down, etc., but I had to limit what I posted due to size.
The server is in a secure location. No one else has access to it. There's no way someone did a shutdown from there. And I'm the only one with a login to the unit. We're just a two-man shop, so there aren't a lot of suspects. <g>
 
The server is at a trusted location. No one else has access to the area. I looked at the logs prior to what's posted but I didn't see anything unusual. I couldn't post more of the log because I was getting an error saying my message was too big. It was the same commands (mostly unable to connect to the agent, which for some reason only works on about half my VMs). There wasn't a trigger listed.

And I don't have a power shutdown set up on that system. It's on a UPS, but my limited knowledge of Linux has kept me from installing any sort of shutdown software on the server.
 
Did you wire the UPS using a serial cable to the server?
It was connected to my old Proxmox server but it never did anything. I didn't connect it to the new one since it wasn't doing anything. I never installed software to talk to it.
 
OK, I thought it might have been installed as a dependency, or that it no longer needs any daemon.

I have no more guesses, but you can set up an external log collector server (rsyslog) in case you need to dig further.
It will be able to log everything up to the point where the network goes down. And as it's an external device, the logs can't be removed on the machine itself, whether by an intruder or just by logrotate or the journal cleanup.
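
A minimal sketch of such a setup (the collector address 192.168.1.50 is made up here; both snippets assume Debian's stock rsyslog packaging):

Code:
# on the collector, e.g. /etc/rsyslog.d/collector.conf: accept syslog over UDP
module(load="imudp")
input(type="imudp" port="514")

# on the PVE host, e.g. /etc/rsyslog.d/forward.conf: forward everything via UDP
*.* @192.168.1.50:514

# then on both machines:
#   systemctl restart rsyslog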
 
I tried to include more of the log, but it kept telling me my message was too long. I kept cutting it and cutting it. I didn't see anything that looked out of the ordinary before where the log started. There were no commands showing up that weren't there for hours.

you can attach it as a file (as .txt, for example)
 
