All VMs shut down after package update

cdsJerry

At 4:51 PM today my Task list says "Update package database". Then at 17:13 the task list shows VM100 Shutdown, VM101 Shutdown, etc. At 17:16 it shows "Stop all VMs and Containers".

What would cause Proxmox to shut down my VMs like that? I didn't issue any such commands. Would the package update have issued shutdown tasks like that, leaving all my VMs off?

PS: I've changed the password as a precaution; however, the old password was complex/secure. It's doubtful it had been cracked.
 
Did you enable unattended-upgrades on the host?
It's possible that an upgrade caused this, but not the update of the repository feeds.
Just browse "journalctl -r" back to 17:13 and you will find your cause.
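
For example, something like this (a rough sketch — the paths assume a stock Debian/PVE install, and the timestamps are just placeholders for your incident window):

Code:
# Is unattended-upgrades installed and enabled? (the file may not exist if never configured)
dpkg -l unattended-upgrades
cat /etc/apt/apt.conf.d/20auto-upgrades
systemctl status apt-daily-upgrade.timer

# Jump straight to the window in question instead of paging backwards
journalctl --since "2020-01-07 17:00" --until "2020-01-07 17:20"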
 
Did you enable unattended-upgrades on the host?
It's possible that an upgrade caused this, but not the update of the repository feeds.
Just browse "journalctl -r" back to 17:13 and you will find your cause.
I can only go back as far as 17:21. It looks like it may have been rebooted then?

Where can I check to see if unattended-upgrades is set?
I do see the Task list shows:
Jan 07 03:52 Update package database
Jan 06 04:51 Update package database
Jan 05 04:10 Update package database
So maybe it is doing updates. But that's not when the problems started. They started at 17:13.

Also, at 17:22:09 there's a Start all VMs and Containers task. I didn't create that one either.

Code:
Jan 07 17:22:14 pve systemd[1]: Reached target Host and Network Name Lookups.
Jan 07 17:22:14 pve systemd[1]: Starting Preprocess NFS configuration...
Jan 07 17:22:14 pve systemd[1]: /lib/systemd/system/rpc-statd.service:13: PIDFile= references path below legacy directory /var/run/, updating /var/
Jan 07 17:22:14 pve kernel: Key type id_legacy registered
Jan 07 17:22:14 pve kernel: Key type id_resolver registered
Jan 07 17:22:14 pve kernel: NFS: Registering the id_resolver key type
Jan 07 17:22:13 pve kernel: FS-Cache: Netfs 'nfs' registered for caching
Jan 07 17:22:13 pve kernel: FS-Cache: Loaded
Jan 07 17:22:09 pve systemd[1]: Startup finished in 27.032s (kernel) + 16.936s (userspace) = 43.969s.
Jan 07 17:22:09 pve systemd[1]: Started Update UTMP about System Runlevel Changes.
Jan 07 17:22:09 pve systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jan 07 17:22:09 pve systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jan 07 17:22:09 pve systemd[1]: Reached target Graphical Interface.
Jan 07 17:22:09 pve systemd[1]: Reached target Multi-User System.
Jan 07 17:22:09 pve systemd[1]: Started PVE guests.
Jan 07 17:22:09 pve pve-guests[1429]: <root@pam> end task UPID:pve:00000596:0000111B:5E150491:startall::root@pam: OK
Jan 07 17:22:09 pve pve-guests[1429]: <root@pam> starting task UPID:pve:00000596:0000111B:5E150491:startall::root@pam:
Jan 07 17:22:08 pve systemd[1]: Starting PVE guests...
Jan 07 17:22:08 pve systemd[1]: Started PVE Local HA Resource Manager Daemon.
Jan 07 17:22:08 pve pve-ha-lrm[1427]: status change startup => wait_for_agent_lock
Jan 07 17:22:08 pve pve-ha-lrm[1427]: starting server
Jan 07 17:22:07 pve systemd[1]: Started PVE SPICE Proxy Server.
Jan 07 17:22:07 pve spiceproxy[1425]: worker 1426 started
Jan 07 17:22:07 pve spiceproxy[1425]: starting 1 worker(s)
Jan 07 17:22:07 pve spiceproxy[1425]: starting server
Jan 07 17:22:07 pve systemd[1]: Starting PVE Local HA Resource Manager Daemon...
Jan 07 17:22:07 pve systemd[1]: Starting PVE SPICE Proxy Server...
Jan 07 17:22:07 pve systemd[1]: Started PVE API Proxy Server.
Jan 07 17:22:07 pve pveproxy[1416]: worker 1419 started
Jan 07 17:22:07 pve pveproxy[1416]: worker 1418 started
Jan 07 17:22:07 pve pveproxy[1416]: worker 1417 started
Jan 07 17:22:07 pve pveproxy[1416]: starting 3 worker(s)
Jan 07 17:22:07 pve pveproxy[1416]: starting server
Jan 07 17:22:05 pve systemd[1]: Started PVE Cluster Resource Manager Daemon.
Jan 07 17:22:05 pve pve-ha-crm[1415]: status change startup => wait_for_quorum
Jan 07 17:22:05 pve pve-ha-crm[1415]: starting server
Jan 07 17:22:04 pve systemd[1]: Starting PVE Cluster Resource Manager Daemon...
Jan 07 17:22:04 pve systemd[1]: Starting PVE API Proxy Server...
Jan 07 17:22:04 pve systemd[1]: Started PVE API Daemon.
Jan 07 17:22:04 pve pvedaemon[1407]: worker 1410 started
Jan 07 17:22:04 pve pvedaemon[1407]: worker 1409 started
Jan 07 17:22:04 pve pvedaemon[1407]: worker 1408 started
Jan 07 17:22:04 pve pvedaemon[1407]: starting 3 worker(s)
Jan 07 17:22:04 pve pvedaemon[1407]: starting server
Jan 07 17:22:03 pve systemd[1]: Started PVE Status Daemon.
Jan 07 17:22:03 pve systemd[1]: Started Proxmox VE replication runner.
Jan 07 17:22:03 pve systemd[1]: pvesr.service: Succeeded.
Jan 07 17:22:03 pve systemd[1]: Started Proxmox VE firewall.
Jan 07 17:22:03 pve pvestatd[1386]: starting server
Jan 07 17:22:03 pve pve-firewall[1382]: starting server
Jan 07 17:22:01 pve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready
Jan 07 17:22:01 pve kernel: vmbr1: port 1(eno2) entered forwarding state
Jan 07 17:22:01 pve kernel: vmbr1: port 1(eno2) entered blocking state
Jan 07 17:22:01 pve kernel: tg3 0000:01:00.1 eno2: EEE is disabled
Jan 07 17:22:01 pve kernel: tg3 0000:01:00.1 eno2: Flow control is off for TX and off for RX
Jan 07 17:22:01 pve kernel: tg3 0000:01:00.1 eno2: Link is up at 1000 Mbps, full duplex
Jan 07 17:22:01 pve cron[1376]: (CRON) INFO (Running @reboot jobs)
Jan 07 17:22:01 pve cron[1376]: (CRON) INFO (pidfile fd = 3)
Jan 07 17:22:01 pve systemd[1]: Starting Proxmox VE firewall...
 

hi,

"Update package database"

this just updates the database. it doesn't upgrade any packages.

Also, at 17:22:09 there's a Start all VMs and Containers task. I didn't create that one either.
that shows up during bootup (for starting VMs that are set to start on boot)

so both of these are completely normal and most likely have nothing to do with the cause of the reboot
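
as a side note, if you want to see which guests are flagged to start on boot, a quick sketch (assumes the standard /etc/pve config layout):

Code:
# VMs with start-on-boot set
grep -H 'onboot' /etc/pve/qemu-server/*.conf
# containers too
grep -H 'onboot' /etc/pve/lxc/*.conf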

What would cause Proxmox to shut down my VMs like that? I didn't issue any such commands.

check your syslog files in /var/log/
those usually go further than journalctl
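
for example, something like this (assumes Debian's default logrotate naming; adjust the date pattern — note the two spaces before a single-digit day):

Code:
# current and most recent rotation
grep -h 'Jan  7 17:1' /var/log/syslog /var/log/syslog.1
# older, compressed rotations
zgrep -h 'Jan  7 17:1' /var/log/syslog.*.gz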
 
check your syslog files in /var/log/
those usually go further than journalctl

I dug through the file. There are a lot of timeouts on the guest-ping, which makes sense; I can't get that service working correctly on a few VMs.

I didn't see anything odd at 13:06 that caught my eye, but later I see it doing all the shutdowns. I just don't see who or what did them. Here's a cleaned version of the log (I removed a lot of the lines that looked normal; otherwise it wouldn't let me post it).

Sorry for the slow response but I'm still trying to figure out how he got past my pfSense firewall plus my router firewall.

Code:
Jan  7 17:13:05 pve systemd[1]: 104.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 104.scope.
Jan  7 17:13:05 pve systemd[1]: systemd-rfkill.socket: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jan  7 17:13:05 pve systemd[1]: 100.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 100.scope.
Jan  7 17:13:05 pve systemd[1]: 105.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 105.scope.
Jan  7 17:13:05 pve systemd[1]: Stopping Session 347 of user root.
Jan  7 17:13:05 pve systemd[1]: Stopping LVM event activation on device 8:3...
Jan  7 17:13:05 pve systemd[1]: Stopping Availability of block devices...
Jan  7 17:13:05 pve systemd[1]: Stopping NFS status monitor for NFSv2/3 locking....
Jan  7 17:13:05 pve systemd[1]: lvm2-lvmpolld.socket: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Closed LVM2 poll daemon socket.
Jan  7 17:13:05 pve systemd[1]: Stopped target RPC Port Mapper.
Jan  7 17:13:05 pve systemd[1]: 101.scope: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped 101.scope.
Jan  7 17:13:05 pve systemd[1]: Removed slice qemu.slice.
Jan  7 17:13:05 pve systemd[1]: Stopped target Timers.
Jan  7 17:13:05 pve systemd[1]: apt-daily-upgrade.timer: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped Daily apt upgrade and clean activities.
Jan  7 17:13:05 pve systemd[1]: man-db.timer: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopped Daily rotation of log files.
Jan  7 17:13:05 pve systemd[1]: Stopped Daily PVE download activities.
Jan  7 17:13:05 pve systemd[1]: systemd-tmpfiles-clean.timer: Succeeded.
Jan  7 17:13:05 pve systemd[1]: Stopping Session 460 of user root.
Jan  7 17:13:05 pve systemd[1]: Unmounting RPC Pipe File System...
Jan  7 17:13:05 pve systemd[1]: Stopped target Graphical Interface.
Jan  7 17:13:05 pve smartd[843]: smartd received signal 15: Terminated

Jan  7 17:13:24 pve kernel: [1548454.569623] vmbr0: port 5(fwpr100p0) entered disabled state
Jan  7 17:13:24 pve kernel: [1548454.570050] device fwln100i0 left promiscuous mode
Jan  7 17:13:24 pve kernel: [1548454.570057] fwbr100i0: port 1(fwln100i0) entered disabled state
Jan  7 17:13:24 pve kernel: [1548454.613850] device fwpr100p0 left promiscuous mode
Jan  7 17:13:24 pve kernel: [1548454.613856] vmbr0: port 5(fwpr100p0) entered disabled state
Jan  7 17:13:24 pve kernel: [1548455.528436] fwbr100i1: port 2(tap100i1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.573488] fwbr100i1: port 1(fwln100i1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.573711] vmbr1: port 2(fwpr100p1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.574140] device fwln100i1 left promiscuous mode
Jan  7 17:13:25 pve kernel: [1548455.574144] fwbr100i1: port 1(fwln100i1) entered disabled state
Jan  7 17:13:25 pve kernel: [1548455.601710] device fwpr100p1 left promiscuous mode
Jan  7 17:13:25 pve kernel: [1548455.601715] vmbr1: port 2(fwpr100p1) entered disabled state
Jan  7 17:13:26 pve pve-guests[22736]: end task UPID:pve:000058D6:093ABBE5:5E150274:qmshutdown:100:root@pam:
Jan  7 17:13:28 pve pvedaemon[11028]: VM 105 qmp command failed - VM 105 qmp command 'guest-ping' failed - got timeout
Jan  7 17:13:49 pve kernel: [1548480.543346] fwbr101i0: port 2(tap101i0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.572120] fwbr101i0: port 1(fwln101i0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.572208] vmbr0: port 3(fwpr101p0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.572603] device fwln101i0 left promiscuous mode
Jan  7 17:13:50 pve kernel: [1548480.572606] fwbr101i0: port 1(fwln101i0) entered disabled state
Jan  7 17:13:50 pve kernel: [1548480.597497] device fwpr101p0 left promiscuous mode
Jan  7 17:13:50 pve kernel: [1548480.597499] vmbr0: port 3(fwpr101p0) entered disabled state
Jan  7 17:13:51 pve pve-guests[22736]: end task UPID:pve:000058D5:093ABBE4:5E150274:qmshutdown:101:root@pam:
Jan  7 17:16:08 pve kernel: [1548619.519244] vmbr0: port 4(tap104i0) entered disabled state
Jan  7 17:16:10 pve pve-guests[22736]: end task UPID:pve:000058D4:093ABBE3:5E150274:qmshutdown:104:root@pam:
Jan  7 17:16:19 pve pve-guests[22736]: end task UPID:pve:000058D3:093ABBE1:5E150274:qmshutdown:105:root@pam:
Jan  7 17:16:20 pve pve-guests[22736]: all VMs and CTs stopped
Jan  7 17:16:20 pve pve-guests[22730]: <root@pam> end task UPID:pve:000058D0:093ABBDD:5E150274:stopall::root@pam: OK
Jan  7 17:16:20 pve systemd[1]: pve-guests.service: Succeeded.
Jan  7 17:16:20 pve systemd[1]: Stopped PVE guests.
Jan  7 17:16:20 pve systemd[1]: Stopping PVE Local HA Resource Manager Daemon...
Jan  7 17:16:20 pve systemd[1]: Stopping PVE Status Daemon...
Jan  7 17:16:20 pve systemd[1]: Stopping PVE SPICE Proxy Server...
Jan  7 17:16:20 pve systemd[1]: Stopping Proxmox VE firewall...
Jan  7 17:16:20 pve spiceproxy[1272]: received signal TERM
Jan  7 17:16:20 pve spiceproxy[1272]: server closing
Jan  7 17:16:20 pve spiceproxy[1272]: server stopped
Jan  7 17:16:21 pve pvestatd[1233]: received signal TERM
Jan  7 17:16:21 pve pvestatd[1233]: server closing
Jan  7 17:16:21 pve pvestatd[1233]: server stopped
Jan  7 17:16:21 pve pve-firewall[1231]: received signal TERM
Jan  7 17:16:21 pve pve-firewall[1231]: server closing
Jan  7 17:16:21 pve pve-firewall[1231]: clear firewall rules
Jan  7 17:16:21 pve pve-ha-lrm[1274]: received signal TERM
Jan  7 17:16:21 pve pve-firewall[1231]: server stopped
Jan  7 17:16:21 pve systemd[1]: spiceproxy.service: Succeeded.
Jan  7 17:16:21 pve systemd[1]: Stopped PVE SPICE Proxy Server.
Jan  7 17:16:22 pve pve-ha-lrm[1274]: got shutdown request with shutdown policy 'conditional'
Jan  7 17:16:22 pve pve-ha-lrm[1274]: reboot LRM, stop and freeze all services

Jan  7 17:16:37 pve pveproxy[1265]: received signal TERM
Jan  7 17:16:37 pve pveproxy[1265]: server closing
 
Your log starts AFTER the cause; you need to attach some minutes of log from before your snippet.

Actually I don't think it was remotely triggered (which is possible).
Is the server located at a trusted location, or is it possible someone pressed CTRL+ALT+DEL, pressed the front power button, or issued an ACPI event through IPMI (iRMC, etc.)? Even a backup battery is able to trigger shutdowns via the serial port.

We actually use this in our large DCs to shut down all nodes in case all power sources fail and the backup batteries are running low. In that case, it either looks like someone pressed the power button or the system received an event on the serial interface.
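
If the machine has a BMC, its event log may record such a trigger. A rough way to check (assumes ipmitool is installed and the host actually has IPMI):

Code:
# recent entries from the BMC system event log (power/ACPI events show up here)
ipmitool sel list | tail -n 20
# current chassis power state and power-restore policy
ipmitool chassis status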
 
Your log starts AFTER the cause; you need to attach some minutes of log from before your snippet.

I tried to include more of the log, but it kept telling me my message was too long. I kept cutting it and cutting it. I didn't see anything that looked out of the ordinary before where the log started. There were no commands showing up that weren't there for hours.

The second event seemed more interesting. There were a lot more commands, mostly shutting down, etc., but I had to limit what I posted due to size.
The server is in a secure location. No one else has access to it. There's no way someone did a shutdown from there. And I'm the only one with a login to the unit. We're just a two-man shop, so there aren't a lot of suspects. <g>
 
The server is at a trusted location. No one else has access to the area. I looked at the logs prior to what's posted but I didn't see anything unusual. I couldn't post more of the log because I was getting an error saying my message was too big. It was the same commands (mostly unable to connect to the agent, which for some reason only works on about half my VMs). There wasn't a trigger listed.

And I don't have a power shutdown set up on that system. It's on a UPS, but my limited knowledge of Linux has kept me from installing any sort of shutdown software on the server.
 
Did you wire the UPS using a serial cable to the server?
It was connected to my old Proxmox server but it never did anything. I didn't connect it to the new one since it wasn't doing anything. I never installed software to talk to it.
 
OK, I thought it might have been installed as a dependency, or that it no longer needs any daemon.

I have no more guesses, but you can set up an external log collector server (rsyslog) in case you need to dig further.
It will be able to log everything up to the point where the network goes down. And as it's an external device, the logs can't be removed on the machine itself, whether by an intruder or just by logrotate or the journal cleanup.
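
A minimal sketch of such a setup (the collector address 192.168.1.50 is made up here; both snippets assume Debian's stock rsyslog packaging):

Code:
# on the collector, e.g. /etc/rsyslog.d/collector.conf: accept syslog over UDP
module(load="imudp")
input(type="imudp" port="514")

# on the PVE host, e.g. /etc/rsyslog.d/forward.conf: forward everything via UDP
*.* @192.168.1.50:514

# then on both machines:
#   systemctl restart rsyslog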
 
I tried to include more of the log, but it kept telling me my message was too long. I kept cutting it and cutting it. I didn't see anything that looked out of the ordinary before where the log started. There were no commands showing up that weren't there for hours.

you can attach it as a file (as .txt, for example)
 
