Proxmox 4.0beta1 Guests don't shutdown cleanly on host shutdown

TrevorP · Sep 7, 2015

I have been testing a Proxmox 4.0 node. With previous versions of proxmox, when the power button on the host is pressed, a shutdown command is issued via terminal, or shutdown initiated via gui, the node would send a acpi message to any running guests, and wait the specified or default timeout for each guest to shut down before shutting down the node.

With V4 the when a shutdown is issued, all vms stop almost immediately, and the node shuts down.
What I think I have determined:
* 'pvesh --nooutput create /nodes/localhost/stopall' will cleanly shut down nodes, so will 'service pve-manager stop'. "Stop All VMs" in the gui works OK.
* pve-manager is not installed in /etc/rcX.d/, something is still causing VMs to start up automatically ( possibly via '/etc/init.d/pve-manager start', possibly another way, I see two instances of kvm ID 100 running when the node starts)
Taken from 'ps -elf' during startup:

Code:

4 S root      1094     1  0  80   0 -  1082 wait   00:29 ?        00:00:00 /bin/sh /etc/init.d/pve-manager start
4 S root      1096  1094 27  80   0 - 81021 poll_s 00:29 ?        00:00:00 /usr/bin/perl /usr/bin/pvesh --nooutput create /nodes/localhost/s
1 S root      1123  1096  0  80   0 - 81251 hrtime 00:29 ?        00:00:00 task UPID:pve2:00000463:00000921:55ED9F53:startall::rootpam:
5 S root      1127  1123  0  80   0 - 84640 poll_s 00:29 ?        00:00:00 task UPID:pve2:00000467:00000925:55ED9F54:qmstart:100:root pam:
6 S root      1140  1127  3  80   0 - 65138 pipe_w 00:29 ?        00:00:00 /usr/bin/kvm -id 100 -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -smbios type=1,uuid=e253ddd4-08c8-4833-9af7-a3c64cc28f93 -name test -smp 8,sockets=1,cores=8,maxcpus=8 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k en-us -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:512f79e8bf1 -drive file=/var/lib/vz/template/iso/ubuntu-14.04.1-desktop-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive file=/var/lib/vz/images/100/vm-100-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,cache=none,aio=native,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=22:08:69:1C:02:B1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300

7 R root      1143     1  7  80   0 - 2201354 -    00:29 ?        00:00:00 /usr/bin/kvm -id 100 -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -smbios type=1,uuid=e253ddd4-08c8-4833-9af7-a3c64cc28f93 -name test -smp 8,sockets=1,cores=8,maxcpus=8 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k en-us -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:512f79e8bf1 -drive file=/var/lib/vz/template/iso/ubuntu-14.04.1-desktop-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive file=/var/lib/vz/images/100/vm-100-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,cache=none,aio=native,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=22:08:69:1C:02:B1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300

* Adding a rcX.d script to run either 'pvesh --nooutput create /nodes/localhost/stopall' or 'service pve-manager stop' doesn't help. I've confirmed the script runs, the listed commands end immediately. Either the VMs have already been terminated by then, something else runs simultaneously to kill them, or those commands behave differently during shutdown.
Sample file registered with 'update-rc.d pve-manager-test defaults':

Code:

#!/bin/sh

### BEGIN INIT INFO
# Provides:        pve-manager-test
# Required-Start:  $remote_fs pve-firewall
# Required-Stop:   $remote_fs pve-firewall
# Default-Start:  
# Default-Stop:    0 1 6
# Short-Description: PVE VM Manager
### END INIT INFO

. /lib/lsb/init-functions

PATH=/sbin:/bin:/usr/bin:/usr/sbin
DESC="PVE Status Daemon"
PVESH=/usr/bin/pvesh

test -f $PVESH || exit 0

# Include defaults if available
if [ -f /etc/default/pve-manager ] ; then
    . /etc/default/pve-manager
fi

case "$1" in
        start)
                if [ "$START" = "no" ]; then
                    exit 0
                fi
                echo "Starting VMs and Containers"
                pvesh --nooutput create /nodes/localhost/startall
                ;;
        stop)
                echo "Stopping running Backup"
                date >> /testfile
                echo "Stopping running Backup" >> /testfile
                vzdump -stop
                echo "Stopping VMs and Containers"
                echo "Stopping VMs and Containers" >> /testfile
                pvesh --nooutput create /nodes/localhost/stopall
                service pve-manager stop
                echo "Done." >> /testfile
                ;;
        reload|restart|force-reload)
                # do nothing here
                ;;
        *)
                N=/etc/init.d/$NAME
                echo "Usage: $N {start|stop|reload|restart|force-reload}" >&2
                exit 1
                ;;
esac

exit 0

Am I missing something obvious? Can anyone confirm shutdown behaviour has changed? Does anyone know how to configure nodes to cleanly shut down guests on node shutdown?

dietmar · Sep 7, 2015

TrevorP said:
Am I missing something obvious?

PVE 4.0 uses systemd (whichg start/stop many things in parallel)

TrevorP · Sep 8, 2015

dietmar said:
PVE 4.0 uses systemd (whichg start/stop many things in parallel)

OKay, good call dietmar, I have some more homework to do.
/etc/systemd/system/multi-user.target.wants/pve-manager.service calls '/etc/init.d/pve-manager stop', which is definately called on shutdown.

From /var/log/syslog:

Code:

Sep  8 10:39:45 pve2 zed[893]: Exiting
Sep  8 10:39:45 pve2 pve-manager[2932]: Stopping running Backup
Sep  8 10:39:45 pve2 rrdcached[913]: caught SIGTERM
Sep  8 10:39:45 pve2 rrdcached[913]: starting shutdown
Sep  8 10:39:45 pve2 postfix/master[1053]: terminating on signal 15
Sep  8 10:39:45 pve2 postfix[2936]: Stopping Postfix Mail Transport Agent: postfix.
Sep  8 10:39:45 pve2 fusermount[2992]: /bin/fusermount: failed to unmount /var/lib/lxcfs: Invalid argument
Sep  8 10:39:45 pve2 kernel: [  214.698836] vmbr0: port 2(tap100i0) entered disabled state
Sep  8 10:39:45 pve2 rrdcached[913]: clean shutdown; all RRDs flushed
Sep  8 10:39:45 pve2 rrdcached[913]: removing journals
Sep  8 10:39:45 pve2 rrdcached[913]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1441672588.166653
Sep  8 10:39:45 pve2 rrdcached[913]: goodbye
Sep  8 10:39:45 pve2 pvestatd[1075]: received signal TERM
Sep  8 10:39:45 pve2 pvestatd[1075]: server closing
Sep  8 10:39:45 pve2 pvestatd[1075]: server stopped
Sep  8 10:39:45 pve2 pve-firewall[1066]: received signal TERM
Sep  8 10:39:45 pve2 pve-firewall[1066]: server closing
Sep  8 10:39:45 pve2 pve-firewall[1066]: clear firewall rules
Sep  8 10:39:45 pve2 pve-manager[2932]: Stopping VMs and Containers
Sep  8 10:39:45 pve2 pve-ha-lrm[1087]: received signal TERM
Sep  8 10:39:45 pve2 pve-ha-lrm[1087]: server stopped
Sep  8 10:39:45 pve2 pve-firewall[1066]: server stopped
Sep  8 10:39:45 pve2 spiceproxy[1127]: received signal TERM
Sep  8 10:39:45 pve2 spiceproxy[1127]: server closing
Sep  8 10:39:45 pve2 spiceproxy[1128]: worker exit
Sep  8 10:39:45 pve2 spiceproxy[1127]: worker 1128 finished
Sep  8 10:39:45 pve2 spiceproxy[1127]: server stopped
Sep  8 10:39:46 pve2 rrdcached[2940]: Stopping RRDtool data caching daemon: rrdcached.
Sep  8 10:39:46 pve2 pvesh: <root pam> starting task UPID:pve2:00000BCB:0000544D:55EE2E52:stopall::root pam:
Sep  8 10:39:46 pve2 pvesh: <root pam> end task UPID:pve2:00000BCB:0000544D:55EE2E52:stopall::root pam: OK
Sep  8 10:39:46 pve2 hwclock[2920]: hwclock from util-linux 2.25.2
Sep  8 10:39:46 pve2 hwclock[2920]: Using the /dev interface to the clock.
Sep  8 10:39:46 pve2 hwclock[2920]: Last drift adjustment done at 1441672433 seconds after 1969
Sep  8 10:39:46 pve2 hwclock[2920]: Last calibration done at 1441672433 seconds after 1969
Sep  8 10:39:46 pve2 hwclock[2920]: Hardware clock is on UTC time
Sep  8 10:39:46 pve2 hwclock[2920]: Assuming hardware clock is kept in UTC time.
Sep  8 10:39:46 pve2 hwclock[2920]: Waiting for clock tick...
Sep  8 10:39:46 pve2 hwclock[2920]: ...got clock tick
Sep  8 10:39:46 pve2 hwclock[2920]: Time read from Hardware Clock: 2015/09/08 00:39:46
Sep  8 10:39:46 pve2 hwclock[2920]: Hw clock time : 2015/09/08 00:39:46 = 1441672786 seconds since 1969
Sep  8 10:39:46 pve2 hwclock[2920]: missed it - 1441672785.992635 is too far past 1441672785.500000 (0.492635 > 0.001000)
Sep  8 10:39:46 pve2 hwclock[2920]: 1441672786.500000 is close enough to 1441672786.500000 (0.000000 < 0.002000)
Sep  8 10:39:46 pve2 hwclock[2920]: Set RTC to 1441672786 (1441672785 + 1; refsystime = 1441672785.000000)
Sep  8 10:39:46 pve2 hwclock[2920]: Setting Hardware Clock to 00:39:46 = 1441672786 seconds since 1969
Sep  8 10:39:46 pve2 hwclock[2920]: ioctl(RTC_SET_TIME) was successful.
Sep  8 10:39:46 pve2 hwclock[2920]: Not adjusting drift factor because it has been less than a day since the last calibration.
Sep  8 10:39:46 pve2 pvepw-logger[640]: received terminate request (signal)
Sep  8 10:39:46 pve2 pvepw-logger[640]: stopping pvefw logger
Sep  8 10:39:47 pve2 pve-ha-crm[1084]: received signal TERM
Sep  8 10:39:47 pve2 pve-ha-crm[1084]: server received shutdown request
Sep  8 10:39:47 pve2 pve-ha-crm[1084]: server stopped
Sep  8 10:39:47 pve2 pveproxy[1086]: received signal TERM
Sep  8 10:39:47 pve2 pveproxy[1086]: server closing
Sep  8 10:39:47 pve2 pveproxy[1088]: worker exit
Sep  8 10:39:47 pve2 pveproxy[1086]: worker 1088 finished
Sep  8 10:39:48 pve2 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="922" x-info="http www rsyslog com"] exiting on signal 15.

Running any of the following does not cause the VMs to stop:

Code:

systemctl stop zed
systemctl stop rrdcached
systemctl stop postfix
systemctl stop pvestatd
systemctl stop pve-firewall

I'll take a closer look at systemd and the proxmox config, see if I can come up with anything more useful.

Thanks.

Gert · Nov 20, 2015

Did you make any progress on this?

dietmar · Nov 21, 2015

The command is stop all VMs is:

# systemctl stop pve-manager

Taras_ · Nov 21, 2015

dietmar said:
The command is stop all VMs is:

# systemctl stop pve-manager

Dietmar, do you mean it's not a bug - https://bugzilla.proxmox.com/show_bug.cgi?id=783 ?

dietmar · Nov 21, 2015

Taras_ said:
Dietmar, do you mean it's not a bug - https://bugzilla.proxmox.com/show_bug.cgi?id=783 ?

Works for me. Also, this bugzilla bug and the bug reported in this thread seem to be something totally different (no shutdown vs. immediate shutdown).

Taras_ · Nov 21, 2015

dietmar said:
Works for me. Also, this bugzilla bug and the bug reported in this thread seem to be something totally different (no shutdown vs. immediate shutdown).

I think it's same bug:
- "Shutdown all VMs" works from GUI;
- pressing power button or "shutdown -P now" doesn't work for VMs (but works for containers).

"Works for me" means "I can reproduce" or "I can't reproduce"?

dietmar · Nov 21, 2015

Taras_ said:
"Works for me" means "I can reproduce" or "I can't reproduce"?

So far I am unable to reproduce the issue.

rinseaid · Nov 22, 2015

I have the same issue, all latest updates are applied. KVM VMs do not shut down cleanly when the host is shutdown.

Search

Search

Proxmox 4.0beta1 Guests don't shutdown cleanly on host shutdown

TrevorP

Active Member

dietmar

Proxmox Staff Member

TrevorP

Active Member

Gert

Member

dietmar

Proxmox Staff Member

Taras_

New Member

dietmar

Proxmox Staff Member

Taras_

New Member

dietmar

Proxmox Staff Member

rinseaid

Renowned Member