shutdown of node shuts down VM

mir

Hi all,

I am facing a problem which seems to have been introduced in pve-2.2, since this problem was not there before.

Synopsis.
When a node is shut down, the following is expected:
1) Every HA-enabled VM is migrated to another node
2) Every running non-HA VM is instructed to shut down
3) Once 1 and 2 are completed, the shutdown of the node continues

What I experience:
1) Every VM is instructed to shut down
2) The node continues the shutdown process without waiting for the VMs to shut down

Screenshots of a failing HA VM are attached:

Screenshot - 2012-11-12 - 04:19:18.png

Screenshot - 2012-11-12 - 04:19:36.png

Isn't the above an indication of a bug, or have I missed something?

pveversion -v
pve-manager: 2.2-26 (pve-manager/2.2/c1614c8c)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-80
pve-kernel-2.6.32-16-pve: 2.6.32-80
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-1
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-28
qemu-server: 2.0-64
pve-firmware: 1.0-21
libpve-common-perl: 1.0-37
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-34
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1
 
Just a question. Does pve mount iSCSI and NFS mounts using the option _netdev?

If not then the network will be brought down before these mounts are unmounted.
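
For reference, this is roughly what a _netdev entry in /etc/fstab would look like; the device path and mount point below are placeholder examples for a filesystem sitting on an iSCSI LUN:

# hypothetical /etc/fstab entry: filesystem on an iSCSI LUN, marked _netdev so it
# is unmounted while the network (and the iSCSI session) is still up
/dev/disk/by-path/ip-192.168.1.10:3260-iscsi-iqn.2012-01.example:storage-lun-0 /mnt/iscsi-store ext4 defaults,_netdev 0 2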
 
HA-enabled VMs will not migrate to other hosts; this is not yet implemented. You should do this manually before you reboot a node.
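
One way to trigger the manual move from the CLI could be qm migrate; the VM ID and target node name below are placeholders, and for HA-managed VMs the migration may need to go through the HA resource manager instead:

# example only: live-migrate VM 101 to node esx2 before rebooting this node
qm migrate 101 esx2 -online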
 
Ok. So automatic migration only happens if the node is brought down via fencing?

What about the other part of my problem, that the network is brought down before the VMs have shut down and before the mount points have been cleanly unmounted (mount option _netdev)?
 
No, there is never an automatic migration. HA checks if the VM/CT is running; if not, the resource manager starts the VM/CT on one of the remaining hosts.

A "HA maintenance mode" is planned (see bugzilla). Also make sure that you never lose quorum; always use a fully redundant cluster network connection.

If you have cluster network or fencing problems, HA does not work as expected.
 
What about the other part of my problem, that the network is brought down before the VMs have shut down and before the mount points have been cleanly unmounted (mount option _netdev)?

We do not use fstab, so _netdev is useless. But I agree there is a bug; it seems shutdown does not wait long enough. Will investigate further.
 
Can you please post the contents of the corresponding VM shutdown task log? How long does it take to shut down that VM? (Maybe we run into a timeout.)
 
Just a question. Does pve mount iSCSI and NFS mounts using the option _netdev?

If not then the network will be brought down before these mounts are unmounted.

NFS is unmounted with umountnfs.sh, so there is no need to set _netdev for NFS?

Also, we do not mount iSCSI volumes (we use iscsi luns directly).

Do I miss something here?
 
Can you please post the contents of the corresponding VM shutdown task log? How long does it take to shut down that VM? (Maybe we run into a timeout.)
task started by HA resource agent
/dev/qnap_vg/vm-115-disk-1: read failed after 0 of 4096 at 21474770944: Input/output error
/dev/qnap_vg/vm-115-disk-1: read failed after 0 of 4096 at 21474828288: Input/output error
/dev/qnap_vg/vm-115-disk-1: read failed after 0 of 4096 at 0: Input/output error
/dev/qnap_vg/vm-115-disk-1: read failed after 0 of 4096 at 4096: Input/output error
Volume group "qnap_vg" not found
can't deactivate LV '/dev/qnap_vg/vm-115-disk-1': Skipping volume group qnap_vg
volume deativation failed: qnap_lvm:vm-115-disk-1 at /usr/share/perl5/PVE/Storage.pm line 689.
TASK OK
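
The read failures above suggest the iSCSI session backing qnap_vg was already gone when the task tried to deactivate the LV. A quick sanity check on a running node (the VG name is taken from the log above) could be:

# which iSCSI sessions are active, and which PV/VG sits on top of them
iscsiadm -m session
pvs -o pv_name,vg_name | grep qnap_vg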
 
NFS is unmounted with umountnfs.sh, so there is no need to set _netdev for NFS?

Also, we do not mount iSCSI volumes (we use iscsi luns directly).

Do I miss something here?
Watching the node console seems to indicate that the node shutdown continues in parallel with the shutdown of the VMs. Prior to pve-2.2 I seem to remember a shutdown sequence running*) which brought down all running VMs before the node shutdown was allowed to continue.

*) Something like the below was logged to the node console:
shutdown vm 101
shutdown vm 104
.....

A similar shutdown sequence is not seen anymore.
 
Also, can you please post your cluster.conf?
<?xml version="1.0"?>
<cluster config_version="36" name="midgaard">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="3" label="proxmox1_qdisk" tko="10">
    <heuristic interval="3" program="ping $GATEWAY -c1 -w1" score="1" tko="4"/>
    <heuristic interval="3" program="ip addr | grep eth0 | grep -q UP" score="2" tko="3"/>
  </quorumd>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="human"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="esx1" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="human" nodename="esx1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="esx2" nodeid="2" votes="1">
      <fence>
        <method name="single">
          <device name="human" nodename="esx2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="109"/>
    <pvevm autostart="1" vmid="114"/>
    <pvevm autostart="1" vmid="115"/>
    <pvevm autostart="1" vmid="117"/>
    <pvevm autostart="1" vmid="112"/>
  </rm>
</cluster>
 
If you run HA, please give more details about your system, e.g. post your cluster.conf.
 
Please check if rgmanager is stopped before open-iscsi. Maybe there is something wrong with the init script ordering?
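
A quick way to compare the kill order on both the halt and reboot runlevels would be something like:

# lower K-numbers are stopped first; rgmanager should come before open-iscsi
ls -l /etc/rc0.d/ /etc/rc6.d/ | grep -E 'rgmanager|open-iscsi|qemu-server'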
 
These are the interesting parts from /etc/rc0.d:

lrwxrwxrwx 1 root root 19 Jul 10 02:59 K01rgmanager -> ../init.d/rgmanager
lrwxrwxrwx 1 root root 19 Jul 10 02:59 K01rrdcached -> ../init.d/rrdcached
lrwxrwxrwx 1 root root 24 Jul 10 02:58 K01umountiscsi.sh -> ../init.d/umountiscsi.sh
lrwxrwxrwx 1 root root 17 Jul 10 02:57 K01urandom -> ../init.d/urandom
lrwxrwxrwx 1 root root 17 Jul 10 03:06 K02apache2 -> ../init.d/apache2
lrwxrwxrwx 1 root root 18 Jul 10 03:06 K02pvestatd -> ../init.d/pvestatd
lrwxrwxrwx 1 root root 21 Jul 10 03:06 K02qemu-server -> ../init.d/qemu-server
lrwxrwxrwx 1 root root 12 Jul 10 03:06 K02vz -> ../init.d/vz
lrwxrwxrwx 1 root root 20 Jul 10 03:06 K03open-iscsi -> ../init.d/open-iscsi
lrwxrwxrwx 1 root root 19 Jul 10 03:06 K03pvedaemon -> ../init.d/pvedaemon
lrwxrwxrwx 1 root root 18 Jul 10 03:06 K03vzeventd -> ../init.d/vzeventd
lrwxrwxrwx 1 root root 14 Jul 10 03:06 K04clvm -> ../init.d/clvm
lrwxrwxrwx 1 root root 14 Jul 10 03:06 K05cman -> ../init.d/cman
lrwxrwxrwx 1 root root 21 Jul 10 03:06 K06pve-cluster -> ../init.d/pve-cluster
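
For completeness: assuming dependency-based boot ordering (insserv) is in use, these K-numbers are derived from the LSB headers of the init scripts, so a cross-check of the declared dependencies could look like:

# show the declared stop dependencies of the two scripts in question
grep -E 'Required-Stop|Should-Stop|Default-Stop' /etc/init.d/rgmanager /etc/init.d/open-iscsi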
 
