qm migrate caused server to reboot

cdukes

pve-manager/4.2-18/158720b9 (running kernel: 4.4.16-1-pve)

This morning I tried (via the gui) to migrate a VM. The task said OK, but the VM did not move.
I found this page:
https://forum.proxmox.com/threads/vm-is-locked-migrate.9358/
and was having the same trouble (the VM would not unlock), so I tried restarting the cluster service as that page suggests.
After that, I tried qm unlock and qm migrate again, but the VM was still locked.
Next, I did an ls on the /var/lock directory and then, for no apparent reason and without warning, the entire server rebooted.
Here's my SSH log:

Code:
root@<masked> [~]: # qm migrate 318 <masked>
VM is locked (migrate)
root@<masked> [~]: # qm unlock 318
trying to acquire lock...can't lock file '/var/lock/qemu-server/lock-318.conf' - got timeout
root@<masked> [~]: # qm migrate 318 bfs
VM is locked (migrate)
root@<masked> [~]: # df
Filesystem                      Size  Used Avail Use% Mounted on
udev                             10M     0   10M   0% /dev
tmpfs                            19G   18M   19G   1% /run
/dev/dm-0                        34G  2.8G   29G   9% /
tmpfs                            48G   66M   48G   1% /dev/shm
tmpfs                           5.0M     0  5.0M   0% /run/lock
tmpfs                            48G     0   48G   0% /sys/fs/cgroup
cgmfs                           100K     0  100K   0% /run/cgmanager/fs
/dev/fuse                        30M   52K   30M   1% /etc/pve
<masked>:/nfs              22T   10T   11T  49% /mnt/pve/<masked>
<masked2>:/volume1/proxmox   22T  9.3T   13T  43% /mnt/pve/<masked2>
root@<masked> [~]: # pvecm status
Quorum information
------------------
Date:             Thu Sep 29 03:36:25 2016
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          3/556
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 <masked>
0x00000001          2 <masked> (local)
0x00000002          1 <masked>
root@<masked> [~]: # /etc/init.d/pve-cluster stop
[ ok ] Stopping pve-cluster (via systemctl): pve-cluster.service.
root@<masked> [~]: # /etc/init.d/pve-cluster start
[ ok ] Starting pve-cluster (via systemctl): pve-cluster.service.
root@<masked> [~]: # qm migrate 318 <masked>
VM is locked (migrate)
root@<masked> [~]: # qm unlock 318
trying to acquire lock...can't lock file '/var/lock/qemu-server/lock-318.conf' - got timeout
root@<masked> [~]: # ll /var/lock/
total 0
drwxrwxrwt  7 root     root      260 Sep 26 15:06 ./
drwxr-xr-x 24 root     root     1.3K Sep 29 03:36 ../
-rw-r--r--  1 root     root        0 Sep 26 15:06 corosync
drwxrwxr-x  2 root     root       40 Sep 26 15:06 lockdev/
drwx------  2 root     root       40 Sep 26 15:06 lvm/
drwxr-xr-x  2 root     root       40 Sep 26 15:06 lxc/
-rw-r--r--  1 root     root        0 Sep 26 15:06 pvedaemon.lck
-rw-r--r--  1 root     root        0 Sep 26 15:06 pvefw.lck
-rw-r--r--  1 root     root        0 Sep 26 15:06 pvefw-logger.lck
-rw-r--r--  1 www-data www-data    0 Sep 26 15:06 pveproxy.lck
drwxr-xr-x  2 root     root      760 Sep 27 07:33 qemu-server/
-rw-r--r--  1 www-data www-data    0 Sep 26 15:06 spiceproxy.lck
drwxr-xr-x  2 root     root       40 Sep 26 15:06 subsys/
 
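As far as I can tell, the "locked (migrate)" state is just a lock: migrate line in the VM's config under /etc/pve, while the timeout above is on the separate flock that qm takes under /var/lock/qemu-server. Here's a rough way to check both (assuming VMID 318, that the VM's config lives on this node, and that fuser from the psmisc package is installed):

Code:
# does the VM config still carry the migrate lock?
grep '^lock:' /etc/pve/qemu-server/318.conf

# is some other qm/worker process still holding the per-VM flock?
fuser -v /var/lock/qemu-server/lock-318.conf

# once nothing holds the flock any more, this should clear the config lock
qm unlock 318
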
Hi Dietmar,
I think I may know the cause, although it's probably worth checking your code/unit tests to see why this would make the entire server reboot without any warning.
After the server came back up, I found that the PVE host I was migrating to had the wrong IP (wrong subnet) in the local server's /etc/hosts file.
In the log entries below you can see an IPC failure, and near the bottom you can see where the server rebooted, marked by the run of ^@ characters (those nulls really are in the log file, presumably because it was being written when the node went down).
(found these in daemon.log)

Code:
Sep 29 03:36:53 <masked> pve-ha-crm[2455]: ipcc_send_rec failed: Connection refused
Sep 29 03:36:53 <masked> pve-ha-crm[2455]: ipcc_send_rec failed: Connection refused
Sep 29 03:36:53 <masked> pve-ha-crm[2455]: ipcc_send_rec failed: Connection refused
Sep 29 03:36:54 <masked> pve-ha-lrm[18192]: Task 'UPID:<masked>:0000471B:014C0E94:57ED09F6:qmigrate:318:root@pam:' still active, waiting
Sep 29 03:36:55 <masked> systemd[1]: Starting The Proxmox VE cluster filesystem...
Sep 29 03:36:56 <masked> pmxcfs[19590]: [status] notice: update cluster info (cluster name  <masked>, version = 9)
Sep 29 03:36:56 <masked> pmxcfs[19590]: [status] notice: node has quorum
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: members: 1/19590, 2/2688, 3/2373
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: starting data syncronisation
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: received sync request (epoch 1/19590/00000001)
Sep 29 03:36:56 <masked> pmxcfs[19590]: [status] notice: members: 1/19590, 2/2688, 3/2373
Sep 29 03:36:56 <masked> pmxcfs[19590]: [status] notice: starting data syncronisation
Sep 29 03:36:56 <masked> pmxcfs[19590]: [status] notice: received sync request (epoch 1/19590/00000001)
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: received all states
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: leader is 2/2688
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: synced members: 2/2688, 3/2373
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: waiting for updates from leader
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: update complete - trying to commit (got 4 inode updates)
Sep 29 03:36:56 <masked> pmxcfs[19590]: [dcdb] notice: all data is up to date
Sep 29 03:36:56 <masked> pmxcfs[19590]: [status] notice: received all states
Sep 29 03:36:56 <masked> pmxcfs[19590]: [status] notice: all data is up to date
Sep 29 03:36:57 <masked> systemd[1]: Started The Proxmox VE cluster filesystem.
Sep 29 03:36:59 <masked> pve-ha-lrm[18192]: Task 'UPID:<masked>:0000471B:014C0E94:57ED09F6:qmigrate:318:root@pam:' still active, waiting
Sep 29 03:37:03 <masked> pve-ha-crm[2455]: status change wait_for_quorum => slave
Sep 29 03:37:04 <masked> pve-ha-lrm[18192]: Task 'UPID:<masked>:0000471B:014C0E94:57ED09F6:qmigrate:318:root@pam:' still active, waiting
Sep 29 03:37:09 <masked> pve-ha-lrm[18192]: Task 'UPID:<masked>:0000471B:014C0E94:57ED09F6:qmigrate:318:root@pam:' still active, waiting
Sep 29 03:37:14 <masked> pve-ha-lrm[18192]: Task 'UPID:<masked>:0000471B:014C0E94:57ED09F6:qmigrate:318:root@pam:' still active, waiting
Sep 29 03:37:19 <masked> pveproxy[13284]: ipcc_send_rec failed: Transport endpoint is not connected
Sep 29 03:37:19 <masked> pve-ha-lrm[18192]: Task 'UPID:<masked>:0000471B:014C0E94:57ED09F6:qmigrate:318:root@pam:' still active, waiting
Sep 29 03:37:20 <masked> pvedaemon[2447]: ipcc_send_rec failed: Transport endpoint is not connected
Sep 29 03:37:24 <masked> pve-ha-lrm[18192]: Task 'UPID:<masked>:0000471B:014C0E94:57ED09F6:qmigrate:318:root@pam:' still active, waiting
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Sep 29 03:40:43 <masked> systemd-modules-load[992]: Module 'fuse' is builtin
Sep 29 03:40:43 <masked> systemd-modules-load[992]: Inserted module 'qlge'
Sep 29 03:40:43 <masked> systemd-modules-load[992]: Inserted module '8021q'
Sep 29 03:40:43 <masked> systemd[1]: Started Create Static Device Nodes in /dev.
 
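Since the bad /etc/hosts entry looks like the trigger, a quick sanity check on each node would be to compare what the local resolver returns with what corosync was actually configured with (<target-node> below is just a placeholder for the node I was migrating to):

Code:
# what the local resolver (/etc/hosts first) returns for the target node
getent hosts <target-node>

# node IDs and names as the cluster currently sees them
pvecm nodes

# ring addresses corosync was configured with
grep -B1 -A4 'ring0_addr' /etc/pve/corosync.conf

If getent comes back with an address on a different subnet than the ring0_addr for that node, the hosts file needs fixing before attempting another migration.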
