Live migration fails in KVM 1.3

Code:
Jan 03 11:19:56 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.10.0.200 qm resume 124 --skiplock' failed: exit code 2
Jan 03 11:19:59 ERROR: migration finished with problems (duration 00:00:30)
TASK ERROR: migration problems

Code:
Jan  3 11:19:31 proliant01 qm[96378]: VM 124 qmp command failed - unable to find configuration file for VM 124 - no such machine
Jan  3 11:19:31 proliant01 qm[96378]: VM 124 qmp command failed - unable to find configuration file for VM 124 - no such machine

It complains, the VM does move, but it is not resumed on the target node; I have to restart it manually.
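When that happens, the manual restart amounts to roughly the following (a sketch only, reusing the VMID 124 and target address 10.10.0.200 from the log above; your recovery steps may differ):

Code:
# on the target node (10.10.0.200), retry the resume that the migration task failed to run
qm resume 124 --skiplock
# if the kvm process is gone there, a plain start may be needed instead
qm start 124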

pveversion -v output below; it is identical on all but the last of the 3 nodes. I'm trying to migrate the few machines left on that node so I can reboot it.

Code:
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-71
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1


With all 3 servers brought up to date with apt-get dist-upgrade, migration seems to be fine... just time to sit back and watch ;)

Yuck, seeing some of the same symptoms. Migration starts, and I end up unable to connect to the VM, as shown in the logs above.
 
Just noting that I am seeing this problem too. I'm using:

root@proxmox1a:~# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-71
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1
AND
root@proxmox1b:~# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-71
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1



Our guests are mostly using virtio and it seems that those with more than 1GB of RAM and/or those that are under load fail to migrate properly. This is on a DRBD + LVM setup.

I'll note that I have a test setup with 4 hosts using sheepdog as the backend and migration works well.
 
I had 4 VMs on one node that I could not move at all. I finally just shut them down and applied the updates, and so far so good: they migrated OK after being updated and rebooted.

One thing I did notice beforehand was a snapshot that did not finish; it hung or crashed. And it was not a VM that was included in any backup schedule yet, so I'm not sure how it got started or what the circumstances around the failed attempt were.
 
It appears that I'm also seeing this problem with our Windows 2K8R2 VM servers. Linux 2.6/3.x VMs with virtio are moving fine.

The servers are an HP DL360 G7 and an HP DL360p G8. Both servers report identical versions in their pveversion output:

Code:
root@vmhost04:~# pveversion -v
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-71
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1

The storage for our VMs is an OpenFiler SAN running iSCSI. We are using an LVM Group (network backing with iSCSI targets).

So that I wouldn't tear up our production VMs, I set up a new test VM (W2K8 R2 Enterprise 64-bit) with 4GB of RAM and a 20GB drive:
- IDE Drive & E1000 NIC. migration: SUCCESS
- IDE Drive & VirtIO NIC. migration: SUCCESS
* tested with latest redhat VirtIO CD virtio-win-0.1-49.iso
- IDE Drive & VirtIO NIC. migration: SUCCESS
* tested w/ older redhat VirtIO CD virtio-win-0.1-30.iso
- VIRTIO Drive & VirtIO NIC. migration: SUCCESS
* tested w/ older redhat VirtIO CD virtio-win-0.1-30.iso
- VIRTIO Drive & VirtIO NIC. migration: SUCCESS
* tested with latest redhat VirtIO CD virtio-win-0.1-49.iso

So that didn't seem to help identify the problem. Next I checked one of my production systems that was failing to migrate. It had VirtIO NIC 61.63.103.3000 and VirtIO SCSI 61.61.101.5800 (4/4/2011). I upgraded both the VirtIO NIC and VirtIO SCSI drivers to the latest 61.64.104.4900 (virtio-win-0.1-49.iso).

It continues to fail to complete a live migration. When I start seeing the "WARNING: unable to connect to VM 107 socket" entries in the syslog on the host I'm migrating from, the only solution is to click the stop button for the migration, then perform a qm unlock 107 followed by a qm stop 107 and then a qm start 107 to get it going again.
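In other words, the recovery sequence on the source node looks roughly like this (107 being the VMID from my logs; adjust for your own VM):

Code:
# after clicking stop on the stuck migration task in the GUI:
qm unlock 107   # clear the migration lock left on the VM
qm stop 107     # stop the hung guest process
qm start 107    # start it again on the original node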

Here are the syslog entries from the last migration attempt I made on vm107, which failed:
vmhost01 (Where the VM is currently running)
Code:
Jan  4 13:36:06 vmhost01 pvedaemon[35295]: <root@pam> starting task UPID:vmhost01:00008BF1:006C1D1B:50E72F26:qmigrate:107:root@pam:
Jan  4 13:36:07 vmhost01 pmxcfs[2060]: [status] notice: received log
Jan  4 13:36:08 vmhost01 pmxcfs[2060]: [status] notice: received log
Jan  4 13:36:28 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:28 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:31 vmhost01 pvestatd[2534]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:31 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:31 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:34 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:35 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:38 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:38 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:41 vmhost01 pvestatd[2534]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:41 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:42 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:45 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:45 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:48 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:49 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:51 vmhost01 pvestatd[2534]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:51 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:52 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:55 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:55 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:58 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:36:58 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:01 vmhost01 pvestatd[2534]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:01 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:02 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:02 vmhost01 pvedaemon[35825]: WARNING: interrupted by signal
Jan  4 13:37:02 vmhost01 pvedaemon[35825]: VM 107 qmp command failed - VM 107 qmp command 'query-migrate' failed - interrupted by signal
Jan  4 13:37:02 vmhost01 pvedaemon[35825]: WARNING: query migrate failed: VM 107 qmp command 'query-migrate' failed - interrupted by signal#012
Jan  4 13:37:05 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:05 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:07 vmhost01 pvedaemon[35295]: <root@pam> end task UPID:vmhost01:00008BF1:006C1D1B:50E72F26:qmigrate:107:root@pam: unexpected status
Jan  4 13:37:09 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 32 retries
Jan  4 13:37:09 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:11 vmhost01 pvestatd[2534]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:12 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:12 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:15 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:16 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:18 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:19 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:21 vmhost01 pvestatd[2534]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:21 vmhost01 pmxcfs[2060]: [dcdb] notice: data verification successful
Jan  4 13:37:22 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:22 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:25 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:26 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:28 vmhost01 pvedaemon[35295]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:29 vmhost01 pvedaemon[35439]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:31 vmhost01 pvestatd[2534]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
Jan  4 13:37:31 vmhost01 pvedaemon[35588]: WARNING: unable to connect to VM 107 socket - timeout after 31 retries
.... just keeps scrolling these messages until the migration is stopped.....

vmhost04 (where I was trying to migrate the VM to)
Code:
Jan  4 13:36:06 vmhost04 pmxcfs[2258]: [status] notice: received log
Jan  4 13:36:07 vmhost04 qm[673685]: <root@pam> starting task UPID:vmhost04:000A4796:00F19979:50E72F27:qmstart:107:root@pam:
Jan  4 13:36:07 vmhost04 qm[673686]: start VM 107: UPID:vmhost04:000A4796:00F19979:50E72F27:qmstart:107:root@pam:
Jan  4 13:36:07 vmhost04 multipathd: dm-12: add map (uevent)
Jan  4 13:36:07 vmhost04 multipathd: dm-3: add map (uevent)
Jan  4 13:36:07 vmhost04 multipathd: dm-3: devmap already registered
Jan  4 13:36:07 vmhost04 multipathd: dm-13: add map (uevent)
Jan  4 13:36:07 vmhost04 multipathd: dm-3: add map (uevent)
Jan  4 13:36:07 vmhost04 multipathd: dm-3: devmap already registered
Jan  4 13:36:07 vmhost04 kernel: device tap107i0 entered promiscuous mode
Jan  4 13:36:07 vmhost04 kernel: vmbr0: port 4(tap107i0) entering forwarding state
Jan  4 13:36:08 vmhost04 qm[673686]: VM 107 qmp command failed - unable to find configuration file for VM 107 - no such machine
Jan  4 13:36:08 vmhost04 qm[673686]: VM 107 qmp command failed - unable to find configuration file for VM 107 - no such machine
Jan  4 13:36:08 vmhost04 qm[673685]: <root@pam> end task UPID:vmhost04:000A4796:00F19979:50E72F27:qmstart:107:root@pam: OK
Jan  4 13:36:18 vmhost04 kernel: tap107i0: no IPv6 routers present
Jan  4 13:37:02 vmhost04 kernel: vmbr0: port 4(tap107i0) entering disabled state
Jan  4 13:37:02 vmhost04 kernel: vmbr0: port 4(tap107i0) entering disabled state
Jan  4 13:37:02 vmhost04 multipathd: dm-13: add map (uevent)
Jan  4 13:37:02 vmhost04 multipathd: dm-12: add map (uevent)
Jan  4 13:37:07 vmhost04 pmxcfs[2258]: [status] notice: received log
Jan  4 13:37:21 vmhost04 pmxcfs[2258]: [dcdb] notice: data verification successful
Jan  4 13:37:21 vmhost04 rrdcached[2219]: flushing old values
Jan  4 13:37:21 vmhost04 rrdcached[2219]: rotating journals
Jan  4 13:37:21 vmhost04 rrdcached[2219]: started new journal /var/lib/rrdcached/journal//rrd.journal.1357328241.964452
Jan  4 13:37:21 vmhost04 rrdcached[2219]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1357321041.964559

And here's the config file of vm107:
Code:
bootdisk: virtio0
cores: 1
ide2: none,media=cdrom,size=57240K
memory: 4096
name: SANDBOX02
net0: virtio=B2:E6:68:FE:40:F5,bridge=vmbr0
ostype: w2k8
sockets: 2
virtio0: SAN01:vm-107-disk-1,size=120G
virtio1: SAN01:vm-107-disk-2,size=240G

And here's the config of my test VM which moved back & forth without any issues:
Code:
bootdisk: virtio0
cores: 1
ide2: none,media=cdrom,size=57240K
memory: 4096
name: win2k8test
net0: virtio=4E:33:B8:3D:AA:6F,bridge=vmbr0
ostype: w2k8
sockets: 2
virtio0: SAN01:vm-400-disk-1,size=20G
virtio1: SAN01:vm-400-disk-2,size=250G

I hope this is enough information to help. I wish my Windows test VM exhibited the same problems as a production VM. But it does not. That makes me think it may be related to VM process load or something inside the VM as others have mentioned.

--
Brian
 
Same problem for me... but I also had this problem with a Debian 6 KVM guest using virtio. I had to hard-shutdown the VM... but today every migration worked. Strange.
 
Same problem for me... but I also had this problem with a Debian 6 KVM guest using virtio. I had to hard-shutdown the VM... but today every migration worked. Strange.
Does the migration fail between kvm 1.3 -> kvm 1.3, or kvm 1.2 -> kvm 1.3? (Because kvm 1.2 -> kvm 1.3 doesn't work.)
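If you are not sure which KVM version a running guest was started under, one way to check is to ask its monitor (a rough sketch; 107 is just an example VMID and the output wording may differ slightly):

Code:
qm monitor 107
# at the monitor prompt:
info version
# a guest started under pve-qemu-kvm 1.3-10 should report something like 1.3.0,
# while a guest still running since before the upgrade will report 1.2.x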
 
It appears that I'm also seeing this problem with our Windows 2K8R2 VM servers. Linux 2.6/3.x VMs with virtio are moving fine.

[full post with pveversion output, syslog excerpts, and VM configs quoted above]

--
Brian

That is exactly the same as our problem. The only difference is that it happens with Linux 2.6/3.0 guests, as we don't have any Windows VMs.
 
Hi guys,
Can you try sending "migrate_set_downtime 0.1" in the monitor before doing the migration?
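For anyone wondering where to enter that: it goes into the QEMU monitor of the running guest, either via the Monitor tab of the VM in the web GUI or from the command line on the source node (a sketch, assuming VMID 107):

Code:
qm monitor 107
# at the monitor prompt:
migrate_set_downtime 0.1
# then start the live migration as usual, e.g. from the GUI or roughly:
# qm migrate 107 <targetnode> -online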

spirit,

This seems to work. How do I make it permanent so that it remains when I power down the VM?

How do I find out what the current value of "migrate_set_downtime" is?

--
Brian
 
spirit,

This seems to work. How do I make it permanent so that it remains when I power down the VM?

How do I find out what the current value of "migrate_set_downtime" is?

--
Brian

We have made a fix for this in the current Proxmox git, but it's not yet released in pvetest.
You can add "migrate_downtime: xx" to your VM config file, but currently in stable it only accepts integer values...
You can try "migrate_downtime: 0", but I'm not sure the migration will finish.
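For example (an illustration only, assuming VMID 107), the setting goes straight into the VM's config file on the cluster filesystem:

Code:
# append to /etc/pve/qemu-server/107.conf
# the stable packages only accept whole seconds here; fractional values
# like 0.1 need the fix that is currently only in git
migrate_downtime: 0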
 
We have made a fix for this in the current Proxmox git, but it's not yet released in pvetest.
You can add "migrate_downtime: xx" to your VM config file, but currently in stable it only accepts integer values...
You can try "migrate_downtime: 0", but I'm not sure the migration will finish.

Thanks spirit,

I'll try migrate_downtime: 0 in my config file and report back.

This is sort of a MAJOR bug; any idea when the fix will get released into pvetest and the main updates? It seems like this is the worst problem I've encountered in Proxmox over the past 2 years.

--
Brian
 
spirit,

I tried adding "migrate_downtime: 0" to the config file and, as expected, the migration would not finish.

I also tried adding "migrate_downtime: 1" and the migration fails as originally reported.

In fact, when trying anything greater than migrate_set_downtime 0.1, I got the same migration failure as originally reported.

Thanks for the help.
--
Brian
 
Hi guys,

Can you try sending "migrate_set_downtime 0.1" in the monitor before doing the migration?

Yes this works! Finally live migrations will work!

And now for the technical background: what do we change exactly, and why does it work? I have searched the internet, but the only thing I can find is:

migrate_downtime: num set maximum tolerated downtime (in seconds) for migrations. [default=1]

So we change the 1 to 0.1 seconds. But why does the transfer stall when the maximum downtime is set to 1 second and not when it is set to 0.1 seconds? And what exactly does this setting do?

Sorry, but I am curious, and I hope that if I know more, I can contribute more...
 
Just noting that setting "migrate_set_downtime 0.1" in the monitor tab of my VMs works for me too. Migration takes a bit longer, but it's nice to have the smaller downtime too.

Note that my VMs are Linux, most of which are using virtio drivers for NIC and hard disk.
 
