KVM guest freezes after live migration, but only in one direction

icoronado

Apr 15, 2013
Hello. We are testing Proxmox with one Dell R610 (Intel Xeon), one HP G7 165 (AMD Opteron), and a Dell MD3200i iSCSI storage system. HA is not configured at this time, so we are testing live migration only, no HA.

When we start a VM on the HP node, we cannot migrate it to the Dell node: the migration finishes with a success message, but the VM is frozen and we have to stop it and start it again. In the other direction there is no problem. The VMs we tested are Windows 2008, Windows 7, and Ubuntu 12.04, all with virtio drivers installed.

The only strange thing I see is /dev/sd* errors during the migration process:

/dev/sdb: read failed after 0 of 4096 at 0: Input/output error
/dev/sdb: read failed after 0 of 4096 at 80530571264: Input/output error
/dev/sdb: read failed after 0 of 4096 at 80530628608: Input/output error
/dev/sdb: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdc: read failed after 0 of 4096 at 0: Input/output error
/dev/sdc: read failed after 0 of 4096 at 80530571264: Input/output error
/dev/sdc: read failed after 0 of 4096 at 80530628608: Input/output error
/dev/sdc: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdh: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh: read failed after 0 of 4096 at 107374116864: Input/output error
/dev/sdh: read failed after 0 of 4096 at 107374174208: Input/output error
/dev/sdh: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdi: read failed after 0 of 4096 at 0: Input/output error
/dev/sdi: read failed after 0 of 4096 at 107374116864: Input/output error
/dev/sdi: read failed after 0 of 4096 at 107374174208: Input/output error
/dev/sdi: read failed after 0 of 4096 at 4096: Input/output error
Apr 19 16:17:35 starting migration of VM 102 to node 'Yoda' (200.200.201.2)
Apr 19 16:17:35 copying disk images
Apr 19 16:17:35 starting VM 102 on remote node 'Yoda'
Apr 19 16:17:37 starting migration tunnel
Apr 19 16:17:38 starting online/live migration on port 60000
Apr 19 16:17:38 migrate_set_speed: 8589934592
Apr 19 16:17:38 migrate_set_downtime: 0.1
Apr 19 16:17:40 migration status: active (transferred 111530793, remaining 2041954304), total 2164654080)
Apr 19 16:17:42 migration status: active (transferred 222764274, remaining 1830916096), total 2164654080)
Apr 19 16:17:44 migration status: active (transferred 299949361, remaining 1596813312), total 2164654080)
Apr 19 16:17:46 migration status: active (transferred 409710811, remaining 1278558208), total 2164654080)
Apr 19 16:17:48 migration status: active (transferred 532290268, remaining 1047597056), total 2164654080)
Apr 19 16:17:50 migration status: active (transferred 621268253, remaining 929361920), total 2164654080)
Apr 19 16:17:52 migration status: active (transferred 712314789, remaining 810864640), total 2164654080)
Apr 19 16:17:54 migration status: active (transferred 805500715, remaining 676638720), total 2164654080)
Apr 19 16:17:56 migration status: active (transferred 889300557, remaining 173568000), total 2164654080)
Apr 19 16:17:58 migration status: active (transferred 982950334, remaining 3428352), total 2164654080)
Apr 19 16:17:58 migration status: active (transferred 1013266548, remaining 0), total 2164654080)
Apr 19 16:17:59 migration status: active (transferred 1027227517, remaining 0), total 2164654080)
Apr 19 16:17:59 migration speed: 97.52 MB/s - downtime 16 ms
Apr 19 16:17:59 migration status: completed
/dev/sdb: read failed after 0 of 4096 at 0: Input/output error
/dev/sdb: read failed after 0 of 4096 at 80530571264: Input/output error
/dev/sdb: read failed after 0 of 4096 at 80530628608: Input/output error
/dev/sdb: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdc: read failed after 0 of 4096 at 0: Input/output error
/dev/sdc: read failed after 0 of 4096 at 80530571264: Input/output error
/dev/sdc: read failed after 0 of 4096 at 80530628608: Input/output error
/dev/sdc: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdh: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh: read failed after 0 of 4096 at 107374116864: Input/output error
/dev/sdh: read failed after 0 of 4096 at 107374174208: Input/output error
/dev/sdh: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdi: read failed after 0 of 4096 at 0: Input/output error
/dev/sdi: read failed after 0 of 4096 at 107374116864: Input/output error
/dev/sdi: read failed after 0 of 4096 at 107374174208: Input/output error
/dev/sdi: read failed after 0 of 4096 at 4096: Input/output error
Apr 19 16:18:02 migration finished successfuly (duration 00:00:28)
TASK OK

But we get these errors in the other direction too, and there the VM works fine afterwards.

Sorry for my English!
 
Yes, we followed the howto. Here is the multipath -ll output for the Dell node:
36d4ae52000853d0c000003a5516799e3 dm-3 DELL,MD32xxi
size=75G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=12 status=active
| |- 6:0:0:0 sdc 8:32 active ready running
| `- 5:0:0:0 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
|- 3:0:0:0 sdb 8:16 active ghost running
`- 4:0:0:0 sdd 8:48 active ghost running
36d4ae520008544c7000003305167b385 dm-4 DELL,MD32xxi
size=100G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=12 status=active
| |- 3:0:0:1 sde 8:64 active ready running
| `- 4:0:0:1 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
|- 6:0:0:1 sdg 8:96 active ghost running
`- 5:0:0:1 sdi 8:128 active ghost running

And for the HP node:

36d4ae52000853d0c000003a5516799e3 dm-3 DELL,MD32xxi
size=75G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=12 status=active
| |- 7:0:0:0 sdf 8:80 active ready running
| `- 8:0:0:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
|- 9:0:0:0 sdb 8:16 active ghost running
`- 6:0:0:0 sdc 8:32 active ghost running
36d4ae520008544c7000003305167b385 dm-4 DELL,MD32xxi
size=100G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=12 status=active
| |- 9:0:0:1 sdd 8:48 active ready running
| `- 6:0:0:1 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
|- 8:0:0:1 sdh 8:112 active ghost running
`- 7:0:0:1 sdi 8:128 active ghost running
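
As a sanity check (a sketch, assuming the LVM volume groups live on these two LUNs): pvs should report the multipath maps as physical volumes, never the /dev/sd* member paths:

pvs -o pv_name,vg_name
# expected: /dev/mapper/36d4ae52000853d0c000003a5516799e3 and /dev/mapper/36d4ae520008544c7000003305167b385
# if /dev/sdb, /dev/sdc, ... appear here instead, LVM is scanning the raw member paths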

And here is the VM config:
root@darthvader:~# cat /etc/pve/qemu-server/102.conf
balloon: 2
bootdisk: virtio0
cores: 1
ide0: none,media=cdrom
ide1: SO_Servidores:vm-102-disk-1,size=25G
memory: 2048
name: win2k8Test
net0: virtio=BA:96:B3:97:BE:1D,bridge=vmbr0
ostype: win7
sockets: 1
virtio1: DataBases:vm-102-disk-1,cache=writeback,size=15G


Thanks
 
Yes, I followed the howto; the multipath -ll output for both servers is as posted above.
All disks are filtered out except the two LUNs I need.

The VM config is the same as posted above.


Thanks for your time.

- - - Updated - - -

OK, I think I see the error in the multipath part; I'm going to check it.
 
Check your /etc/lvm/lvm.conf.
Yes, that was the problem with the I/O errors, thanks. But the guest still freezes after live migration; the new migration log is reposted below.
 
You need to edit the lvm.conf file and add something similar to this as a filter:

filter = ["a/sda/","r/sdb*/","r/sdc*/","r/sdd*/","r/sde*/","r/sdf*/","r/sdg*/","r/sdh*/","r/sdi*/","r/sdj*/","r/sdk*/"]
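
A more compact equivalent (a sketch, assuming sda is the only local disk and sdb through sdk are the iSCSI member paths) rejects the whole range with one regex in the devices section of /etc/lvm/lvm.conf:

filter = [ "a|^/dev/sda|", "r|^/dev/sd[b-k]|" ]

Run pvscan afterwards and check that only the /dev/mapper devices are listed.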

Also, as Tom says, live migration between Intel <> AMD is generally a no-go.
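
If you have to attempt cross-vendor migration anyway, the usual mitigation is to pin the VM to a lowest-common-denominator CPU model so the guest sees the same CPU flags on both hosts. A sketch, assuming your qemu-server version supports the cpu option (it takes effect after a full stop/start of the VM):

qm set 102 -cpu kvm64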
 
Sorry for the format; I'm having a lot of problems viewing posts in the forum and writing answers.

Here is the output again.

Apr 22 11:00:24 starting migration of VM 101 to node 'Yoda' (200.200.201.2)
Apr 22 11:00:24 copying disk images
Apr 22 11:00:24 starting VM 101 on remote node 'Yoda'
Apr 22 11:00:26 starting migration tunnel
Apr 22 11:00:26 starting online/live migration on port 60000
Apr 22 11:00:26 migrate_set_speed: 8589934592
Apr 22 11:00:26 migrate_set_downtime: 0.1
Apr 22 11:00:28 migration status: active (transferred 150786958, remaining 1966723072), total 2164654080)
Apr 22 11:00:30 migration status: active (transferred 225670809, remaining 1890959360), total 2164654080)
Apr 22 11:00:32 migration status: active (transferred 298405837, remaining 1674944512), total 2164654080)
Apr 22 11:00:34 migration status: active (transferred 398433607, remaining 1343582208), total 2164654080)
Apr 22 11:00:36 migration status: active (transferred 558909640, remaining 659099648), total 2164654080)
Apr 22 11:00:38 migration status: active (transferred 609409314, remaining 560267264), total 2164654080)
Apr 22 11:00:40 migration status: active (transferred 733011775, remaining 376233984), total 2164654080)
Apr 22 11:00:42 migration status: active (transferred 794741343, remaining 284217344), total 2164654080)
Apr 22 11:00:44 migration status: active (transferred 934882735, remaining 111718400), total 2164654080)
Apr 22 11:00:46 migration status: active (transferred 997124224, remaining 0), total 2164654080)
Apr 22 11:00:47 migration speed: 97.52 MB/s - downtime 28 ms
Apr 22 11:00:47 migration status: completed
Apr 22 11:00:50 migration finished successfuly (duration 00:00:27)
TASK OK


I found this in dmesg:
kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
 
Sorry for the format; I'm having a lot of problems viewing posts in the forum and writing answers. ...

If you miss posts, use the "linear display mode" instead of the threaded/hybrid mode.
 
OK, I think I found the problem.

Migration fails when the xbzrle parameter is ON for the VM. I checked, and xbzrle is ON on the AMD server and OFF on the Intel server. Migration works from Intel to AMD but freezes from AMD to Intel (the Dell node).

https://bugzilla.redhat.com/show_bug.cgi?id=916060

Can I set xbzrle OFF by default on all nodes, or force it ON on the Intel server?
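
For reference, the capability can be inspected and toggled at runtime through the QEMU human monitor; a sketch, assuming qm monitor is available on this version (it affects only the running instance and is not persistent across restarts):

qm monitor 101
info migrate_capabilities            # shows the current xbzrle state
migrate_set_capability xbzrle off    # disable it, then retry the migration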

Thanks
 
Are you sure that all your nodes are updated to the latest stable Proxmox packages?
 
 
Hello again. Yes, both nodes are updated to the latest version. I reinstalled the nodes and applied all the updates, but the problem is the same. I think I could install another distro on the servers and test KVM directly, to see whether the problem is in KVM or in Proxmox. What do you think?
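
For reference, a minimal cross-host test with plain QEMU/KVM outside Proxmox could look like this (a sketch: the disk path, memory size, and port 4444 are placeholders, and both hosts need access to the same shared disk):

# on the destination host: start a listening instance with the same hardware config
qemu-system-x86_64 -enable-kvm -m 2048 -drive file=/dev/mapper/vm-disk,cache=none -incoming tcp:0:4444
# on the source host, in the QEMU monitor of the running VM:
migrate -d tcp:DEST_IP:4444
info migrate    # poll until the status is "completed"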

Regards.
 
I finally gave up. I had previously migrated a virtual machine between an Athlon and a Xeon, but between these two processors it does not seem possible. I will try to get another Opteron. Thanks for the help.
Regards
 
