CT - online migration: strange problem

mir

Hi,

After creating a new CT on shared storage (NFS), the first online migration attempt fails (this happens for every newly created CT), but subsequent online migrations work like a charm. Debug output from the failure follows:

Code:
Oct 18 00:22:58 starting migration of CT 118 to node 'esx2' (192.168.2.9)
Oct 18 00:22:58 container is running - using online migration
Oct 18 00:22:59 container data is on shared storage 'qnap_nfs'
Oct 18 00:22:59 start live migration - suspending container
Oct 18 00:22:59 dump container state
Oct 18 00:22:59 dump 2nd level quota
Oct 18 00:23:00 initialize container on remote node 'esx2'
Oct 18 00:23:00 initializing remote quota
Oct 18 00:23:10 turn on remote quota
Oct 18 00:23:10 load 2nd level quota
Oct 18 00:23:10 starting container on remote node 'esx2'
Oct 18 00:23:10 restore container state
Oct 18 00:23:11 # /usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@192.168.2.9 vzctl restore 118 --undump --dumpfile /mnt/pve/qnap_nfs/dump/dump.118 --skip_arpdetect
Oct 18 00:23:10 Restoring container ...
Oct 18 00:23:10 Starting container ...
Oct 18 00:23:10 Container is mounted
Oct 18 00:23:10     undump...
Oct 18 00:23:10 Setting CPU units: 1000
Oct 18 00:23:10 Setting CPUs: 1
Oct 18 00:23:10 Configure veth devices: veth118.0
Oct 18 00:23:10 Adding interface veth118.0 to bridge vmbr0 on CT0 for CT118
Oct 18 00:23:11 vzquota : (warning) Quota is running for id 118 already
Oct 18 00:23:11 Error: undump failed: No such file or directory
Oct 18 00:23:11 Restoring failed:
Oct 18 00:23:11 Error: AF_PACKET binding failed: -22
Oct 18 00:23:11 Error: rst_open_file: failed to lookup path '/lib/.nfs00000000035e46510000014b': -2
Oct 18 00:23:11 Error: can't open file /lib/.nfs00000000035e46510000014b
Oct 18 00:23:11 Error: do_rst_vma: rst_file: 54840
Oct 18 00:23:11 Error: do_rst_mm: failed to restore vma: -2
Oct 18 00:23:11 Error: do_rst_mm 335520
Oct 18 00:23:11 Error: rst_mm: -2
Oct 18 00:23:11 Error: make_baby: -2
Oct 18 00:23:11 Error: rst_clone_children
Oct 18 00:23:11 Container is unmounted
Oct 18 00:23:11 ERROR: online migrate failure - Failed to restore container: Container start failed
Oct 18 00:23:11 start final cleanup
Oct 18 00:23:11 ERROR: migration finished with problems (duration 00:00:13)
TASK ERROR: migration problems

Code:
pveversion --verbose
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1


Any idea what is going on here?
 
Add your comment to the OpenVZ bug report; the more reports, the better.
 
I have a similar problem when migrating with the online option enabled. I had the same problem on 2.2 and 2.3, and now on 3.1. My containers run on shared NFS. When I run an online migration I get this:

Code:
Sep 20 09:25:31 starting migration of CT 105 to node 'proxmox' (192.168.0.1)
Sep 20 09:25:31 container is running - using online migration
Sep 20 09:25:32 container data is on shared storage 'qnapCT'
Sep 20 09:25:32 start live migration - suspending container
Sep 20 09:25:32 dump container state
Sep 20 09:25:35 dump 2nd level quota
Sep 20 09:25:36 initialize container on remote node 'proxmox'
Sep 20 09:25:36 initializing remote quota
Sep 20 09:26:04 turn on remote quota
Sep 20 09:26:04 load 2nd level quota
Sep 20 09:26:04 starting container on remote node 'proxmox'
Sep 20 09:26:04 restore container state
Sep 20 09:26:06 # /usr/bin/ssh -o 'BatchMode=yes' root@192.168.0.1 vzctl restore 105 --undump --dumpfile /mnt/pve/qnapCT/dump/dump.105 --skip_arpdetect
Sep 20 09:26:05 Restoring container ...
Sep 20 09:26:05 Starting container ...
Sep 20 09:26:05 Container is mounted
Sep 20 09:26:05     undump...
Sep 20 09:26:05 Setting CPU units: 1000
Sep 20 09:26:05 Setting CPUs: 1
Sep 20 09:26:05 Configure veth devices: veth105.0 
Sep 20 09:26:05 Adding interface veth105.0 to bridge vmbr1 on CT0 for CT105
Sep 20 09:26:06 vzquota : (warning) Quota is running for id 105 already
Sep 20 09:26:06 Error: undump failed: No such file or directory
Sep 20 09:26:06 Restoring failed:
Sep 20 09:26:06 Error: rst_open_file: failed to lookup path '/tmp/.nfs0000000001d04c340000024c': -2
Sep 20 09:26:06 Error: can't open file /tmp/.nfs0000000001d04c340000024c
Sep 20 09:26:06 Error: rst_file: -2 114040
Sep 20 09:26:06 Error: rst_files: -2
Sep 20 09:26:06 Error: make_baby: -2
Sep 20 09:26:06 Error: rst_clone_children
Sep 20 09:26:06 Error: make_baby: -2
Sep 20 09:26:06 Error: rst_clone_children
Sep 20 09:26:06 Error: make_baby: -2
Sep 20 09:26:06 Error: rst_clone_children
Sep 20 09:26:06 Container start failed
Sep 20 09:26:06 ERROR: online migrate failure - Failed to restore container: Can't umount /var/lib/vz/root/105: Device or resource busy
Sep 20 09:26:06 start final cleanup
Sep 20 09:26:06 ERROR: migration finished with problems (duration 00:00:35)
TASK ERROR: migration problems

The migrated CT shows up on the new node but is stopped (not running). If I manually press start, it starts normally.
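For anyone hitting this: the `.nfs*` files in both logs come from NFS unlinking a file while a process still has it open. You can reproduce the underlying "deleted but still open" state locally, without NFS; on a local filesystem the kernel just keeps the inode alive with no directory entry, which `/proc` reports as `(deleted)`:

```shell
#!/bin/sh
# Reproduce the "deleted but still open" state that becomes an .nfs*
# silly rename on NFS. Locally, the inode survives without a name and
# /proc shows the open fd's target suffixed with "(deleted)".
tmp=$(mktemp)
exec 3<"$tmp"            # hold the file open on fd 3
rm -f "$tmp"             # unlink it while fd 3 is still open
readlink /proc/$$/fd/3   # prints the old path followed by " (deleted)"
exec 3<&-                # close the fd; only now is the inode freed
```

On NFS, that same unlink is what creates the `.nfs*` name; if it vanishes between suspend on the source and restore on the target, the restore's path lookup returns -2 (ENOENT), matching the `rst_open_file` errors in the logs above.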