Container live migration problems

fabriceg

Hi,

I have a problem migrating a container in my cluster of 3 nodes.
The container migrates to the second node but fails to start; I have to start it manually in the GUI.

Code:
Jul 05 15:42:58 starting migration of CT 100 to node 'prox' (*.*.*.155)
Jul 05 15:42:58 container is running - using online migration
Jul 05 15:42:58 container data is on shared storage 'Containers'
Jul 05 15:42:58 start live migration - suspending container
Jul 05 15:42:58 dump container state
Jul 05 15:42:58 dump 2nd level quota
Jul 05 15:43:00 initialize container on remote node 'prox'
Jul 05 15:43:00 initializing remote quota
Jul 05 15:43:12 turn on remote quota
Jul 05 15:43:12 load 2nd level quota
Jul 05 15:43:12 starting container on remote node 'prox'
Jul 05 15:43:12 restore container state
Jul 05 15:43:13 # /usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@*.*.*.155 vzctl restore 100 --undump --dumpfile /mnt/pve/Containers/dump/dump.100 --skip_arpdetect
Jul 05 15:43:12 Restoring container ...
Jul 05 15:43:12 Starting container ...
Jul 05 15:43:12 Container is mounted
Jul 05 15:43:12 undump...
Jul 05 15:43:12 Setting CPU units: 1000
Jul 05 15:43:12 Setting CPUs: 1
Jul 05 15:43:12 Configure veth devices: veth100.0
Jul 05 15:43:12 Adding interface veth100.0 to bridge vmbr0 on CT0 for CT100
Jul 05 15:43:13 vzquota : (warning) Quota is running for id 100 already
Jul 05 15:43:13 Error: undump failed: No such file or directory
Jul 05 15:43:13 Restoring failed:
Jul 05 15:43:13 Error: rst_open_file: failed to lookup path '/var/run/apache2/.nfs0000000000007062000000a8': -2
Jul 05 15:43:13 Error: can't open file /var/run/apache2/.nfs0000000000007062000000a8
Jul 05 15:43:13 Error: rst_file: -2 1269128
Jul 05 15:43:13 Error: rst_files: -2
Jul 05 15:43:13 Error: make_baby: -2
Jul 05 15:43:13 Error: rst_clone_children
Jul 05 15:43:13 Container is unmounted
Jul 05 15:43:13 ERROR: online migrate failure - Failed to restore container: Container start failed
Jul 05 15:43:13 start final cleanup
Jul 05 15:43:13 ERROR: migration finished with problems (duration 00:00:16)
TASK ERROR: migration problems

The file /etc/vz/conf/100.conf contains:
Code:
VE_ROOT="/var/lib/vz/root/$VEID"
VE_PRIVATE="/mnt/pve/Containers/private/100"
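
For what it's worth, the failing path in the log is an NFS "silly rename" file: when a process inside the CT deletes a file it still holds open on an NFS mount, the client renames it to .nfsXXXX... instead of removing it, and the restore on the target node can no longer look that path up. A rough way to spot this before migrating (assuming lsof is installed inside the CT; the CT ID and storage path are taken from the config above):

Code:
# leftover silly-rename files in the CT's private area on the shared storage
find /mnt/pve/Containers/private/100 -name '.nfs*' -ls

# deleted-but-still-open files held by processes inside the CT
vzctl exec 100 'lsof +L1'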

I also get this error sometimes:
Code:
Jul 05 16:03:22 ERROR: online migrate failure - Failed to restore  container: Can't umount /var/lib/vz/root/100: Device or resource busy
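
When the umount variant shows up, something still has the CT root mountpoint busy on the node reporting the error. The usual tools show what (CT ID 100 as above):

Code:
# processes still using the CT's root mountpoint
fuser -vm /var/lib/vz/root/100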

Otherwise, migration of our KVMs (on shared storage) or our CTs (on local storage) works great.
 
That fix is already in our stable repository (uploaded today).

Can you please tell me which kernel version has this fix? I've installed the latest Proxmox 2.2 and still face the same bug with NFS.
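
In the meantime, a quick way to compare the kernel you are actually running against the pve-kernel packages you have installed (standard commands):

Code:
uname -r                        # kernel currently booted
pveversion -v | grep -i kernel  # pve kernel packages installed

Remember that after installing a newer pve-kernel package, the node must be rebooted into it before any kernel-side fix applies.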
 
We are migrating our containers to Proxmox 3.0 and the problem with MySQL came back.

Code:
Aug 09 11:13:49 Error: undump failed: No such file or directory
Aug 09 11:13:49 Restoring failed:
Aug 09 11:13:49 Error: rst_open_file: failed to lookup path '/tmp/.nfs0000000002fe193c000001be': -2
Aug 09 11:13:49 Error: can't open file /tmp/.nfs0000000002fe193c000001be
Aug 09 11:13:49 Error: rst_file: -2 109248
Aug 09 11:13:49 Error: rst_files: -2
Aug 09 11:13:49 Error: make_baby: -2
Aug 09 11:13:49 Error: rst_clone_children
Aug 09 11:13:49 Error: make_baby: -2
Aug 09 11:13:49 Error: rst_clone_children
Aug 09 11:13:49 Error: make_baby: -2
Aug 09 11:13:49 Error: rst_clone_children
Aug 09 11:13:49 Container start failed
Aug 09 11:13:49 ERROR: online migrate failure - Failed to restore container: Can't umount /var/lib/vz/root/103: Device or resource busy
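
The path in that trace again looks like an NFS silly-rename leftover, this time under /tmp. A workaround sketch (assuming, as the post suggests, that MySQL is the process holding the deleted file open; CT ID 103 comes from the umount line above): restarting the offending service just before migrating releases the handle, so the checkpoint no longer references the silly-renamed file.

Code:
# release the deleted-but-open file by restarting the service, then migrate
vzctl exec 103 'service mysql restart'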
 
Here is my configuration:

Code:
root@v3:~# pveversion -v
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-22-pve
proxmox-ve-2.6.32: 3.0-107
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-22-pve: 2.6.32-107
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-23
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1
 
I installed a CT from the template debian-6-turnkey-twiki_12.1-1_i386.tar.gz and tried to migrate it:

Code:
Aug 14 13:30:49 Error: undump failed: No such file or directory
Aug 14 13:30:49 Restoring failed:
Aug 14 13:30:49 Error: rst_open_file: failed to lookup path '/var/run/apache2/.nfs0000000003720a120000007c': -2
Aug 14 13:30:49 Error: can't open file /var/run/apache2/.nfs0000000003720a120000007c
Aug 14 13:30:49 Error: rst_file: -2 96448
Aug 14 13:30:49 Error: rst_files: -2
Aug 14 13:30:49 Error: make_baby: -2
Aug 14 13:30:49 Error: rst_clone_children
Aug 14 13:30:49 Error: make_baby: -2
Aug 14 13:30:49 Error: rst_clone_children
 
Good morning.

This issue is still occurring in Proxmox 3.1:

Code:
Nov 25 10:53:10 starting migration of CT 200 to node 'calcium' (10.1.1.13)
Nov 25 10:53:10 container is running - using online migration
Nov 25 10:53:10 container data is on shared storage 'infra'
Nov 25 10:53:10 start live migration - suspending container
Nov 25 10:53:10 dump container state
Nov 25 10:53:12 dump 2nd level quota
Nov 25 10:53:13 initialize container on remote node 'calcium'
Nov 25 10:53:14 initializing remote quota
Nov 25 10:59:42 turn on remote quota
Nov 25 10:59:42 load 2nd level quota
Nov 25 10:59:42 starting container on remote node 'calcium'
Nov 25 10:59:42 restore container state
Nov 25 10:59:43 # /usr/bin/ssh -o 'BatchMode=yes' root@10.1.1.13 vzctl restore 200 --undump --dumpfile /mnt/pve/infra/dump/dump.200 --skip_arpdetect
Nov 25 10:59:42 Restoring container ...
Nov 25 10:59:42 Starting container ...
Nov 25 10:59:43 Container is mounted
Nov 25 10:59:43 undump...
Nov 25 10:59:43 Setting CPU units: 1000
Nov 25 10:59:43 Setting CPUs: 1
Nov 25 10:59:43 Configure veth devices: veth200.0
Nov 25 10:59:43 Adding interface veth200.0 to bridge vmbr0 on CT0 for CT200
Nov 25 10:59:43 vzquota : (warning) Quota is running for id 200 already
Nov 25 10:59:43 Error: undump failed: Input/output error
Nov 25 10:59:43 Restoring failed:
Nov 25 10:59:43 Error: rst_file: failed to fix up file content: -5
Nov 25 10:59:43 Error: rst_file: -5 90496
Nov 25 10:59:43 Error: rst_files: -5
Nov 25 10:59:43 Error: make_baby: -5
Nov 25 10:59:43 Error: rst_clone_children
Nov 25 10:59:43 Container is unmounted
Nov 25 10:59:43 ERROR: online migrate failure - Failed to restore container: Container start failed
Nov 25 10:59:44 start final cleanup
Nov 25 10:59:44 ERROR: migration finished with problems (duration 00:06:35)
TASK ERROR: migration problems

Setup:
- Proxmox 3.1 running kernel 2.6.32-23-pve
- container data stored on GlusterFS shared storage
- small test container running Debian 7 and very little else
- also running the test program from https://bugzilla.openvz.org/show_bug.cgi?id=2242#c26 (a rough shell equivalent is sketched after the results below)

Code:
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 init [2]      
  972 ?        Ss     0:00 /sbin/rpcbind -w
 1062 ?        Sl     0:00 /usr/sbin/rsyslogd -c5
 1224 ?        Ss     0:00 /usr/lib/postfix/master
 1238 ?        S      0:00  \_ pickup -l -t fifo -u -c
 1239 ?        S      0:00  \_ qmgr -l -t fifo -u
 1236 ?        Ss+    0:00 /sbin/getty 38400 tty1
    2 ?        S      0:00 [kthreadd/200]
    3 ?        S      0:00  \_ [khelper/200]
 2616 ?        S      0:00 ./test
 2645 ?        Ss     0:00 vzctl: pts/1   
 2646 pts/1    Ss     0:00  \_ -bash
 2656 pts/1    R+     0:00      \_ ps axf

Results:
- migration fails when the test program is running
- migration succeeds when it is not running
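
For anyone who cannot fetch the linked test program, here is a rough shell equivalent (an assumption about what it does, based on the errors in this thread: hold an unlinked file open inside the CT):

Code:
# inside CT 200: open a file on fd 3, delete it, and stay alive holding it
vzctl exec 200 'bash -c "exec 3<>/tmp/heldopen; rm /tmp/heldopen; sleep 600"' &
# then attempt a live migration while the sleep is still running

While that process lives, the checkpoint references a deleted-but-open file, which matches the fail/succeed pattern above.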

Code:
root@potassium:~# pveversion -v
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2


I haven't tried with NFS storage, but GlusterFS storage seems to show the same behaviour.
 
