Container live migration problems

fabriceg

Hi,

I have a problem migrating a container in my cluster of 3 nodes.
The container migrates to the second node but fails to start; I have to start it manually in the GUI.

Code:
Jul 05 15:42:58 starting migration of CT 100 to node 'prox' (*.*.*.155)
Jul 05 15:42:58 container is running - using online migration
Jul 05 15:42:58 container data is on shared storage 'Containers'
Jul 05 15:42:58 start live migration - suspending container
Jul 05 15:42:58 dump container state
Jul 05 15:42:58 dump 2nd level quota
Jul 05 15:43:00 initialize container on remote node 'prox'
Jul 05 15:43:00 initializing remote quota
Jul 05 15:43:12 turn on remote quota
Jul 05 15:43:12 load 2nd level quota
Jul 05 15:43:12 starting container on remote node 'prox'
Jul 05 15:43:12 restore container state
Jul 05 15:43:13 # /usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@*.*.*.155 vzctl restore 100 --undump --dumpfile /mnt/pve/Containers/dump/dump.100 --skip_arpdetect
Jul 05 15:43:12 Restoring container ...
Jul 05 15:43:12 Starting container ...
Jul 05 15:43:12 Container is mounted
Jul 05 15:43:12 undump...
Jul 05 15:43:12 Setting CPU units: 1000
Jul 05 15:43:12 Setting CPUs: 1
Jul 05 15:43:12 Configure veth devices: veth100.0
Jul 05 15:43:12 Adding interface veth100.0 to bridge vmbr0 on CT0 for CT100
Jul 05 15:43:13 vzquota : (warning) Quota is running for id 100 already
Jul 05 15:43:13 Error: undump failed: No such file or directory
Jul 05 15:43:13 Restoring failed:
Jul 05 15:43:13 Error: rst_open_file: failed to lookup path '/var/run/apache2/.nfs0000000000007062000000a8': -2
Jul 05 15:43:13 Error: can't open file /var/run/apache2/.nfs0000000000007062000000a8
Jul 05 15:43:13 Error: rst_file: -2 1269128
Jul 05 15:43:13 Error: rst_files: -2
Jul 05 15:43:13 Error: make_baby: -2
Jul 05 15:43:13 Error: rst_clone_children
Jul 05 15:43:13 Container is unmounted
Jul 05 15:43:13 ERROR: online migrate failure - Failed to restore container: Container start failed
Jul 05 15:43:13 start final cleanup
Jul 05 15:43:13 ERROR: migration finished with problems (duration 00:00:16)
TASK ERROR: migration problems

The file /etc/vz/conf/100.conf contains:
Code:
VE_ROOT="/var/lib/vz/root/$VEID"
VE_PRIVATE="/mnt/pve/Containers/private/100"
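
For what it's worth, the failing path in the log is an NFS "silly rename" file: when a process inside the CT deletes a file it still holds open on an NFS mount, the client renames it to .nfsXXXX... instead of removing it, and the restore on the target node can no longer look that path up. A rough way to spot this before migrating (assuming lsof is installed inside the CT; the CT ID and storage path are taken from the config above):

Code:
# leftover silly-rename files in the CT's private area on the shared storage
find /mnt/pve/Containers/private/100 -name '.nfs*' -ls

# deleted-but-still-open files held by processes inside the CT
vzctl exec 100 'lsof +L1'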

I also get this error sometimes:
Code:
Jul 05 16:03:22 ERROR: online migrate failure - Failed to restore  container: Can't umount /var/lib/vz/root/100: Device or resource busy
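
When the umount variant shows up, something still has the CT root mountpoint busy on the node reporting the error. The usual tools show what (CT ID 100 as above):

Code:
# processes still using the CT's root mountpoint
fuser -vm /var/lib/vz/root/100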

Otherwise, migration of our KVMs (on shared storage) or our CTs (on local storage) works great.
 
That fix is already in our stable repository (uploaded today).

Can you please tell me which kernel version has this fix? I've installed the latest Proxmox 2.2 and still face the same bug with NFS.
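
In the meantime, a quick way to compare the kernel you are actually running against the pve-kernel packages you have installed (standard commands):

Code:
uname -r                        # kernel currently booted
pveversion -v | grep -i kernel  # pve kernel packages installed

Remember that after installing a newer pve-kernel package, the node must be rebooted into it before any kernel-side fix applies.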
 
We are migrating our containers to Proxmox 3.0 and the problem with MySQL came back.

Code:
Aug 09 11:13:49 Error: undump failed: No such file or directory
Aug 09 11:13:49 Restoring failed:
Aug 09 11:13:49 Error: rst_open_file: failed to lookup path '/tmp/.nfs0000000002fe193c000001be': -2
Aug 09 11:13:49 Error: can't open file /tmp/.nfs0000000002fe193c000001be
Aug 09 11:13:49 Error: rst_file: -2 109248
Aug 09 11:13:49 Error: rst_files: -2
Aug 09 11:13:49 Error: make_baby: -2
Aug 09 11:13:49 Error: rst_clone_children
Aug 09 11:13:49 Error: make_baby: -2
Aug 09 11:13:49 Error: rst_clone_children
Aug 09 11:13:49 Error: make_baby: -2
Aug 09 11:13:49 Error: rst_clone_children
Aug 09 11:13:49 Container start failed
Aug 09 11:13:49 ERROR: online migrate failure - Failed to restore container: Can't umount /var/lib/vz/root/103: Device or resource busy
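
The path in that trace again looks like an NFS silly-rename leftover, this time under /tmp. A workaround sketch (assuming, as the post suggests, that MySQL is the process holding the deleted file open; CT ID 103 comes from the umount line above): restarting the offending service just before migrating releases the handle, so the checkpoint no longer references the silly-renamed file.

Code:
# release the deleted-but-open file by restarting the service, then migrate
vzctl exec 103 'service mysql restart'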
 
Here is my configuration:

Code:
root@v3:~# pveversion -v
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-22-pve
proxmox-ve-2.6.32: 3.0-107
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-22-pve: 2.6.32-107
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-23
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1
 
I installed a CT from the template debian-6-turnkey-twiki_12.1-1_i386.tar.gz and tried to migrate it:

Code:
Aug 14 13:30:49 Error: undump failed: No such file or directory
Aug 14 13:30:49 Restoring failed:
Aug 14 13:30:49 Error: rst_open_file: failed to lookup path '/var/run/apache2/.nfs0000000003720a120000007c': -2
Aug 14 13:30:49 Error: can't open file /var/run/apache2/.nfs0000000003720a120000007c
Aug 14 13:30:49 Error: rst_file: -2 96448
Aug 14 13:30:49 Error: rst_files: -2
Aug 14 13:30:49 Error: make_baby: -2
Aug 14 13:30:49 Error: rst_clone_children
Aug 14 13:30:49 Error: make_baby: -2
Aug 14 13:30:49 Error: rst_clone_children
 
Good morning.

This issue is still occurring in Proxmox 3.1:

Code:
Nov 25 10:53:10 starting migration of CT 200 to node 'calcium' (10.1.1.13)
Nov 25 10:53:10 container is running - using online migration
Nov 25 10:53:10 container data is on shared storage 'infra'
Nov 25 10:53:10 start live migration - suspending container
Nov 25 10:53:10 dump container state
Nov 25 10:53:12 dump 2nd level quota
Nov 25 10:53:13 initialize container on remote node 'calcium'
Nov 25 10:53:14 initializing remote quota
Nov 25 10:59:42 turn on remote quota
Nov 25 10:59:42 load 2nd level quota
Nov 25 10:59:42 starting container on remote node 'calcium'
Nov 25 10:59:42 restore container state
Nov 25 10:59:43 # /usr/bin/ssh -o 'BatchMode=yes' root@10.1.1.13 vzctl restore 200 --undump --dumpfile /mnt/pve/infra/dump/dump.200 --skip_arpdetect
Nov 25 10:59:42 Restoring container ...
Nov 25 10:59:42 Starting container ...
Nov 25 10:59:43 Container is mounted
Nov 25 10:59:43 undump...
Nov 25 10:59:43 Setting CPU units: 1000
Nov 25 10:59:43 Setting CPUs: 1
Nov 25 10:59:43 Configure veth devices: veth200.0
Nov 25 10:59:43 Adding interface veth200.0 to bridge vmbr0 on CT0 for CT200
Nov 25 10:59:43 vzquota : (warning) Quota is running for id 200 already
Nov 25 10:59:43 Error: undump failed: Input/output error
Nov 25 10:59:43 Restoring failed:
Nov 25 10:59:43 Error: rst_file: failed to fix up file content: -5
Nov 25 10:59:43 Error: rst_file: -5 90496
Nov 25 10:59:43 Error: rst_files: -5
Nov 25 10:59:43 Error: make_baby: -5
Nov 25 10:59:43 Error: rst_clone_children
Nov 25 10:59:43 Container is unmounted
Nov 25 10:59:43 ERROR: online migrate failure - Failed to restore container: Container start failed
Nov 25 10:59:44 start final cleanup
Nov 25 10:59:44 ERROR: migration finished with problems (duration 00:06:35)
TASK ERROR: migration problems

Setup:
- Proxmox 3.1 running kernel 2.6.32-23-pve
- container data stored on GlusterFS shared storage
- small test container running Debian 7 and very little else
- also running the test program from https://bugzilla.openvz.org/show_bug.cgi?id=2242#c26 (a rough shell equivalent is sketched after the results below)

Code:
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 init [2]      
  972 ?        Ss     0:00 /sbin/rpcbind -w
 1062 ?        Sl     0:00 /usr/sbin/rsyslogd -c5
 1224 ?        Ss     0:00 /usr/lib/postfix/master
 1238 ?        S      0:00  \_ pickup -l -t fifo -u -c
 1239 ?        S      0:00  \_ qmgr -l -t fifo -u
 1236 ?        Ss+    0:00 /sbin/getty 38400 tty1
    2 ?        S      0:00 [kthreadd/200]
    3 ?        S      0:00  \_ [khelper/200]
 2616 ?        S      0:00 ./test
 2645 ?        Ss     0:00 vzctl: pts/1   
 2646 pts/1    Ss     0:00  \_ -bash
 2656 pts/1    R+     0:00      \_ ps axf

Results:
- migration fails when the test program is running
- migration succeeds when it is not running
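
For anyone who cannot fetch the linked test program, here is a rough shell equivalent (an assumption about what it does, based on the errors in this thread: hold an unlinked file open inside the CT):

Code:
# inside CT 200: open a file on fd 3, delete it, and stay alive holding it
vzctl exec 200 'bash -c "exec 3<>/tmp/heldopen; rm /tmp/heldopen; sleep 600"' &
# then attempt a live migration while the sleep is still running

While that process lives, the checkpoint references a deleted-but-open file, which matches the fail/succeed pattern above.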

Code:
root@potassium:~# pveversion -v
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2


I haven't tried with NFS storage, but GlusterFS storage seems to show the same behaviour.
 
