vzmigrate fails

gkovacs

Renowned Member
Dec 22, 2008
Budapest, Hungary
We are running a two-node cluster, both on 1.9 with kernel 2.6.32-4.
I'm trying to migrate containers, but the migration consistently fails from the web interface.

Trying from shell:
Code:
proxmox2:~# vzmigrate --online -vv --keep-dst x.x.x.x 116
Starting online migration of CT 116 to x.x.x.x
OpenVZ is running...
   Loading /etc/vz/vz.conf and /etc/vz/conf/116.conf files
   Check IPs on destination node: x.x.x.116
Preparing remote node
   Copying config file
116.conf                                                                          100% 1549     1.5KB/s   00:00
No changes in CT configuration, not saving
   Creating remote container root dir
   Creating remote container private dir
Initializing remote quota
   Quota init
   Turning remote quota on
Syncing private
sending incremental file list
116/

sent 1787934 bytes  received 15265 bytes  721279.60 bytes/sec
total size is 993914060  speedup is 551.19
Live migrating container...
   Suspending container
Setting up checkpoint...
        suspend...
        get context...
Checkpointing completed succesfully
   Dumping container
Setting up checkpoint...
        join context..
        dump...
Can not dump container: Invalid argument
Error: iptables-save exited with 255
Checkpointing failed
Error: Failed to dump container
Resuming...

This is a serious problem for us. Any idea where to go next?
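Since the dump aborts on `iptables-save` (exit code 255), it can help to run that step in isolation before retrying the migration. A minimal diagnostic sketch, assuming CT 116 from the log above; the `explain_ipt_rc` helper is mine, not part of vzmigrate:

```shell
# Classify the iptables-save exit code that vzmigrate reports during the dump.
explain_ipt_rc() {
  case "$1" in
    0)   echo "iptables-save OK; the checkpoint should not fail on it" ;;
    255) echo "iptables-save failed outright; likely an iptables/kernel module mismatch" ;;
    *)   echo "iptables-save returned $1; check the container's iptables setup" ;;
  esac
}

# On the source node (requires a live OpenVZ container, so commented out here):
# vzctl exec 116 'iptables-save >/dev/null'; explain_ipt_rc $?
explain_ipt_rc 255
```

If `iptables-save` already fails standalone inside the container, the problem is the container's iptables state (or missing netfilter modules in the kernel), not the migration itself.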
 
You need to use a stable OpenVZ kernel branch: either the 2.6.18 branch or 2.6.32-6.
If you use 2.6.32-6, you also need to disable the init.logger.
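The hint above can be wrapped in a quick pre-flight check before attempting an online migration. A minimal sketch; the branch list (2.6.18, 2.6.32-6) is taken from this reply, not from an official support matrix:

```shell
# Warn when the running kernel is outside the OpenVZ branches
# recommended in this thread.
check_branch() {
  case "$1" in
    2.6.18-*|2.6.32-6*) echo "recommended" ;;
    *)                  echo "not recommended" ;;
  esac
}

kver=$(uname -r)
echo "$kver: $(check_branch "$kver") for OpenVZ online migration"
```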
 
as I wrote, either 2.6.18 or 2.6.32-6.
 
Sorry I don't understand the point you are making. Is migrate functionality unsupported on our kernel version?
If I remember correctly, 2.6.32-4 was a stable kernel a couple of months ago...
 
No idea why you keep ignoring my hints. For details about kernels, see http://pve.proxmox.com/wiki/Proxmox_VE_Kernel

2.6.32-4 was never the recommended and stable OpenVZ branch; if you take a look at the history in the wiki, you will see that.

But anyway, if you want online migration to work, follow my hints. If you don't want my help, stop asking me.
 
Well, we can't use 2.6.18 (too old) or 2.6.32-6 (freezes our server), so that's the end of the online migration story.
Fortunately regular migration still works, which will have to do then.

Thanks for your help Tom.
 
As for the kernel panic with 2.6.32-6: we still cannot find a reliable way to reproduce it in our lab.
 
OK, so you do not have a test case for us?
 
Well I've already shared the software and hardware configuration of the server that produces the kernel panic, with screenshots. Not sure what you mean by test case.

The server in question is being reinstalled today, so before we move back the containers to it, I can run a couple of tests.
Tell me how I can help.
 
I have a lot of servers here, and I need a simple howto to reproduce the issue. So far I have stressed my servers with load, running backups and benchmarks, but I just get no kernel panic. As soon as I get the test case, I can start debugging and try to find the issue. It's more or less impossible to track it down without this; I also run several production servers with this kernel with no issues at all.

So if anyone can define a way to trigger the kernel panic, you are very welcome to share it.
 
Let me try to gather all information I have:

- Intel Q6600 CPU, P45 board, 8 GB DDR2 RAM, Adaptec RAID (C1E and EIST disabled in BIOS)
- 2.6.32-6 kernel booted with the elevator=deadline and clocksource=hpet options
- we were running two MySQL containers (5.0 and 5.1, both on Ubuntu) with heavy traffic
- mysqld processes caused the kernel panic both times (our other, similarly configured server without mysqld did not crash)
- when we set 1 CPU for the containers, the server was stable for a day
- setting 2 or 4 CPUs for the MySQL container caused the kernel panic within 3 hours
- a snapshot vzdump backup to an NFS share was probably running at the same time

I will try to reproduce it later tonight and will let you know if I can find a way to do it without risking our production databases.
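The conditions above can be condensed into a command sequence. This is a sketch only: the container ID (116) and the NFS path are placeholders, and the `repro_cmds` helper just prints the commands so the recipe can be reviewed before running it on a real node:

```shell
# Print the reproduction steps: give the MySQL container several CPUs,
# start a snapshot vzdump to NFS, then drive MySQL load in parallel.
repro_cmds() {
  ctid=$1
  cpus=$2
  echo "vzctl set $ctid --cpus $cpus --save"
  echo "vzdump --snapshot --dumpdir /mnt/nfs $ctid &"
  echo "vzctl exec $ctid 'mysqlslap --concurrency=50 --iterations=100 --auto-generate-sql'"
}

repro_cmds 116 4
```

Per the observations above, 2 or 4 CPUs plus a concurrent vzdump should trigger the panic within a few hours, while 1 CPU should stay stable.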
 