proxmox 2.1 - openvz live migration fails

ibsa

New Member
Sep 26, 2012
I have a reproducible issue with OpenVZ live migration. The pveversion output is below. Any ideas appreciated :confused:

Sep 26 11:41:33 starting migration of CT 7002 to node 'pvebc-master' (10.254.254.251)
Sep 26 11:41:33 container is running - using online migration
Sep 26 11:41:33 starting rsync phase 1
Sep 26 11:41:33 # /usr/bin/rsync -aHAX --delete --numeric-ids --sparse /var/lib/vz/private/7002 root@10.254.254.251:/var/lib/vz/private
Sep 26 11:45:42 start live migration - suspending container
Sep 26 11:45:42 dump container state
Sep 26 11:45:52 copy dump file to target node
Sep 26 11:46:34 starting rsync (2nd pass)
Sep 26 11:46:34 # /usr/bin/rsync -aHAX --delete --numeric-ids /var/lib/vz/private/7002 root@10.254.254.251:/var/lib/vz/private
Sep 26 11:46:39 dump 2nd level quota
Sep 26 11:46:39 copy 2nd level quota to target node
Sep 26 11:46:41 initialize container on remote node 'pvebc-master'
Sep 26 11:46:41 initializing remote quota
Sep 26 11:46:42 turn on remote quota
Sep 26 11:46:42 load 2nd level quota
Sep 26 11:46:42 starting container on remote node 'pvebc-master'
Sep 26 11:46:42 restore container state
Sep 26 11:46:44 # /usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@10.254.254.251 vzctl restore 7002 --undump --dumpfile /var/lib/vz/dump/dump.7002 --skip_arpdetect
Sep 26 11:46:42 Restoring container ...
Sep 26 11:46:42 Starting container ...
Sep 26 11:46:42 Container is mounted
Sep 26 11:46:42 undump...
Sep 26 11:46:42 Adding IP address(es): 10.5.0.139
Sep 26 11:46:42 Setting CPU units: 1000
Sep 26 11:46:42 Setting CPUs: 2
Sep 26 11:46:44 vzquota : (warning) Quota is running for id 7002 already
Sep 26 11:46:44 Error: undump failed: Bad address
Sep 26 11:46:44 Restoring failed:
Sep 26 11:46:44 Error: do_rst_mm: failed to restore vma: -14
Sep 26 11:46:44 Error: do_rst_mm 17381512
Sep 26 11:46:44 Error: rst_mm: -14
Sep 26 11:46:44 Error: make_baby: -14
Sep 26 11:46:44 Error: rst_clone_children
Sep 26 11:46:44 Error: make_baby: -14
Sep 26 11:46:44 Error: rst_clone_children
Sep 26 11:46:44 Container start failed
Sep 26 11:46:44 ERROR: online migrate failure - Failed to restore container: Can't umount /var/lib/vz/root/7002: Device or resource busy
Sep 26 11:46:44 removing container files on local node
Sep 26 11:46:49 start final cleanup
Sep 26 11:46:51 ERROR: migration finished with problems (duration 00:05:19)
TASK ERROR: migration problems
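
To narrow down whether the failure is in the dump/undump step itself rather than in the transfer between nodes, a minimal local checkpoint/restore test on the source node might help (a sketch only: the dump path is illustrative, the container is briefly suspended while the test runs, and these are the same vzctl calls the migration task uses):

vzctl chkpnt 7002 --dumpfile /var/lib/vz/dump/test.7002
vzctl restore 7002 --dumpfile /var/lib/vz/dump/test.7002

If the local restore fails with the same do_rst_mm / -14 errors, the problem is with checkpointing this particular workload rather than with the migration transport.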

#pveversion -v
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1
 
What OS/services run inside the container - anything special?

The containers run Jetty and PostgreSQL on Ubuntu 10.04 LTS. They were restored from vzdump backups created on a Proxmox 1.9 host. I wouldn't consider these networked services to be anything special.
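
For context, the "Bad address" in the log above is errno 14 (EFAULT), returned while the kernel's checkpoint/restore code rebuilds a process's memory areas, so the exact set of processes in the container at dump time may matter (jsvc, which appears in a kernel message further down, is the Apache commons-daemon launcher commonly used to run Java services such as Jetty as daemons). A simple way to see what the checkpointer has to dump, run from the host:

vzctl exec 7002 ps ax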


A search of these forums shows a number of similar threads, but I couldn't find anything related to "Error: undump failed: Bad address". What does this error mean?


EDIT: I found the following odd entry on the source (migrate-from) machine.

Oct 2 13:41:54 pvebc-master kernel: Holy Crap 1 0 543383,262(jsvc)

The timestamp corresponds to the beginning of the migration process.

After enabling restore debug logging with "echo 3 > /proc/sys/debug/rst", the destination machine produces over 700 lines of kernel log during the migration. Here are some excerpts (a short capture sketch follows them):

CPT DBG: ffff880681e05000,1460: JFixup 5699 14559378000
CPT WRN: ffff880681e05000,1460: SNMP stats trimmed
CPT WRN: ffff880681e05000,1460: VMA 7fb4ada91000@2599120 flag mismatch 08201875 08001875
CPT DBG: ffff880681e05000,1460: rst_file: file obtained by dentry_open
CPT WRN: ffff880681e05000,1460: file 2378456 mode mismatch 00000012 00000002
CPT DBG: ffff880681e05000,1460: file is attached to a socket
CPT WRN: ffff880681e05000,1460: fixup_file_flags: oops... creds mismatch
CPT DBG: ffff880681e05000,1460: bsd lock restored
CPT DBG: ffff880681e05000,1460: 430712,162(syslog-ng)open RDONLY fifo ino 2349704 ffff88068ab885c0 11a0
CPT DBG: ffff880681e05000,1460: restoring SHM block: 00000000-00034000
CPT DBG: ffff880681e05000,1460: vma pgoff mismatch, fixing
CPT DBG: ffff880681e05000,1460: vma 6234224 merged, split

CPT ERR: ffff880681e05000,1460 :do_rst_mm: failed to restore vma: -14
CPT ERR: ffff880681e05000,1460 :do_rst_mm 6647344
CPT ERR: ffff880681e05000,1460 :rst_mm: -14
CPT DBG: ffff880681e05000,1460: leaked through 430719/249 ffff8807d437d9c0
CPT ERR: ffff880681e05000,1460 :make_baby: -14
CPT ERR: ffff880681e05000,1460 :rst_clone_children
CPT DBG: ffff880681e05000,1460: leaked through 430718/248 ffff88043d884bc0
CPT ERR: ffff880681e05000,1460 :make_baby: -14
CPT ERR: ffff880681e05000,1460 :rst_clone_children
CPT DBG: ffff880681e05000,1460: leaked through 430681/1 ffff880791df27c0
CPT DBG: ffff880681e05000,1460: leaked through 430707/150 ffff88043da4c280
CPT DBG: ffff880681e05000,1460: leaked through 430715/180 ffff88077ed942c0
CPT DBG: ffff880681e05000,1460: leaked through 430711/164 ffff88083d854c00
CPT DBG: ffff880681e05000,1460: leaked through 430708/153 ffff8803de4c8840
CPT DBG: ffff880681e05000,1460: leaked through 430714/179 ffff88083d8a66c0
CPT DBG: ffff880681e05000,1460: leaked through 430709/155 ffff8803de4c8dc0
CPT DBG: ffff880681e05000,1460: leaked through 430713/176 ffff88043d8840c0
CPT DBG: ffff880681e05000,1460: leaked through 430717/182 ffff88083d8a71c0
CPT DBG: ffff880681e05000,1460: leaked through 430710/161 ffff88043d934c80
CPT DBG: ffff880681e05000,1460: leaked through 430716/181 ffff88077ed958c0
CPT DBG: ffff880681e05000,1460: leaked through 430712/162 ffff8803de4c9340
CT: 1460: stopped
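
For anyone wanting to reproduce this capture: a minimal sequence on the destination node, assuming the debug.rst setting above and that the CPT messages land in the kernel ring buffer (the log file name is arbitrary), is roughly

echo 3 > /proc/sys/debug/rst
dmesg -c > /dev/null
(retry the online migration from the source node)
dmesg > /tmp/rst-debug.7002.log

where "dmesg -c" clears the ring buffer first, so the saved file contains only the new CPT DBG/WRN/ERR lines.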




Can anyone suggest additional tests to perform, or workarounds, that would get online OpenVZ migration working reliably?

thanks..
 
Also, what is the significance of the quota warning?

Sep 26 11:46:44 vzquota : (warning) Quota is running for id 7002 already
 
You can ignore that.

OK, that's good to know, as it appears frequently.

Have you any ideas about the root cause of the migration failure?
I see a number of suspicious log entries but don't know what is relevant/important.

Sep 26 11:46:44 Error: undump failed: Bad address
Sep 26 11:46:44 Restoring failed:
Sep 26 11:46:44 Error: do_rst_mm: failed to restore vma: -14
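
For what it's worth, OpenVZ checkpoint/restore is generally expected to work only between nodes running the same kernel, and the package list above shows both 2.6.32-11-pve and 2.6.32-14-pve installed. A quick sanity check from the source node (destination IP as in the migration log):

uname -r
ssh root@10.254.254.251 uname -r

If the two differ, booting both nodes into the same pve kernel before retrying would rule that out.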

cheers
 
