vzctl chkpnt kernel error

damien

When I try to migrate a CT, I get the following error:

May 14 15:31:37 starting migration of CT 102 to node 'wfsr011' (192.168.192.2)
May 14 15:31:37 container is running - using online migration
May 14 15:31:38 starting rsync phase 1
May 14 15:31:38 # /usr/bin/rsync -aH --delete --numeric-ids --sparse /var/lib/vz/private/102 root@192.168.192.2:/var/lib/vz/private
May 14 15:48:04 start live migration - suspending container
May 14 15:48:04 dump container state
May 14 15:48:04 # vzctl --skiplock chkpnt 102 --dump --dumpfile /var/lib/vz/dump/dump.102
May 14 15:48:04 Setting up checkpoint...
May 14 15:48:04 join context..
May 14 15:48:04 dump...
May 14 15:48:04 Can not dump container: Invalid argument
May 14 15:48:04 Error: page without mapping at b669e000@12084248
May 14 15:48:04 Error: dump_one_vma: funkey page
May 14 15:48:04 ERROR: Failed to dump container state: Checkpointing failed
May 14 15:48:04 aborting phase 1 - cleanup resources
May 14 15:48:04 removing copied files on target node
May 14 15:48:24 start final cleanup
May 14 15:48:24 ERROR: migration aborted (duration 00:16:48): Failed to dump container state: Checkpointing failed
TASK ERROR: migration aborted

I can see the same error in kern.log:
CPT ERR: ffff880598c5d000,102 :page without mapping at b669e000@12084248
CPT ERR: ffff880598c5d000,102 :dump_one_vma: funkey page
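One way to narrow this down (a sketch, not something tried in this thread) is to take the address from the "page without mapping" line and look it up in the memory maps of the container's processes, which are normally visible from the host's /proc on OpenVZ. The Python below assumes the usual `start-end perms offset dev inode path` layout of `/proc/<pid>/maps`. The sample maps text is made up for illustration, and the number after `@` is captured but not interpreted, since its meaning isn't stated here.

```python
import re
from typing import List, Optional, Tuple

# Matches kernel lines such as:
#   CPT ERR: ffff880598c5d000,102 :page without mapping at b669e000@12084248
# The value after '@' is captured as 'extra' but deliberately left uninterpreted.
CPT_ERR = re.compile(
    r"CPT ERR: (?P<ctx>[0-9a-f]+),(?P<veid>\d+) :page without mapping at "
    r"(?P<addr>[0-9a-f]+)@(?P<extra>\d+)"
)


def parse_cpt_error(line: str) -> Optional[Tuple[int, int]]:
    """Return (container ID, reported address) from a 'page without mapping' line."""
    m = CPT_ERR.search(line)
    if m is None:
        return None
    return int(m.group("veid")), int(m.group("addr"), 16)


def find_vma(maps_text: str, addr: int) -> List[str]:
    """Return the /proc/<pid>/maps lines whose [start, end) range contains addr."""
    hits = []
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        if start <= addr < end:
            hits.append(line)
    return hits


if __name__ == "__main__":
    err = "CPT ERR: ffff880598c5d000,102 :page without mapping at b669e000@12084248"
    veid, addr = parse_cpt_error(err)
    # Hypothetical maps excerpt, for illustration only.
    sample_maps = (
        "08048000-08050000 r-xp 00000000 08:01 1234 /usr/sbin/example\n"
        "b6600000-b6700000 rw-s 00000000 00:04 5678 /dev/example\n"
    )
    print(veid, hex(addr))
    print(find_vma(sample_maps, addr))
```

On a live system you would feed it the contents of `/proc/<pid>/maps` for each process of CT 102, captured shortly before the failing `vzctl chkpnt`, since the reported address changes between attempts in this thread.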


Do you have a solution for this? I found a bug report that might be related: http://bugzilla.openvz.org/show_bug.cgi?id=203
but I can't disable vsyscall as explained there.

My environment is:
root@wfsr010:~# uname -a
Linux wfsr010 2.6.32-11-pve #1 SMP Wed Apr 11 07:17:05 CEST 2012 x86_64 GNU/Linux

root@wfsr010:~# vzctl
vzctl version 3.0.30.2-11.git.aefc8ef


Thanks a lot in advance.
Regards
 
Can you please test with the latest kernel from the 'pvetest' repository?


Hi, I tested with 2.6.32-12-pve and get the same problem:

root@wfsr010:~# vzctl chkpnt 102
Setting up checkpoint...
suspend...
dump...
Can not dump container: Invalid argument
Error: page without mapping at b66b6000@23695640
Error: dump_one_vma: funkey page
Checkpointing failed

kern.log:
May 18 18:26:01 wfsr010 kernel: CPT ERR: ffff880600db8000,102 :page without mapping at b66b6000@23695640
May 18 18:26:01 wfsr010 kernel: CPT ERR: ffff880600db8000,102 :dump_one_vma: funkey page
 
Please also post the output of 'pveversion -v'.
 


root@wfsr010:~# pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-12-pve
proxmox-ve-2.6.32: 2.1-68
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-6-pve: 2.6.32-55
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-16
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
 
How can I reproduce that bug? What OS template/software do you run?


Hello,

I used this template for the CT:
http://download.openvz.org/template/precreated/ubuntu-10.04-x86.tar.gz

I updated it (apt-get dist-upgrade) and modified it slightly (mostly the Apache config, MySQL, etc.). It is a LAMP server.


A few notes:
_ When the CT is stopped, I can migrate it without problems.
_ I can reproduce this problem on multiple Proxmox hosts.
_ I ran some tests with an old Gentoo CT a few weeks ago and did not encounter this problem.

Here is its config:
root@wfsr010:~# more /etc/vz/conf/102.conf
ONBOOT="yes"


PHYSPAGES="0:917504"
SWAPPAGES="0:512M"
KMEMSIZE="1708130304:1879048192"
DCACHESIZE="853540864:939524096"
LOCKEDPAGES="458752"
PRIVVMPAGES="unlimited"
SHMPAGES="unlimited"
NUMPROC="unlimited"
VMGUARPAGES="0:unlimited"
OOMGUARPAGES="0:unlimited"
NUMTCPSOCK="unlimited"
NUMFLOCK="unlimited"
NUMPTY="unlimited"
NUMSIGINFO="unlimited"
TCPSNDBUF="unlimited"
TCPRCVBUF="unlimited"
OTHERSOCKBUF="unlimited"
DGRAMRCVBUF="unlimited"
NUMOTHERSOCK="unlimited"
NUMFILE="unlimited"
NUMIPTENT="unlimited"


# Disk quota parameters (in form of softlimit:hardlimit)
DISKSPACE="50G:55G"
DISKINODES="10000000:11000000"
QUOTATIME="0"
QUOTAUGIDLIMIT="0"


# CPU fair scheduler parameter
CPUUNITS="1000"
CPUS="2"
HOSTNAME="wfsv082.XXX.com"
SEARCHDOMAIN="XXX.com"
NAMESERVER="91.121.55.XX 213.186.33.99"
IP_ADDRESS="87.98.186.XX"
VE_ROOT="/var/lib/vz/root/$VEID"
VE_PRIVATE="/var/lib/vz/private/102"
OSTEMPLATE="ubuntu-10.04-x86.tar.gz"
 
I tested another CT created from a 64-bit template and got the same problem, in case that helps.

Thanks in advance.
 
I tried both:
# setarch `uname -m` -R vzctl chkpnt 102
and:
# sysctl -w kernel.randomize_va_space=0

(both disable address space layout randomization), as suggested here:
www.acsu.buffalo.edu/~charngda/x86assembly.html

but nothing changes; I still get the same error:

root@wfsr010:/boot# sysctl -w kernel.randomize_va_space=0
kernel.randomize_va_space = 0
root@wfsr010:/boot# setarch `uname -m` -R vzctl chkpnt 102
Setting up checkpoint...
suspend...
dump...
Can not dump container: Invalid argument
Error: page without mapping at b6696000@155457400
Error: dump_one_vma: funkey page
Checkpointing failed

 
Did you find anything about this problem? What should I try? I'm stuck with Proxmox 1.9 for now and cannot upgrade until this issue is resolved. Thanks in advance.
 



I found the culprit: snort. If you stop snort inside the container before running chkpnt, it works.
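For repeated tests, the workaround can be scripted. This is only an illustrative sketch: it assumes snort is controlled by `/etc/init.d/snort` inside the container (adjust for your setup), it only prints the commands unless `dry_run=False`, and it uses `vzctl restore` to bring the CT back afterwards, because `vzctl chkpnt` stops the container once its state is dumped. The function names are hypothetical.

```python
import subprocess
from typing import List


def workaround_commands(veid: int, service: str = "snort") -> List[List[str]]:
    """Build the sequence: stop the service, checkpoint, restore, restart the service."""
    ct = str(veid)
    # Assumption: the service has a SysV init script inside the CT.
    init_script = f"/etc/init.d/{service}"
    return [
        ["vzctl", "exec", ct, init_script, "stop"],   # stop the offending daemon
        ["vzctl", "chkpnt", ct],                      # dump CT state (stops the CT)
        ["vzctl", "restore", ct],                     # bring the CT back from the dump
        ["vzctl", "exec", ct, init_script, "start"],  # start the daemon again
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            # Stops at the first failing step; snort is then left stopped.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(workaround_commands(102))
```

For an online migration the same idea applies: stop snort before starting the migration and start it again inside the CT on the target node afterwards.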