Hi All,
Hopefully you can help. I've just installed two new Proxmox VE 2.2 servers, each with E5 processors and 64GB of memory. I have set them up talking to a Ceph cluster which I have been running for some time. I started with one box, which used Ceph perfectly: VMs would create and run without a problem. I then added the second server to the cluster, and I am able to create VMs on it and live-migrate a running VM from the original server to the second one (same setup), but I cannot migrate it back from the second server to the original. If I stop the VM on the second server and choose an offline migration, that works without a problem. The VM also runs on the second server without a problem, so I know that my shared storage (Ceph) is working from both systems.
These are the errors I get from the migration process:
Nov 14 15:07:12 starting migration of VM 101 to node 'ihv1' (192.168.0.1)
Nov 14 15:07:12 copying disk images
Nov 14 15:07:12 starting VM 101 on remote node 'ihv1'
Nov 14 15:07:13 ERROR: online migrate failure - command '/usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@192.168.0.1 qm start 101 --stateuri tcp --skiplock --migratedfrom ihv2' failed: exit code 255
Nov 14 15:07:13 aborting phase 2 - cleanup resources
Nov 14 15:07:13 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems
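One thing I'm not sure about with the 255: as far as I understand it, ssh uses exit code 255 for its own failures (bad keys, a known_hosts mismatch after a reinstall, etc.) and otherwise passes through the remote command's exit status, so I can't tell from the log alone whether ssh or the remote `qm start` failed. A quick local illustration of that convention (no cluster needed; `invalid.invalid` is just an unresolvable placeholder host):

```shell
# ssh exits 255 when the connection itself fails; an unresolvable
# host triggers that without touching any real server.
ssh -o BatchMode=yes -o ConnectTimeout=1 root@invalid.invalid true 2>/dev/null
echo "exit code: $?"
```

So my first thought was to check that passwordless BatchMode SSH works in both directions between the nodes, although the second error below suggests the remote start command did actually run.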
The other error I get when the original server tries to start the machine is:
TASK ERROR: start failed: command '/usr/bin/kvm -id 101 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/101.vnc,x509,password -pidfile /var/run/qemu-server/101.pid -daemonize -name test.tester -smp 'sockets=1,cores=4' -cpu host -nodefaults -boot 'menu=on' -vga cirrus -k en-gb -m 768 -cpuunits 1000 -usbdevice tablet -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=rbd:rbd/vm-101-disk-1:id=admin:auth_supported=cephx\;none:keyring=/etc/pve/priv/ceph/CloudFlex.keyring:mon_host=192.168.10.10\:6789,if=none,id=drive-sata0,cache=writethrough,aio=native' -device 'ide-drive,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' -netdev 'type=user,id=net0,hostname=test.tester' -device 'rtl8139,mac=6E:0C:BE:2B:42:6F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -incoming tcp:localhost:60000 -S' failed: exit code 1
Hopefully someone can offer advice. I had a working 2.1 cluster before which seemed to be perfect.
Warren.