PVE 4.1 to 4.2 upgrade: Corrupted CEPH VM images

...
As I already mentioned, this wasn't a problem on PVE 4.1 (not applicable to PVE 3.x, since it was not possible to migrate more than 1 VM at the same time at all with PVE 3.x), so it seems to be a bug in PVE 4.2.
...

As I remember, I did this in Proxmox 3.x with multiple browser windows too, and I think I ran into the same trap. :-(
My VMs had filesystem errors too when I restarted them. :-(
So I never ever moved more than 1 VM at the same time. :)
 

Can you test with the unsecure migration flag?
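In case it helps: on PVE 4.x this should just be the migration_unsecure option in /etc/pve/datacenter.cfg (a minimal sketch; the option name is assumed from the 4.x docs, adjust to your setup):

Code:
# /etc/pve/datacenter.cfg
# skip the SSH tunnel for live-migration traffic; only use this on a trusted migration network
migration_unsecure: 1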
 
Hi Alexandre,

Not a big problem (no VMs are corrupted), but since I use the unsecure migration flag I still sometimes get a migration problem on one VM when using the 'migrate all' option. I have some logs now:

Code:
()
task started by HA resource agent
Jun 06 12:42:30 starting migration of VM 121 to node 'host03' (192.168.110.134)
Jun 06 12:42:30 copying disk images
Jun 06 12:42:30 starting VM 121 on remote node 'host03'
Jun 06 12:42:31 trying to acquire lock... OK
Jun 06 12:42:33 start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 121 --description \''Proxmox VE VM 121'\' -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 121 -chardev 'socket,id=qmp,path=/var/run/qemu-server/121.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/121.pid -daemonize -smbios 'type=1,uuid=2417a306-e6a4-4e4f-9b24-db282c8ed9c8' -name HOSTNAME -smp '4,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga cirrus -vnc unix:/var/run/qemu-server/121.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 12288 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:13a9185295e7' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100' -drive 'file=rbd:cl1/vm-121-disk-1:mon_host=192.168.110.131\:6789;192.168.110.133\:6789;192.168.110.135\:6789:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/SSD-cluster.keyring,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=200' -drive 'file=/mnt/pve/backup/images/121/vm-121-disk-1.qcow2,if=none,id=drive-virtio1,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb' -netdev 'type=tap,id=net0,ifname=tap121i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=8A:A4:08:1D:F4:91,netdev=net0,bus=pci.0,addr=0x12,id=net0' -machine 'type=pc-i440fx-2.5' -incoming tcp:192.168.110.134:60002 -S' failed: exit code 1
Jun 06 12:42:33 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@192.168.110.134 qm start 121 --stateuri tcp --skiplock --migratedfrom host05 --machine pc-i440fx-2.5' failed: exit code 255
Jun 06 12:42:33 aborting phase 2 - cleanup resources
Jun 06 12:42:33 migrate_cancel
Jun 06 12:42:33 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems

Code:
Running as unit 121.scope.
kvm: -incoming tcp:192.168.110.134:60002: Failed to bind socket: Address already in use
TASK ERROR: start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 121 --description \''Proxmox VE VM 121'\' -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 121 -chardev 'socket,id=qmp,path=/var/run/qemu-server/121.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/121.pid -daemonize -smbios 'type=1,uuid=2417a306-e6a4-4e4f-9b24-db282c8ed9c8' -name HOSTNAME -smp '4,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga cirrus -vnc unix:/var/run/qemu-server/121.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 12288 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:13a9185295e7' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100' -drive 'file=rbd:cl1/vm-121-disk-1:mon_host=192.168.110.131\:6789;192.168.110.133\:6789;192.168.110.135\:6789:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/SSD-cluster.keyring,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=200' -drive 'file=/mnt/pve/backup/images/121/vm-121-disk-1.qcow2,if=none,id=drive-virtio1,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb' -netdev 'type=tap,id=net0,ifname=tap121i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=8A:A4:08:1D:F4:91,netdev=net0,bus=pci.0,addr=0x12,id=net0' -machine 'type=pc-i440fx-2.5' -incoming tcp:192.168.110.134:60002 -S' failed: exit code 1
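The relevant bit seems to be the 'Failed to bind socket: Address already in use' on -incoming tcp:192.168.110.134:60002, i.e. two parallel migrations apparently picked the same incoming port on host03. A quick way to see what is still holding that port on the target node (just a standard socket check, nothing PVE-specific):

Code:
# run on the target node (host03) right after the failure
ss -tlnp | grep 60002
# or, if ss is not available:
netstat -tlnp | grep 60002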

Any idea what's causing this and maybe how to fix? Thanks!
 
Did you use packages from today? Post your 'pveversion -v'.
 
No, I didn't. But are there changes in the transfer process that could fix these errors? Unfortunately I can't update right now: I'm leaving on holiday in about 2 weeks, and I don't want to make any non-critical changes to the config or servers that close to being out of office.

Code:
proxmox-ve: 4.2-51 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-5 (running version: 4.2-5/7cf09667)
pve-kernel-4.4.8-1-pve: 4.4.8-51
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-75
pve-firmware: 1.1-8
libpve-common-perl: 4.0-62
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-17
pve-container: 1.0-64
pve-firewall: 2.0-27
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
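
For reference, once I'm back the plan is just the standard package update to pull in current packages (assuming the existing repository setup is fine):

Code:
apt-get update
apt-get dist-upgrade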
 

can you test with the unsecure migration flag?

Sorry for the late answer, I was very busy.
I'd prefer not to touch it! It's a production system and I don't want to crash the filesystems.
Maybe I wouldn't be lucky enough to repair it again and would have to stay offline for longer; I don't want that nightmare.