HA migration not working

pepex7

New Member
Sep 4, 2023
I'm having trouble migrating VMs when shutting down a node (failover doesn't work).
It is a 3-node cluster (Dell, 2x R710 and 1x R510) running Proxmox 8.0.3 and Ceph 17.2.6 Quincy.
Test VM to migrate:

Code:
/etc/pve/101.conf: No such file or directory
root@svr1:/etc/pve# cat /etc/pve/qemu-server/101.conf
bios: ovmf
boot: order=sata0;ide2;net0
cores: 20
cpu: x86-64-v2-AES
efidisk0: poolceph:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
memory: 20480
meta: creation-qemu=8.0.2,ctime=1697583668
name: testvm01
net0: virtio=4A:9E:07:3C:D0:40,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
sata0: poolceph:vm-101-disk-1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=4ce83b96-4732-4e5d-b10b-3a3ae9bb7ca0
sockets: 2
vmgenid: f2f6b625-9878-4fe1-badb-c72a9a33f0e2
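
For reference, the HA resource configuration and the current manager/node state can also be checked from the CLI; a minimal sketch using the standard ha-manager tool (output omitted here):

Code:
# which VMs/CTs are configured as HA resources, and in which groups
ha-manager config
# CRM master, node states (online/maintenance) and service states
ha-manager status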
 
Hi,
please share the files resulting from journalctl --since "-48h" -u pve-ha-crm.service -u pve-ha-lrm.service > /tmp/ha-srv1.log on srv1, and the same for srv2 and srv3. (This assumes the shutdown happened within the last 48 hours; otherwise, adapt the time range.)
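For example (a minimal sketch; the loop assumes the node host names resolve and that root SSH between the nodes works):

Code:
# on each node, changing the output file name accordingly
journalctl --since "-48h" -u pve-ha-crm.service -u pve-ha-lrm.service > /tmp/ha-srv1.log
# or collect all three from a single node over SSH
for n in srv1 srv2 srv3; do
    ssh root@$n 'journalctl --since "-48h" -u pve-ha-crm.service -u pve-ha-lrm.service' > /tmp/ha-$n.log
done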
 
It's the same log three times, only from srv1.
Judging from the log, it seems to have worked most of the time, i.e. the migration tasks were successful:
Code:
Oct 18 18:25:41 svr1 pve-ha-lrm[2108]: got shutdown request with shutdown policy 'migrate'
Oct 18 18:25:41 svr1 pve-ha-lrm[2108]: shutdown LRM, doing maintenance, removing this node from active list
Oct 18 18:25:47 svr1 pve-ha-lrm[2108]: status change active => maintenance
Oct 18 18:25:47 svr1 pve-ha-lrm[26297]: shutdown CT 102: UPID:svr1:000066B9:0006FB49:65304D5B:vzshutdown:102:root@pam:
Oct 18 18:25:47 svr1 pve-ha-lrm[26294]: <root@pam> starting task UPID:svr1:000066B9:0006FB49:65304D5B:vzshutdown:102:root@pam:
Oct 18 18:25:47 svr1 pve-ha-lrm[26295]: <root@pam> starting task UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:
Oct 18 18:25:47 svr1 pve-ha-lrm[26296]: <root@pam> starting task UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:
Oct 18 18:25:50 svr1 pve-ha-lrm[26294]: <root@pam> end task UPID:svr1:000066B9:0006FB49:65304D5B:vzshutdown:102:root@pam: OK
Oct 18 18:25:50 svr1 pve-ha-lrm[26294]: <root@pam> starting task UPID:svr1:00006709:0006FC48:65304D5E:vzmigrate:102:root@pam:
Oct 18 18:25:52 svr1 pve-ha-lrm[26294]: <root@pam> end task UPID:svr1:00006709:0006FC48:65304D5E:vzmigrate:102:root@pam: OK
Oct 18 18:25:52 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:25:52 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:25:57 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:25:57 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:26:02 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:02 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:26:07 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:07 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:26:07 svr1 pve-ha-lrm[26296]: <root@pam> end task UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam: OK
Oct 18 18:26:12 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:17 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:21 svr1 pve-ha-lrm[26295]: <root@pam> end task UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam: OK

One time there seem to have been temporary problems, but it succeeded in the end:
Code:
Oct 18 23:10:24 svr1 pve-ha-lrm[2109]: got shutdown request with shutdown policy 'migrate'
Oct 18 23:10:24 svr1 pve-ha-lrm[2109]: shutdown LRM, doing maintenance, removing this node from active list
Oct 18 23:10:33 svr1 pve-ha-lrm[2109]: status change active => maintenance
Oct 18 23:10:33 svr1 pve-ha-lrm[7064]: <root@pam> starting task UPID:svr1:00001B99:0000B890:65309019:qmigrate:100:root@pam:
Oct 18 23:10:36 svr1 pve-ha-lrm[7065]: migration problems
Oct 18 23:10:36 svr1 pve-ha-lrm[7064]: <root@pam> end task UPID:svr1:00001B99:0000B890:65309019:qmigrate:100:root@pam: migration problems
Oct 18 23:10:36 svr1 pve-ha-lrm[7064]: service vm:100 not moved (migration error)
Oct 18 23:10:43 svr1 pve-ha-lrm[7071]: <root@pam> starting task UPID:svr1:00001BA0:0000BCBA:65309023:qmigrate:100:root@pam:
Oct 18 23:10:47 svr1 pve-ha-lrm[7072]: migration problems
Oct 18 23:14:22 svr1 pve-ha-lrm[7225]: <root@pam> end task UPID:svr1:00001C3A:000110AD:653090FA:qmigrate:100:root@pam: migration problems
Oct 18 23:14:22 svr1 pve-ha-lrm[7225]: service vm:100 not moved (migration error)
Oct 18 23:14:28 svr1 pve-ha-lrm[7232]: <root@pam> starting task UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:
Oct 18 23:14:33 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:38 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:43 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:48 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:53 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:54 svr1 pve-ha-lrm[7232]: <root@pam> end task UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam: OK
You can check the migration task logs for more information in the UI (e.g. under srv1 > Task History).
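If you prefer the CLI, the same task logs can also be pulled with pvenode (a minimal sketch; the UPID values come from the journal excerpt above):

Code:
# list recent tasks on this node, including the qmigrate tasks and their status
pvenode task list
# print the full log of a single task, e.g. the qmigrate task for VM 100
pvenode task log 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:'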


And one time it seems everything was shut down at the same time? Without quorum, there won't be any migrations then, of course.
Code:
Oct 19 00:30:19 svr1 pve-ha-crm[2086]: status change slave => master
Oct 19 00:30:19 svr1 pve-ha-crm[2086]: node 'svr2': state changed from 'online' => 'maintenance'
Oct 19 00:30:19 svr1 pve-ha-crm[2086]: node 'svr3': state changed from 'online' => 'maintenance'
Oct 19 00:30:21 svr1 systemd[1]: Stopping pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
Oct 19 00:30:22 svr1 pve-ha-lrm[2095]: received signal TERM
Oct 19 00:30:22 svr1 pve-ha-lrm[2095]: got shutdown request with shutdown policy 'migrate'
Oct 19 00:30:22 svr1 pve-ha-lrm[2095]: reboot LRM, doing maintenance, removing this node from active list
Oct 19 00:30:28 svr1 pve-ha-lrm[2095]: lost lock 'ha_agent_svr1_lock - cfs lock update failed - Permission denied
Oct 19 00:30:29 svr1 pve-ha-crm[2086]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Oct 19 00:30:33 svr1 pve-ha-lrm[2095]: status change active => lost_agent_lock
Oct 19 00:30:33 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:34 svr1 pve-ha-crm[2086]: status change master => lost_manager_lock
Oct 19 00:30:34 svr1 pve-ha-crm[2086]: watchdog closed (disabled)
Oct 19 00:30:34 svr1 pve-ha-crm[2086]: status change lost_manager_lock => wait_for_quorum
Oct 19 00:30:38 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:43 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:48 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:53 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:58 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:31:03 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
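
If you suspect that, it is worth checking quorum and the corosync membership while the nodes are going down (a minimal sketch of the usual commands):

Code:
# corosync/quorum overview: expected votes, total votes, quorate yes/no
pvecm status
# membership list as corosync sees it
pvecm nodes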
 
Thanks for the reply. I made a mistake when uploading the log files.

Apparently node 3, being very old, did not let me remove the RAID configuration from its disks; perhaps that is why Ceph is having problems.

I am looking for another way to provide a solution without node 3.
 
Now I have uploaded the correct logs.
 

Attachments

task migrate

Code:
task started by HA resource agent
2023-10-23 21:04:10 starting migration of VM 102 to node 'svr1' (192.168.106.230)
2023-10-23 21:04:10 starting VM 102 on remote node 'svr1'
2023-10-23 21:04:12 start remote tunnel
2023-10-23 21:04:13 ssh tunnel ver 1
2023-10-23 21:04:13 starting online/live migration on unix:/run/qemu-server/102.migrate
2023-10-23 21:04:13 set migration capabilities
2023-10-23 21:04:13 migration downtime limit: 100 ms
2023-10-23 21:04:13 migration cachesize: 512.0 MiB
2023-10-23 21:04:13 set migration parameters
2023-10-23 21:04:13 start migrate command to unix:/run/qemu-server/102.migrate
2023-10-23 21:04:14 migration active, transferred 93.7 MiB of 4.0 GiB VM-state, 113.5 MiB/s
2023-10-23 21:04:15 migration active, transferred 204.1 MiB of 4.0 GiB VM-state, 10.3 GiB/s
2023-10-23 21:04:16 migration active, transferred 316.1 MiB of 4.0 GiB VM-state, 114.8 MiB/s
2023-10-23 21:04:17 migration active, transferred 428.6 MiB of 4.0 GiB VM-state, 123.9 MiB/s
2023-10-23 21:04:18 migration active, transferred 540.4 MiB of 4.0 GiB VM-state, 2.6 GiB/s
2023-10-23 21:04:19 migration active, transferred 652.8 MiB of 4.0 GiB VM-state, 116.0 MiB/s
2023-10-23 21:04:20 migration active, transferred 765.0 MiB of 4.0 GiB VM-state, 122.8 MiB/s
2023-10-23 21:04:21 migration active, transferred 877.3 MiB of 4.0 GiB VM-state, 123.9 MiB/s
2023-10-23 21:04:22 migration active, transferred 989.6 MiB of 4.0 GiB VM-state, 112.5 MiB/s
2023-10-23 21:04:23 migration active, transferred 1.1 GiB of 4.0 GiB VM-state, 119.3 MiB/s
2023-10-23 21:04:24 migration active, transferred 1.2 GiB of 4.0 GiB VM-state, 112.2 MiB/s
2023-10-23 21:04:25 migration active, transferred 1.3 GiB of 4.0 GiB VM-state, 114.9 MiB/s
2023-10-23 21:04:26 migration active, transferred 1.4 GiB of 4.0 GiB VM-state, 112.5 MiB/s
2023-10-23 21:04:28 average migration speed: 274.4 MiB/s - downtime 273 ms
2023-10-23 21:04:28 migration status: completed
2023-10-23 21:04:31 migration finished successfully (duration 00:00:21)
TASK OK
 
The start command timed out for VM 101 on srv2. Since you are using Ceph, are you sure the Ceph cluster was healthy/operational at the time?
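A quick way to check that around migration time (a minimal sketch using the standard Ceph/Proxmox tools, run on any cluster node):

Code:
# overall cluster health, monitor quorum, OSD up/in counts
ceph -s
# details for any warnings/errors (down OSDs, inactive PGs, ...)
ceph health detail
# Proxmox's own summary of the Ceph cluster
pveceph status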

There was no error in this migration (the VM 102 task log you attached).
 
The state of Ceph is bad; I had to switch to another solution, since Ceph did not work for me.


Code:
2023-10-23 21:04:24 migration active, transferred 1.2 GiB of 4.0 GiB VM-state, 112.2 MiB/s
2023-10-23 21:04:25 migration active, transferred 1.3 GiB of 4.0 GiB VM-state, 114.9 MiB/s
2023-10-23 21:04:26 migration active, transferred 1.4 GiB of 4.0 GiB VM-state, 112.5 MiB/s
2023-10-23 21:04:28 average migration speed: 274.4 MiB/s - downtime 273 ms
2023-10-23 21:04:28 migration status: completed
2023-10-23 21:04:31 migration finished successfully (duration 00:00:21)
TASK OK

I have a question: the migration jumps from 1.4 GiB transferred straight to completion at 4.0 GiB. Isn't there data missing that still had to be copied to reach 4 GiB?
 
If the VM hadn't actually allocated/used more than 1.4 GiB then the rest can be treated as all zeroes and migrated instantly.
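If you want to verify that, compare the memory actually in use inside the guest with the configured 4 GiB, or watch QEMU's migration statistics while a migration is running (a minimal sketch; as far as I know, the 'duplicate' counter in info migrate is QEMU's count of zero pages):

Code:
# inside the guest: how much of the 4 GiB is really in use
free -h
# on the source node, while VM 102 is migrating:
qm monitor 102
#   qm> info migrate    (look at the 'duplicate' page counter)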
 
