HA migration not working

pepex7

New Member
Sep 4, 2023
I'm having trouble migrating VMs when shutting down a node (failover doesn't work).
It is a 3-node cluster (Dell, 2x R710 and 1x R510) running Proxmox 8.0.3 and Ceph 17.2.6 Quincy.
Test VM to migrate:

Code:
/etc/pve/101.conf: No such file or directory
root@svr1:/etc/pve# cat /etc/pve/qemu-server/101.conf
bios: ovmf
boot: order=sata0;ide2;net0
cores: 20
cpu: x86-64-v2-AES
efidisk0: poolceph:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
memory: 20480
meta: creation-qemu=8.0.2,ctime=1697583668
name: testvm01
net0: virtio=4A:9E:07:3C:D0:40,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
sata0: poolceph:vm-101-disk-1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=4ce83b96-4732-4e5d-b10b-3a3ae9bb7ca0
sockets: 2
vmgenid: f2f6b625-9878-4fe1-badb-c72a9a33f0e2
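
For reference, the HA resource configuration and the current manager/node state can also be checked from the CLI; a minimal sketch using the standard ha-manager tool (output omitted here):

Code:
# which VMs/CTs are configured as HA resources, and in which groups
ha-manager config
# CRM master, node states (online/maintenance) and service states
ha-manager status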
 
Hi,
please share the files resulting from journalctl --since "-48h" -u pve-ha-crm.service -u pve-ha-lrm.service > /tmp/ha-srv1.log on srv1, and the same for srv2 and srv3. (This assumes the shutdown happened within the last 48 hours; otherwise, adapt the time range.)
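For example (a minimal sketch; the loop assumes the node host names resolve and that root SSH between the nodes works):

Code:
# on each node, changing the output file name accordingly
journalctl --since "-48h" -u pve-ha-crm.service -u pve-ha-lrm.service > /tmp/ha-srv1.log
# or collect all three from a single node over SSH
for n in srv1 srv2 srv3; do
    ssh root@$n 'journalctl --since "-48h" -u pve-ha-crm.service -u pve-ha-lrm.service' > /tmp/ha-$n.log
done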
 
It's the same log three times, only from srv1.
Judging from the log, it seems to have worked most of the time, i.e. the migration tasks were successful:
Code:
Oct 18 18:25:41 svr1 pve-ha-lrm[2108]: got shutdown request with shutdown policy 'migrate'
Oct 18 18:25:41 svr1 pve-ha-lrm[2108]: shutdown LRM, doing maintenance, removing this node from active list
Oct 18 18:25:47 svr1 pve-ha-lrm[2108]: status change active => maintenance
Oct 18 18:25:47 svr1 pve-ha-lrm[26297]: shutdown CT 102: UPID:svr1:000066B9:0006FB49:65304D5B:vzshutdown:102:root@pam:
Oct 18 18:25:47 svr1 pve-ha-lrm[26294]: <root@pam> starting task UPID:svr1:000066B9:0006FB49:65304D5B:vzshutdown:102:root@pam:
Oct 18 18:25:47 svr1 pve-ha-lrm[26295]: <root@pam> starting task UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:
Oct 18 18:25:47 svr1 pve-ha-lrm[26296]: <root@pam> starting task UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:
Oct 18 18:25:50 svr1 pve-ha-lrm[26294]: <root@pam> end task UPID:svr1:000066B9:0006FB49:65304D5B:vzshutdown:102:root@pam: OK
Oct 18 18:25:50 svr1 pve-ha-lrm[26294]: <root@pam> starting task UPID:svr1:00006709:0006FC48:65304D5E:vzmigrate:102:root@pam:
Oct 18 18:25:52 svr1 pve-ha-lrm[26294]: <root@pam> end task UPID:svr1:00006709:0006FC48:65304D5E:vzmigrate:102:root@pam: OK
Oct 18 18:25:52 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:25:52 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:25:57 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:25:57 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:26:02 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:02 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:26:07 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:07 svr1 pve-ha-lrm[26296]: Task 'UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam:' still active, waiting
Oct 18 18:26:07 svr1 pve-ha-lrm[26296]: <root@pam> end task UPID:svr1:000066BB:0006FB49:65304D5B:qmigrate:101:root@pam: OK
Oct 18 18:26:12 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:17 svr1 pve-ha-lrm[26295]: Task 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:' still active, waiting
Oct 18 18:26:21 svr1 pve-ha-lrm[26295]: <root@pam> end task UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam: OK

One time there seem to have been temporary problems, but it succeeded in the end:
Code:
Oct 18 23:10:24 svr1 pve-ha-lrm[2109]: got shutdown request with shutdown policy 'migrate'
Oct 18 23:10:24 svr1 pve-ha-lrm[2109]: shutdown LRM, doing maintenance, removing this node from active list
Oct 18 23:10:33 svr1 pve-ha-lrm[2109]: status change active => maintenance
Oct 18 23:10:33 svr1 pve-ha-lrm[7064]: <root@pam> starting task UPID:svr1:00001B99:0000B890:65309019:qmigrate:100:root@pam:
Oct 18 23:10:36 svr1 pve-ha-lrm[7065]: migration problems
Oct 18 23:10:36 svr1 pve-ha-lrm[7064]: <root@pam> end task UPID:svr1:00001B99:0000B890:65309019:qmigrate:100:root@pam: migration problems
Oct 18 23:10:36 svr1 pve-ha-lrm[7064]: service vm:100 not moved (migration error)
Oct 18 23:10:43 svr1 pve-ha-lrm[7071]: <root@pam> starting task UPID:svr1:00001BA0:0000BCBA:65309023:qmigrate:100:root@pam:
Oct 18 23:10:47 svr1 pve-ha-lrm[7072]: migration problems
Oct 18 23:14:22 svr1 pve-ha-lrm[7225]: <root@pam> end task UPID:svr1:00001C3A:000110AD:653090FA:qmigrate:100:root@pam: migration problems
Oct 18 23:14:22 svr1 pve-ha-lrm[7225]: service vm:100 not moved (migration error)
Oct 18 23:14:28 svr1 pve-ha-lrm[7232]: <root@pam> starting task UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:
Oct 18 23:14:33 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:38 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:43 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:48 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:53 svr1 pve-ha-lrm[7232]: Task 'UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam:' still active, waiting
Oct 18 23:14:54 svr1 pve-ha-lrm[7232]: <root@pam> end task UPID:svr1:00001C41:0001147B:65309104:qmigrate:100:root@pam: OK
You can check the migration task logs for more information in the UI (e.g. under srv1 > Task History).
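If you prefer the CLI, the same task logs can also be pulled with pvenode (a minimal sketch; the UPID values come from the journal excerpt above):

Code:
# list recent tasks on this node, including the qmigrate tasks and their status
pvenode task list
# print the full log of a single task, e.g. the qmigrate task for VM 100
pvenode task log 'UPID:svr1:000066BA:0006FB49:65304D5B:qmigrate:100:root@pam:'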


And one time it seems everything was shut down at the same time? Without quorum, there won't be any migrations then, of course.
Code:
Oct 19 00:30:19 svr1 pve-ha-crm[2086]: status change slave => master
Oct 19 00:30:19 svr1 pve-ha-crm[2086]: node 'svr2': state changed from 'online' => 'maintenance'
Oct 19 00:30:19 svr1 pve-ha-crm[2086]: node 'svr3': state changed from 'online' => 'maintenance'
Oct 19 00:30:21 svr1 systemd[1]: Stopping pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
Oct 19 00:30:22 svr1 pve-ha-lrm[2095]: received signal TERM
Oct 19 00:30:22 svr1 pve-ha-lrm[2095]: got shutdown request with shutdown policy 'migrate'
Oct 19 00:30:22 svr1 pve-ha-lrm[2095]: reboot LRM, doing maintenance, removing this node from active list
Oct 19 00:30:28 svr1 pve-ha-lrm[2095]: lost lock 'ha_agent_svr1_lock - cfs lock update failed - Permission denied
Oct 19 00:30:29 svr1 pve-ha-crm[2086]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Oct 19 00:30:33 svr1 pve-ha-lrm[2095]: status change active => lost_agent_lock
Oct 19 00:30:33 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:34 svr1 pve-ha-crm[2086]: status change master => lost_manager_lock
Oct 19 00:30:34 svr1 pve-ha-crm[2086]: watchdog closed (disabled)
Oct 19 00:30:34 svr1 pve-ha-crm[2086]: status change lost_manager_lock => wait_for_quorum
Oct 19 00:30:38 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:43 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:48 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:53 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:30:58 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
Oct 19 00:31:03 svr1 pve-ha-lrm[2095]: get shutdown request in state 'lost_agent_lock' - detected 1 running services
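
If you suspect that, it is worth checking quorum and the corosync membership while the nodes are going down (a minimal sketch of the usual commands):

Code:
# corosync/quorum overview: expected votes, total votes, quorate yes/no
pvecm status
# membership list as corosync sees it
pvecm nodes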
 
Thanks for the reply. I made a mistake when uploading the log files.

Apparently node 3, being very old, did not let me remove the RAID configuration from its disks; perhaps that is why Ceph is having problems.

I am looking for another way to provide a solution without node 3.
 
Now I have uploaded the correct logs.
 

Attachments

task migrate

Code:
task started by HA resource agent
2023-10-23 21:04:10 starting migration of VM 102 to node 'svr1' (192.168.106.230)
2023-10-23 21:04:10 starting VM 102 on remote node 'svr1'
2023-10-23 21:04:12 start remote tunnel
2023-10-23 21:04:13 ssh tunnel ver 1
2023-10-23 21:04:13 starting online/live migration on unix:/run/qemu-server/102.migrate
2023-10-23 21:04:13 set migration capabilities
2023-10-23 21:04:13 migration downtime limit: 100 ms
2023-10-23 21:04:13 migration cachesize: 512.0 MiB
2023-10-23 21:04:13 set migration parameters
2023-10-23 21:04:13 start migrate command to unix:/run/qemu-server/102.migrate
2023-10-23 21:04:14 migration active, transferred 93.7 MiB of 4.0 GiB VM-state, 113.5 MiB/s
2023-10-23 21:04:15 migration active, transferred 204.1 MiB of 4.0 GiB VM-state, 10.3 GiB/s
2023-10-23 21:04:16 migration active, transferred 316.1 MiB of 4.0 GiB VM-state, 114.8 MiB/s
2023-10-23 21:04:17 migration active, transferred 428.6 MiB of 4.0 GiB VM-state, 123.9 MiB/s
2023-10-23 21:04:18 migration active, transferred 540.4 MiB of 4.0 GiB VM-state, 2.6 GiB/s
2023-10-23 21:04:19 migration active, transferred 652.8 MiB of 4.0 GiB VM-state, 116.0 MiB/s
2023-10-23 21:04:20 migration active, transferred 765.0 MiB of 4.0 GiB VM-state, 122.8 MiB/s
2023-10-23 21:04:21 migration active, transferred 877.3 MiB of 4.0 GiB VM-state, 123.9 MiB/s
2023-10-23 21:04:22 migration active, transferred 989.6 MiB of 4.0 GiB VM-state, 112.5 MiB/s
2023-10-23 21:04:23 migration active, transferred 1.1 GiB of 4.0 GiB VM-state, 119.3 MiB/s
2023-10-23 21:04:24 migration active, transferred 1.2 GiB of 4.0 GiB VM-state, 112.2 MiB/s
2023-10-23 21:04:25 migration active, transferred 1.3 GiB of 4.0 GiB VM-state, 114.9 MiB/s
2023-10-23 21:04:26 migration active, transferred 1.4 GiB of 4.0 GiB VM-state, 112.5 MiB/s
2023-10-23 21:04:28 average migration speed: 274.4 MiB/s - downtime 273 ms
2023-10-23 21:04:28 migration status: completed
2023-10-23 21:04:31 migration finished successfully (duration 00:00:21)
TASK OK
 
The start command timed out for VM 101 on srv2. Since you are using Ceph, are you sure the Ceph cluster was healthy/operational at the time?
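A quick way to check that around migration time (a minimal sketch using the standard Ceph/Proxmox tools, run on any cluster node):

Code:
# overall cluster health, monitor quorum, OSD up/in counts
ceph -s
# details for any warnings/errors (down OSDs, inactive PGs, ...)
ceph health detail
# Proxmox's own summary of the Ceph cluster
pveceph status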

There was no error in this migration (the VM 102 task log you attached).
 
The state of Ceph is bad; I had to switch to another solution, since Ceph did not work for me.


Code:
2023-10-23 21:04:24 migration active, transferred 1.2 GiB of 4.0 GiB VM-state, 112.2 MiB/s
2023-10-23 21:04:25 migration active, transferred 1.3 GiB of 4.0 GiB VM-state, 114.9 MiB/s
2023-10-23 21:04:26 migration active, transferred 1.4 GiB of 4.0 GiB VM-state, 112.5 MiB/s
2023-10-23 21:04:28 average migration speed: 274.4 MiB/s - downtime 273 ms
2023-10-23 21:04:28 migration status: completed
2023-10-23 21:04:31 migration finished successfully (duration 00:00:21)
TASK OK

I have a question: the migration jumps from 1.4 GiB transferred straight to completion at 4.0 GiB. Isn't there data missing that still had to be copied to reach 4 GiB?
 
If the VM hadn't actually allocated/used more than 1.4 GiB then the rest can be treated as all zeroes and migrated instantly.
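If you want to verify that, compare the memory actually in use inside the guest with the configured 4 GiB, or watch QEMU's migration statistics while a migration is running (a minimal sketch; as far as I know, the 'duplicate' counter in info migrate is QEMU's count of zero pages):

Code:
# inside the guest: how much of the 4 GiB is really in use
free -h
# on the source node, while VM 102 is migrating:
qm monitor 102
#   qm> info migrate    (look at the 'duplicate' page counter)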
 
