Hello,
I have a 3 node cluster which I tried to patch from 4.1-2 to 4.1-22.
The plan was to migrate the VMs, online, from node 1 to node 2, patch node 1, reboot and migrate the VMs from node 2 back to 1. Rinse and repeat for the other 2 nodes.
After patching (apt-get update && apt-get dist-upgrade) and rebooting node 1 the cluster was quorate and showed all 3 nodes as active.
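For reference, the per-node procedure looked roughly like this (VM ID and node names are just examples, not my actual config):

```shell
# 1. Move all VMs off the node to be patched (live migration)
qm migrate 102 node02 --online

# 2. Patch and reboot the now-empty node
apt-get update && apt-get dist-upgrade
reboot

# 3. After the node is back, verify quorum and membership
pvecm status

# 4. Migrate the VMs back -- this is the step that fails
qm migrate 102 node01 --online
```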
However, it was not possible to migrate the VMs online back to node 1. The error message doesn't really indicate what the actual problem is, though:
Mar 26 23:30:09 starting migration of VM 102 to node 'node01' (10.20.30.41)
Mar 26 23:30:09 copying disk images
Mar 26 23:30:09 starting VM 102 on remote node 'node01'
Mar 26 23:30:12 starting ssh migration tunnel
Mar 26 23:30:13 starting online/live migration on localhost:60000
Mar 26 23:30:13 migrate_set_speed: 8589934592
Mar 26 23:30:13 migrate_set_downtime: 0.1
Mar 26 23:30:15 ERROR: online migrate failure - aborting
Mar 26 23:30:15 aborting phase 2 - cleanup resources
Mar 26 23:30:15 migrate_cancel
Mar 26 23:30:16 ERROR: migration finished with problems (duration 00:00:07)
The VMs all have RBD disks, which are located on a separate, dedicated Ceph cluster.
Shutting down the VMs, migrating them offline, and starting them back up on the patched node worked fine. But of course I would prefer to be able to patch the nodes without having to shut down the VMs.
Any feedback on what I did wrong and how to do a rolling update without downtime would be greatly appreciated.
Best regards,
Rudo