Configuring shared storage

I already have a cluster of 3 nodes and I would like to enable proper VM live migration, i.e. with unnoticeable downtime. Right now my VMs use local-lvm (LVM-thin), so live migration is not allowed (you can use --with-local-disks, but there is noticeable downtime).

Could anyone give me some guidance on how to configure shared storage among the nodes to enable live migration?

I do not have an external server on which to host an NFS server, for instance, so the solution has to be worked out with the 3 nodes I already have.

Thanks in advance
 
The only way is Ceph; for everything else you need external storage. What hardware do you have, exactly?
- Manufacturer
- CPU
- Memory
- Hopefully a real SAS/SATA controller, not HW RAID
- HDD/SSD: how many per node

Also have a look at the storage list.
 
All the nodes have these specs:

- Dell Optiplex 7040
- CPU: i7-6700
- RAM: 16GB
- No RAID. Controller: Intel Corporation SATA Controller
- 1 SSD

Is it feasible to configure Ceph on my nodes?

Will Ceph allow real VM live migration? And LXC live migration?
 
No, Ceph is not feasible on your hardware. What do you need per node?

- Two NICs, better 6 NICs, and for real production 10 Gigabit for Ceph: 2 for the cluster, 2 for the normal network, 2 for Ceph
- At least 4 OSDs per node plus one cache SSD (enterprise grade), or SSDs only without a cache
- The PVE system must stay on separate SSDs; here we use two SSDs in a ZFS RAID1 per node
- Depending on the situation, more memory is needed

It might technically work (not tested) if you put the system on an extra disk and use 2 OSDs per node, but again, this is untested. A minimal sketch of the network split is shown below.
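For reference, the traffic split described above usually maps to separate subnets in ceph.conf; a minimal sketch, where the subnets are made-up placeholders, not a tested configuration:

# /etc/ceph/ceph.conf (excerpt) -- the subnets are assumptions
[global]
# client/VM-facing storage traffic on the Ceph-facing NIC pair
public network = 10.10.10.0/24
# OSD replication traffic, kept on its own NIC pair
cluster network = 10.10.20.0/24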
 
OK, I understand the hardware limitations I'm facing.

With the hardware I have, could I somehow get VM and LXC live migration with another kind of storage?

Potentially, I could grab a regular PC and configure it as my NFS server, for instance, although I don't have one right now...

Please note that I am not trying to set up a real production data center, just a small controlled test lab.

Thanks again.
 
I'd just like to point out that the discussion around CRIU referenced above is 2 YEARS OLD, from when it was still on version 1.xx. It's currently on version 3.6 and under active support/development. Just sayin'...
 
Forget live migration, replication, or HA with 1 NIC.
Use a load balancer and duplicate the application across 2+ VMs.
 
Hello everyone,

thank you all for the feedback.

@alexskysilk From your comment I understand that I can use CRIU without problems, right?

@czechsys Is it impossible to configure live migration with 1 NIC, or is it just not recommended/inefficient? I would like to avoid using a load balancer in my scenario.

@fireon Thanks for the links; they'll be useful for the container part. What about my very first challenge, with the VMs? What would you recommend with the hardware I have?

Cheers
 
Bonus question:

With shared storage for live migration, is the shared storage just used as a support (cache) during the migration to allow zero downtime, or does it always store the VM images?
 
the disk images need to be on the shared storage, so you only need to transfer the contents of the virtual RAM.
 
--with-local-disks works like this. There is just a minimal downtime when the migration converges and switches over.
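For reference, both variants are driven by the same command; using the VM ID and node name that appear in the log later in this thread (104, kcl-node2), they look roughly like this:

# disks already on shared storage: only the RAM contents are transferred
qm migrate 104 kcl-node2 --online
# disks on local storage: block-mirrors the disks first, then transfers RAM
qm migrate 104 kcl-node2 --online --with-local-disks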
 
In my case this downtime is 11 s... any ideas on how to reduce it to a reasonable value?

Could I move the images from local-lvm to local? My images are currently in raw format.
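For reference, a disk can be moved between storages while the VM runs with qm move_disk (PVE 5.x syntax); a sketch using the VM and disk names from the log below, with "local" as the target storage:

# move VM 104's scsi0 image from local-lvm to the "local" directory storage;
# the raw format is kept unless --format is given
qm move_disk 104 scsi0 local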
 
I'm copy-pasting the migration log, with the 11 s gap I observe colored in red.

root@kcl-node1:~# qm migrate 104 kcl-node2 --online --with-local-disks
2017-12-07 11:44:24 starting migration of VM 104 to node 'kcl-node2' (10.81.59.102)
2017-12-07 11:44:24 found local disk 'local-lvm:vm-104-disk-1' (in current VM config)
2017-12-07 11:44:24 copying disk images
2017-12-07 11:44:24 starting VM 104 on remote node 'kcl-node2'
2017-12-07 11:44:26 start remote tunnel
2017-12-07 11:44:26 ssh tunnel ver 1
2017-12-07 11:44:26 starting storage migration
2017-12-07 11:44:26 scsi0: start migration to to nbd:10.81.59.102:60000:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 5477761024 bytes total: 5477761024 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi0: transferred: 116391936 bytes remaining: 5361369088 bytes total: 5477761024 bytes progression: 2.12 % busy: 1 ready: 0
drive-scsi0: transferred: 225443840 bytes remaining: 5252317184 bytes total: 5477761024 bytes progression: 4.12 % busy: 1 ready: 0
drive-scsi0: transferred: 342884352 bytes remaining: 5134876672 bytes total: 5477761024 bytes progression: 6.26 % busy: 1 ready: 0
drive-scsi0: transferred: 457179136 bytes remaining: 5020581888 bytes total: 5477761024 bytes progression: 8.35 % busy: 1 ready: 0
drive-scsi0: transferred: 573571072 bytes remaining: 4904189952 bytes total: 5477761024 bytes progression: 10.47 % busy: 1 ready: 0
drive-scsi0: transferred: 691011584 bytes remaining: 4786749440 bytes total: 5477761024 bytes progression: 12.61 % busy: 1 ready: 0
drive-scsi0: transferred: 3783262208 bytes remaining: 1694564352 bytes total: 5477826560 bytes progression: 69.07 % busy: 1 ready: 0
drive-scsi0: transferred: 3900702720 bytes remaining: 1577123840 bytes total: 5477826560 bytes progression: 71.21 % busy: 1 ready: 0
drive-scsi0: transferred: 4017094656 bytes remaining: 1460731904 bytes total: 5477826560 bytes progression: 73.33 % busy: 1 ready: 0
drive-scsi0: transferred: 4133486592 bytes remaining: 1344339968 bytes total: 5477826560 bytes progression: 75.46 % busy: 1 ready: 0
drive-scsi0: transferred: 4250927104 bytes remaining: 1226899456 bytes total: 5477826560 bytes progression: 77.60 % busy: 1 ready: 0
drive-scsi0: transferred: 4367319040 bytes remaining: 1110507520 bytes total: 5477826560 bytes progression: 79.73 % busy: 1 ready: 0
drive-scsi0: transferred: 4483710976 bytes remaining: 994115584 bytes total: 5477826560 bytes progression: 81.85 % busy: 1 ready: 0
drive-scsi0: transferred: 4601151488 bytes remaining: 876675072 bytes total: 5477826560 bytes progression: 84.00 % busy: 1 ready: 0
drive-scsi0: transferred: 4717543424 bytes remaining: 760283136 bytes total: 5477826560 bytes progression: 86.12 % busy: 1 ready: 0
drive-scsi0: transferred: 4833935360 bytes remaining: 643956736 bytes total: 5477892096 bytes progression: 88.24 % busy: 1 ready: 0
drive-scsi0: transferred: 4951375872 bytes remaining: 526516224 bytes total: 5477892096 bytes progression: 90.39 % busy: 1 ready: 0
drive-scsi0: transferred: 5067767808 bytes remaining: 410124288 bytes total: 5477892096 bytes progression: 92.51 % busy: 1 ready: 0
drive-scsi0: transferred: 5183111168 bytes remaining: 294780928 bytes total: 5477892096 bytes progression: 94.62 % busy: 1 ready: 0
drive-scsi0: transferred: 5301600256 bytes remaining: 176291840 bytes total: 5477892096 bytes progression: 96.78 % busy: 1 ready: 0
drive-scsi0: transferred: 5416943616 bytes remaining: 60948480 bytes total: 5477892096 bytes progression: 98.89 % busy: 1 ready: 0
drive-scsi0: transferred: 5477892096 bytes remaining: 0 bytes total: 5477892096 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2017-12-07 11:45:16 starting online/live migration on unix:/run/qemu-server/104.migrate
2017-12-07 11:45:16 migrate_set_speed: 8589934592
2017-12-07 11:45:16 migrate_set_downtime: 0.1
2017-12-07 11:45:16 set migration_caps
2017-12-07 11:45:16 set cachesize: 858993459
2017-12-07 11:45:16 start migrate command to unix:/run/qemu-server/104.migrate
2017-12-07 11:45:17 migration status: active (transferred 110599053, remaining 5645664256), total 8607571968)
2017-12-07 11:45:17 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2017-12-07 11:45:18 migration status: active (transferred 227295041, remaining 5524242432), total 8607571968)
2017-12-07 11:45:18 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2017-12-07 11:45:19 migration status: active (transferred 332378247, remaining 609148928), total 8607571968)
2017-12-07 11:45:19 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2017-12-07 11:45:20 migration status: active (transferred 449122925, remaining 484204544), total 8607571968)
2017-12-07 11:45:20 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2017-12-07 11:45:21 migration status: active (transferred 565772005, remaining 358039552), total 8607571968)
2017-12-07 11:45:21 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2017-12-07 11:45:22 migration status: active (transferred 682510788, remaining 233914368), total 8607571968)
2017-12-07 11:45:22 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2017-12-07 11:45:23 migration status: active (transferred 799100315, remaining 114352128), total 8607571968)
2017-12-07 11:45:23 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2017-12-07 11:45:24 migration status: active (transferred 915795953, remaining 8859648), total 8607571968)
2017-12-07 11:45:24 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 4400 overflow 0
2017-12-07 11:45:24 migration speed: 141.24 MB/s - downtime 46 ms
2017-12-07 11:45:24 migration status: completed
drive-scsi0: transferred: 5477892096 bytes remaining: 0 bytes total: 5477892096 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0 : finished
Logical volume "vm-104-disk-1" successfully removed
2017-12-07 11:45:39 migration finished successfully (duration 00:01:16)

Between these two outputs there is the 11 s gap. Any idea how I can reduce it without using shared storage?
 
If you want (almost) no downtime, use shared storage. If you can live with a bit of downtime, use --with-local-disks. For the latter, a faster network and less data to transfer (both VM memory and VM disks) reduce the downtime.
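For reference, the two knobs visible in the log above (migrate_set_speed, migrate_set_downtime) correspond to per-VM options that can be tuned; a sketch with illustrative values, assuming VM 104:

# allow a longer switch-over pause than the 0.1 s default, so the migration converges sooner
qm set 104 --migrate_downtime 0.4
# migration bandwidth limit in MB/s (0 = unlimited, the default seen in the log)
qm set 104 --migrate_speed 0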
 
Just for the record: I finally configured an NFS server on an external machine and was able to perform live migration with 30-40 ms downtime. Moreover, the migration time was reduced significantly.

Thanks for all the replies.
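For reference, the setup described above boils down to an export on the NFS machine plus registering the storage in Proxmox; a minimal sketch, where the export path, server IP, and storage ID are assumptions (only the subnet is taken from the log above):

# on the NFS server, in /etc/exports:
/srv/nfs/pve 10.81.59.0/24(rw,sync,no_root_squash,no_subtree_check)
# reload the export table
exportfs -ra

# on one PVE node: add the NFS storage cluster-wide, allowing VM images on it
pvesm add nfs nfs-shared --server 10.81.59.200 --export /srv/nfs/pve --content images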
 
