VM live migration using lvm-thin with discard results in high I/O

robm · Oct 8, 2021

When doing a live VM migration (latest Proxmox enterprise 7) from one server to another, where both servers use local disks with lvm-thin (ext4, hardware SSD RAID-10), we find that if the VM hard disk has "Discard" enabled, the migration hammers the I/O of the target node until the first copy phase is over (the "drive mirror is starting for drive" portion). Once that first copy completes, the I/O settles down for the remainder of the copy and sync. The live migration copies the full VM disk size (not the actual thin-provisioned usage), but then trims after the migration is complete.

If we disable discard on the hard disk and then run a live VM migration, the I/O stays very low the entire time, as we'd expect. It still does a full-size copy (not the thin size), but I/O is low throughout. Of course, we cannot reclaim space using fstrim until we re-enable discard after the migration (which requires a reboot).

We've tested with bandwidth limits, which can reduce the I/O spike somewhat with discard on, but even with SSD RAID-10 the spike is very high (20-40% I/O delay) and impacts other running VMs on the target node.

Is this a bug in the way live migrations are done, or just an inherent limitation of live migrations with discard enabled? Ideally we'd like to keep discard on, and not have to reboot a VM twice, in order to live migrate it without impacting other VMs' performance on the target node.

mira · Oct 11, 2021

Could you provide the task log? One without discard and one with.

robm · Oct 11, 2021

Sure. Task log with discard on:

Code:

Migrate with discard enabled:
-----------------------------
Proxmox Virtual Environment 7.0-11
2021-10-09 22:14:05 use dedicated network address for sending migration traffic (10.10.23.81)
2021-10-09 22:14:06 starting migration of VM 214 to node 'vps16' (10.10.23.81)
2021-10-09 22:14:06 found local disk 'containers:vm-214-disk-0' (in current VM config)
2021-10-09 22:14:06 starting VM 214 on remote node 'vps16'
2021-10-09 22:14:09 volume 'containers:vm-214-disk-0' is 'containers:vm-214-disk-0' on the target
2021-10-09 22:14:09 start remote tunnel
2021-10-09 22:14:09 ssh tunnel ver 1
2021-10-09 22:14:09 starting storage migration
2021-10-09 22:14:09 scsi0: start migration to nbd:10.10.23.81:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 368.0 MiB of 100.0 GiB (0.36%) in 5m 33s
drive-scsi0: transferred 850.0 MiB of 100.0 GiB (0.83%) in 5m 34s
drive-scsi0: transferred 1.3 GiB of 100.0 GiB (1.30%) in 5m 35s
drive-scsi0: transferred 1.7 GiB of 100.0 GiB (1.69%) in 5m 36s
drive-scsi0: transferred 2.1 GiB of 100.0 GiB (2.10%) in 5m 37s
...
drive-scsi0: transferred 99.0 GiB of 100.2 GiB (98.81%) in 9m 17s
drive-scsi0: transferred 99.5 GiB of 100.2 GiB (99.28%) in 9m 18s
drive-scsi0: transferred 99.9 GiB of 100.2 GiB (99.76%) in 9m 19s
drive-scsi0: transferred 100.2 GiB of 100.2 GiB (100.00%) in 9m 20s, ready
all 'mirror' jobs are ready
2021-10-09 22:23:29 starting online/live migration on tcp:10.10.23.81:60000
2021-10-09 22:23:29 set migration capabilities
2021-10-09 22:23:29 migration downtime limit: 100 ms
2021-10-09 22:23:29 migration cachesize: 2.0 GiB
2021-10-09 22:23:29 set migration parameters
2021-10-09 22:23:29 start migrate command to tcp:10.10.23.81:60000
2021-10-09 22:23:30 migration active, transferred 920.2 MiB of 16.0 GiB VM-state, 977.1 MiB/s
2021-10-09 22:23:31 migration active, transferred 1.8 GiB of 16.0 GiB VM-state, 974.8 MiB/s
2021-10-09 22:23:32 migration active, transferred 2.8 GiB of 16.0 GiB VM-state, 1.4 GiB/s
2021-10-09 22:23:33 migration active, transferred 3.7 GiB of 16.0 GiB VM-state, 944.2 MiB/s
2021-10-09 22:23:34 migration active, transferred 4.6 GiB of 16.0 GiB VM-state, 992.6 MiB/s
2021-10-09 22:23:35 migration active, transferred 5.5 GiB of 16.0 GiB VM-state, 985.3 MiB/s
2021-10-09 22:23:36 migration active, transferred 6.5 GiB of 16.0 GiB VM-state, 966.0 MiB/s
2021-10-09 22:23:37 migration active, transferred 7.4 GiB of 16.0 GiB VM-state, 995.4 MiB/s
2021-10-09 22:23:38 migration active, transferred 8.3 GiB of 16.0 GiB VM-state, 986.8 MiB/s
2021-10-09 22:23:39 migration active, transferred 9.3 GiB of 16.0 GiB VM-state, 1012.1 MiB/s
2021-10-09 22:23:40 migration active, transferred 10.2 GiB of 16.0 GiB VM-state, 940.4 MiB/s
2021-10-09 22:23:41 migration active, transferred 11.1 GiB of 16.0 GiB VM-state, 939.0 MiB/s
2021-10-09 22:23:42 migration active, transferred 12.0 GiB of 16.0 GiB VM-state, 933.9 MiB/s
2021-10-09 22:23:43 migration active, transferred 12.9 GiB of 16.0 GiB VM-state, 967.6 MiB/s
2021-10-09 22:23:44 migration active, transferred 13.8 GiB of 16.0 GiB VM-state, 964.1 MiB/s
2021-10-09 22:23:45 migration active, transferred 14.7 GiB of 16.0 GiB VM-state, 879.7 MiB/s
2021-10-09 22:23:47 migration active, transferred 15.8 GiB of 16.0 GiB VM-state, 644.3 MiB/s
2021-10-09 22:23:47 average migration speed: 911.2 MiB/s - downtime 100 ms
2021-10-09 22:23:47 migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job_id...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
2021-10-09 22:23:49 stopping NBD storage migration server on target.
  Logical volume "vm-214-disk-0" successfully removed
2021-10-09 22:23:55 migration finished successfully (duration 00:09:50)
TASK OK
-----------------------------

Task log with discard off:

Code:

NO discard
-------------------------------------------------------
2021-10-08 07:48:43 use dedicated network address for sending migration traffic (10.10.23.81)
2021-10-08 07:48:43 starting migration of VM 227 to node 'vps16' (10.10.23.81)
2021-10-08 07:48:43 found local disk 'containers:vm-227-disk-0' (in current VM config)
2021-10-08 07:48:43 starting VM 227 on remote node 'vps16'
2021-10-08 07:48:46 volume 'containers:vm-227-disk-0' is 'containers:vm-227-disk-0' on the target
2021-10-08 07:48:46 start remote tunnel
2021-10-08 07:48:46 ssh tunnel ver 1
2021-10-08 07:48:46 starting storage migration
2021-10-08 07:48:46 scsi0: start migration to nbd:10.10.23.81:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 0.0 B of 50.0 GiB (0.00%) in 0s
drive-scsi0: transferred 80.0 MiB of 50.0 GiB (0.16%) in 1s
drive-scsi0: transferred 160.0 MiB of 50.0 GiB (0.31%) in 2s
drive-scsi0: transferred 240.0 MiB of 50.0 GiB (0.47%) in 3s
drive-scsi0: transferred 320.0 MiB of 50.0 GiB (0.62%) in 4s
drive-scsi0: transferred 400.0 MiB of 50.0 GiB (0.78%) in 5s
...
drive-scsi0: transferred 49.7 GiB of 50.0 GiB (99.37%) in 10m 39s
drive-scsi0: transferred 49.8 GiB of 50.0 GiB (99.52%) in 10m 40s
drive-scsi0: transferred 49.8 GiB of 50.0 GiB (99.68%) in 10m 41s
drive-scsi0: transferred 49.9 GiB of 50.0 GiB (99.84%) in 10m 42s
drive-scsi0: transferred 50.0 GiB of 50.0 GiB (100.00%) in 10m 43s, ready
all 'mirror' jobs are ready
2021-10-08 07:59:29 starting online/live migration on tcp:10.10.23.81:60000
2021-10-08 07:59:29 set migration capabilities
2021-10-08 07:59:29 migration speed limit: 80.0 MiB/s
2021-10-08 07:59:29 migration downtime limit: 100 ms
2021-10-08 07:59:29 migration cachesize: 512.0 MiB
2021-10-08 07:59:29 set migration parameters
2021-10-08 07:59:29 start migrate command to tcp:10.10.23.81:60000
2021-10-08 07:59:30 migration active, transferred 81.5 MiB of 4.0 GiB VM-state, 100.2 MiB/s
2021-10-08 07:59:31 migration active, transferred 160.0 MiB of 4.0 GiB VM-state, 81.6 MiB/s
2021-10-08 07:59:32 migration active, transferred 240.0 MiB of 4.0 GiB VM-state, 104.4 MiB/s
2021-10-08 07:59:33 migration active, transferred 320.1 MiB of 4.0 GiB VM-state, 88.3 MiB/s
2021-10-08 07:59:34 migration active, transferred 400.1 MiB of 4.0 GiB VM-state, 81.1 MiB/s
2021-10-08 07:59:35 migration active, transferred 484.1 MiB of 4.0 GiB VM-state, 80.8 MiB/s
2021-10-08 07:59:36 migration active, transferred 566.6 MiB of 4.0 GiB VM-state, 80.2 MiB/s
2021-10-08 07:59:37 migration active, transferred 642.6 MiB of 4.0 GiB VM-state, 121.7 MiB/s
2021-10-08 07:59:38 migration active, transferred 721.7 MiB of 4.0 GiB VM-state, 104.2 MiB/s
2021-10-08 07:59:39 migration active, transferred 803.1 MiB of 4.0 GiB VM-state, 87.0 MiB/s
2021-10-08 07:59:40 migration active, transferred 881.3 MiB of 4.0 GiB VM-state, 79.2 MiB/s
2021-10-08 07:59:41 migration active, transferred 960.7 MiB of 4.0 GiB VM-state, 102.7 MiB/s
2021-10-08 07:59:42 migration active, transferred 1.0 GiB of 4.0 GiB VM-state, 236.3 MiB/s
2021-10-08 07:59:43 migration active, transferred 1.1 GiB of 4.0 GiB VM-state, 1.1 GiB/s
2021-10-08 07:59:44 migration active, transferred 1.2 GiB of 4.0 GiB VM-state, 250.0 MiB/s
2021-10-08 07:59:45 migration active, transferred 1.3 GiB of 4.0 GiB VM-state, 81.5 MiB/s
2021-10-08 07:59:46 migration active, transferred 1.3 GiB of 4.0 GiB VM-state, 105.0 MiB/s
2021-10-08 07:59:47 migration active, transferred 1.4 GiB of 4.0 GiB VM-state, 126.7 MiB/s
2021-10-08 07:59:48 migration active, transferred 1.5 GiB of 4.0 GiB VM-state, 81.8 MiB/s
2021-10-08 07:59:49 migration active, transferred 1.6 GiB of 4.0 GiB VM-state, 635.4 MiB/s
2021-10-08 07:59:50 migration active, transferred 1.6 GiB of 4.0 GiB VM-state, 84.5 MiB/s
2021-10-08 07:59:52 migration active, transferred 1.8 GiB of 4.0 GiB VM-state, 80.6 MiB/s
2021-10-08 07:59:52 average migration speed: 178.8 MiB/s - downtime 57 ms
2021-10-08 07:59:52 migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job_id...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
2021-10-08 07:59:53 stopping NBD storage migration server on target.
  Logical volume "vm-227-disk-0" successfully removed
2021-10-08 07:59:58 migration finished successfully (duration 00:11:15)
TASK OK
--------------------------------------

You can see that with discard enabled, the drive mirror process takes about 5 minutes before it starts copying to the 100 GB drive on the target, during which time I/O delay goes to 25-40% with 4 x 2 TB enterprise SSD drives. With discard off, the drive copying starts right away.

I believe this is related to the bug report at:
https://bugzilla.proxmox.com/show_bug.cgi?id=2631
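To put numbers on the stall, the elapsed times can be pulled out of the drive-mirror progress lines. The Python below is a minimal sketch with hypothetical helper names (`elapsed_seconds`, `mirror_phases`); it assumes progress lines look exactly like the ones in the two task logs above, and the optional "1h " prefix for longer runs is an assumption about how durations over an hour are printed.

```python
import re

# Matches drive-mirror progress lines such as
#   drive-scsi0: transferred 368.0 MiB of 100.0 GiB (0.36%) in 5m 33s
# The optional "Nh " part is an assumption about how longer runs are printed.
PROGRESS_RE = re.compile(
    r"transferred .+? \([\d.]+%\) in "
    r"(?:(?P<h>\d+)h )?(?:(?P<m>\d+)m )?(?P<s>\d+)s"
)


def elapsed_seconds(line):
    """Return the elapsed time reported on one progress line, or None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m") or 0)
    return hours * 3600 + minutes * 60 + int(match.group("s"))


def mirror_phases(log_lines):
    """Return (seconds before the first progress line, seconds of visible copying)."""
    times = [t for t in map(elapsed_seconds, log_lines) if t is not None]
    if not times:
        raise ValueError("no drive-mirror progress lines found")
    return times[0], times[-1] - times[0]


# First and last progress lines from the two task logs above.
discard_on = [
    "drive-scsi0: transferred 368.0 MiB of 100.0 GiB (0.36%) in 5m 33s",
    "drive-scsi0: transferred 100.2 GiB of 100.2 GiB (100.00%) in 9m 20s, ready",
]
discard_off = [
    "drive-scsi0: transferred 0.0 B of 50.0 GiB (0.00%) in 0s",
    "drive-scsi0: transferred 50.0 GiB of 50.0 GiB (100.00%) in 10m 43s, ready",
]

print(mirror_phases(discard_on))   # (333, 227)
print(mirror_phases(discard_off))  # (0, 643)
```

With discard on, roughly 333 s pass before the first progress line; with discard off, progress starts at 0 s. The two VMs here have different disk sizes, so the same-VM comparison later in the thread is the fairer one.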

mira · Oct 11, 2021

That seems to be a different issue. In your case it runs through just fine.

The issue is that `drive-mirror` zeroes the disk before it starts writing. Perhaps it does this differently depending on the `discard` setting.
Could you test this with the same VM and disk with and without `discard`?

robm · Oct 11, 2021

Sure, here are the task logs for the same VM migrated three different ways. Also attached is an image of the I/O load for these three tests (discard on first, discard off second, offline VM migration third). The live migration with discard on really impacts other VMs' performance while the disk is being zeroed, whereas the other two migrations caused no performance issues on other VMs.

Code:

LIVE MIGRATE, DISCARD ON
------------------------
2021-10-11 10:39:39 use dedicated network address for sending migration traffic (10.10.23.81)
2021-10-11 10:39:39 starting migration of VM 195 to node 'vps16' (10.10.23.81)
2021-10-11 10:39:40 found local disk 'containers:vm-195-disk-0' (in current VM config)
2021-10-11 10:39:40 starting VM 195 on remote node 'vps16'
2021-10-11 10:39:42 volume 'containers:vm-195-disk-0' is 'containers:vm-195-disk-0' on the target
2021-10-11 10:39:42 start remote tunnel
2021-10-11 10:39:43 ssh tunnel ver 1
2021-10-11 10:39:43 starting storage migration
2021-10-11 10:39:43 scsi0: start migration to nbd:10.10.23.81:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 71.0 MiB of 50.0 GiB (0.14%) in 3m 1s
drive-scsi0: transferred 511.0 MiB of 50.0 GiB (1.00%) in 3m 2s
drive-scsi0: transferred 962.0 MiB of 50.0 GiB (1.88%) in 3m 3s
drive-scsi0: transferred 1.4 GiB of 50.0 GiB (2.84%) in 3m 4s
drive-scsi0: transferred 1.9 GiB of 50.0 GiB (3.80%) in 3m 5s
...
drive-scsi0: transferred 48.4 GiB of 50.0 GiB (96.74%) in 4m 48s
drive-scsi0: transferred 48.8 GiB of 50.0 GiB (97.62%) in 4m 49s
drive-scsi0: transferred 49.3 GiB of 50.0 GiB (98.57%) in 4m 50s
drive-scsi0: transferred 49.8 GiB of 50.0 GiB (99.51%) in 4m 51s
drive-scsi0: transferred 50.0 GiB of 50.0 GiB (100.00%) in 4m 52s, ready
all 'mirror' jobs are ready
2021-10-11 10:44:35 starting online/live migration on tcp:10.10.23.81:60000
2021-10-11 10:44:35 set migration capabilities
2021-10-11 10:44:35 migration downtime limit: 100 ms
2021-10-11 10:44:35 migration cachesize: 1.0 GiB
2021-10-11 10:44:35 set migration parameters
2021-10-11 10:44:35 start migrate command to tcp:10.10.23.81:60000
2021-10-11 10:44:36 migration active, transferred 723.5 MiB of 8.0 GiB VM-state, 1.9 GiB/s
2021-10-11 10:44:37 migration active, transferred 1.5 GiB of 8.0 GiB VM-state, 928.9 MiB/s
2021-10-11 10:44:38 migration active, transferred 2.1 GiB of 8.0 GiB VM-state, 1.7 GiB/s
2021-10-11 10:44:40 average migration speed: 1.6 GiB/s - downtime 80 ms
2021-10-11 10:44:40 migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job_id...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
2021-10-11 10:44:41 stopping NBD storage migration server on target.
  Logical volume "vm-195-disk-0" successfully removed
2021-10-11 10:44:47 migration finished successfully (duration 00:05:08)
TASK OK
------------------------

Code:

LIVE MIGRATE, DISCARD OFF
-------------------------
2021-10-11 10:54:07 use dedicated network address for sending migration traffic (10.10.23.81)
2021-10-11 10:54:07 starting migration of VM 195 to node 'vps16' (10.10.23.81)
2021-10-11 10:54:07 found local disk 'containers:vm-195-disk-0' (in current VM config)
2021-10-11 10:54:07 starting VM 195 on remote node 'vps16'
2021-10-11 10:54:10 volume 'containers:vm-195-disk-0' is 'containers:vm-195-disk-0' on the target
2021-10-11 10:54:10 start remote tunnel
2021-10-11 10:54:11 ssh tunnel ver 1
2021-10-11 10:54:11 starting storage migration
2021-10-11 10:54:11 scsi0: start migration to nbd:10.10.23.81:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred 0.0 B of 50.0 GiB (0.00%) in 0s
drive-scsi0: transferred 267.0 MiB of 50.0 GiB (0.52%) in 1s
drive-scsi0: transferred 542.0 MiB of 50.0 GiB (1.06%) in 2s
drive-scsi0: transferred 815.0 MiB of 50.0 GiB (1.59%) in 3s
drive-scsi0: transferred 1.1 GiB of 50.0 GiB (2.14%) in 4s
drive-scsi0: transferred 1.3 GiB of 50.0 GiB (2.68%) in 5s
drive-scsi0: transferred 1.6 GiB of 50.0 GiB (3.21%) in 6s
...
drive-scsi0: transferred 49.1 GiB of 50.0 GiB (98.20%) in 3m 6s
drive-scsi0: transferred 49.4 GiB of 50.0 GiB (98.71%) in 3m 7s
drive-scsi0: transferred 49.6 GiB of 50.0 GiB (99.26%) in 3m 8s
drive-scsi0: transferred 49.9 GiB of 50.0 GiB (99.77%) in 3m 9s
drive-scsi0: transferred 50.0 GiB of 50.0 GiB (100.00%) in 3m 10s, ready
all 'mirror' jobs are ready
2021-10-11 10:57:21 starting online/live migration on tcp:10.10.23.81:60000
2021-10-11 10:57:21 set migration capabilities
2021-10-11 10:57:21 migration downtime limit: 100 ms
2021-10-11 10:57:21 migration cachesize: 1.0 GiB
2021-10-11 10:57:21 set migration parameters
2021-10-11 10:57:21 start migrate command to tcp:10.10.23.81:60000
2021-10-11 10:57:22 migration active, transferred 564.3 MiB of 8.0 GiB VM-state, 6.2 GiB/s
2021-10-11 10:57:23 migration active, transferred 1.3 GiB of 8.0 GiB VM-state, 5.2 GiB/s
2021-10-11 10:57:24 average migration speed: 2.7 GiB/s - downtime 86 ms
2021-10-11 10:57:24 migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job_id...
drive-scsi0: Completed successfully.
drive-scsi0: mirror-job finished
2021-10-11 10:57:25 stopping NBD storage migration server on target.
  Logical volume "vm-195-disk-0" successfully removed
2021-10-11 10:57:30 migration finished successfully (duration 00:03:23)
TASK OK
-------------------------

Code:

OFFLINE MIGRATION, DISCARD ON
-----------------------------
2021-10-11 11:00:58 use dedicated network address for sending migration traffic (10.10.23.81)
2021-10-11 11:00:58 starting migration of VM 195 to node 'vps16' (10.10.23.81)
2021-10-11 11:00:58 found local disk 'containers:vm-195-disk-0' (in current VM config)
2021-10-11 11:00:58 copying local disk images
2021-10-11 11:02:25 819200+0 records in
2021-10-11 11:02:25 819200+0 records out
2021-10-11 11:02:25 53687091200 bytes (54 GB, 50 GiB) copied, 84.7417 s, 634 MB/s
2021-10-11 11:02:25 [vps16]   Logical volume "vm-195-disk-0" created.
2021-10-11 11:02:28 [vps16] 630303+424593 records in
2021-10-11 11:02:28 [vps16] 630303+424593 records out
2021-10-11 11:02:28 [vps16] 53687091200 bytes (54 GB, 50 GiB) copied, 88.0742 s, 610 MB/s
2021-10-11 11:02:28 [vps16] successfully imported 'containers:vm-195-disk-0'
2021-10-11 11:02:28 volume 'containers:vm-195-disk-0' is 'containers:vm-195-disk-0' on the target
  Logical volume "vm-195-disk-0" successfully removed
2021-10-11 11:02:29 migration finished successfully (duration 00:01:31)
TASK OK
-----------------------------
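As a sanity check on the offline copy, the dd figures in this log are self-consistent. The Python arithmetic below infers a 64 KiB block size from 819200 full records covering 50 GiB (the actual dd command line is not shown in the log, so that block size is an inference) and reproduces dd's decimal MB/s figures. The "+424593" partial records on the receiving side are most likely short reads from the network pipe rather than a problem.

```python
# Figures copied from the offline-migration log above.
RECORDS = 819_200              # "819200+0 records in/out" on the sending side
TOTAL_BYTES = 53_687_091_200   # "53687091200 bytes (54 GB, 50 GiB) copied"
SEND_SECONDS = 84.7417
RECEIVE_SECONDS = 88.0742      # the [vps16] side


def dd_rate_mb(total_bytes, seconds):
    """GNU dd reports throughput in decimal megabytes (10**6 bytes) per second."""
    return total_bytes / seconds / 1e6


# Inferred block size: every sending-side record was full ("+0" partial records).
block_size, remainder = divmod(TOTAL_BYTES, RECORDS)

print(block_size, remainder)                            # 65536 0
print(TOTAL_BYTES / 2**30)                              # 50.0
print(round(dd_rate_mb(TOTAL_BYTES, SEND_SECONDS)))     # 634
print(round(dd_rate_mb(TOTAL_BYTES, RECEIVE_SECONDS)))  # 610
```

So the offline path writes the full 50 GiB at roughly 610-634 MB/s with no separate zeroing pass.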

mira · Oct 12, 2021

I checked the QEMU code, and it seems that if `discard` is not set, the whole disk is marked `dirty`, which means the whole disk will be copied.
If `discard` is set, it will zero the whole disk first.
That's not something we can control in any way, sadly.
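To make the difference described above concrete, here is a toy Python model of the two ways the initial sync can be seeded. It is not QEMU's code (the real mirror job lives in QEMU's block/mirror.c); `plan_initial_sync`, `MirrorPlan`, and the example disk are hypothetical, and `discard_enabled` is only a stand-in for whatever condition QEMU actually checks. It counts operations only; how expensive each zero write is depends on the target storage.

```python
from dataclasses import dataclass


@dataclass
class MirrorPlan:
    zero_writes: int  # clusters zeroed on the target before any data is copied
    data_copies: int  # clusters read from the source and written to the target


def plan_initial_sync(total_clusters, allocated, discard_enabled):
    """Toy model of the two initial-sync strategies described in this thread.

    discard_enabled is a stand-in for whatever condition QEMU really checks.
    """
    if not discard_enabled:
        # Whole disk marked dirty (per the description): every cluster is copied.
        return MirrorPlan(zero_writes=0, data_copies=total_clusters)
    # Zero the whole target up front, then copy only allocated source clusters.
    dirty = {c for c in allocated if 0 <= c < total_clusters}
    return MirrorPlan(zero_writes=total_clusters, data_copies=len(dirty))


# Hypothetical 50 GiB disk split into 64 KiB clusters, 2 of every 5 allocated.
TOTAL = 50 * 1024 * 1024 // 64  # 819200 clusters
allocated = set(range(0, TOTAL, 5)) | set(range(1, TOTAL, 5))

print(plan_initial_sync(TOTAL, allocated, discard_enabled=False))
# MirrorPlan(zero_writes=0, data_copies=819200)
print(plan_initial_sync(TOTAL, allocated, discard_enabled=True))
# MirrorPlan(zero_writes=819200, data_copies=327680)
```

The point of the model is the ordering: with discard on, the whole target is written once before any real data moves, which lines up with the several-minute stall seen in the discard-on logs.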

robm · Oct 12, 2021

Thank you for looking into it. I wonder if there is a way to apply an "ionice"-style priority to the disk zeroing to lessen its I/O impact. Since it's a live migration, a few extra minutes for the initial zeroing would not be an issue...

s-mendyka · May 18, 2022

We have the same problem. Live migration with discard "on" results in all VMs getting stuck because of high I/O. With iotop we see write rates over 900 MB/s, so all the other VMs stop responding.

So last year we disabled discard on all VMs. The result: after a major failure of one server (VM) with a 1.2 TB disk (and only 250 GB of used space), the PBS restore took a very long time because the LV's mapped size was 100% (because of no trim)...
So at this point live migration is no longer an option, which is very bad.

Is there some way to fix it? I can't find anything like this in the QEMU bug tracker.

I'm surprised that I can't find many threads about this on Google.

Regards
Sebastian

marciglesias17 · Sep 13, 2022

I have the same problem. https://forum.proxmox.com/threads/raidz1-pool-zfs-use-a-lot-of-cpu-on-live-migration.114745/

I have tried to migrate a machine with discard disabled and it still happens. In order to migrate machines, does discard have to be disabled on all the machines of the destination node?

DJP · Jan 26, 2023

I have had the exact same issue since Proxmox 6. I did notice it was because of zeroing, but didn't notice it was because of discard...

Any solution?
