Zeroing of data prior to 'migrate'

Tozz

We have a Proxmox VE cluster with 3 nodes, all using local storage. Local storage is a deliberate choice we made after running into issues with other storage solutions in the past.

When we migrate a machine from node A to node B we are seeing IO issues occurring on node B. Its SSD drives become saturated, causing other VMs to hang, run into timeouts, etc. During this high IO wait there is no network traffic from the migration, so it looks like the disks are being zeroed? We can see with 'lvs -a' that the newly created disk is being filled (the Data% percentage increases). Only after the data percentage reaches 100% do we see the actual migration start in the Proxmox WebUI:

Code:
2020-06-11 20:21:53 use dedicated network address for sending migration traffic (192.168.0.1)
2020-06-11 20:21:53 starting migration of VM 172 to node 'nlgrq1pm-p001' (192.168.0.1)
2020-06-11 20:21:54 found local disk 'thindata2:vm-172-disk-0' (in current VM config)
2020-06-11 20:21:54 copying local disk images
2020-06-11 20:21:54 starting VM 172 on remote node 'nlgrq1pm-p001'
2020-06-11 20:21:57 start remote tunnel
2020-06-11 20:21:58 ssh tunnel ver 1
2020-06-11 20:21:58 starting storage migration
2020-06-11 20:21:58 scsi0: start migration to nbd:unix:/run/qemu-server/172_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0 with bandwidth limit: 153600 KB/s

After the line 'drive mirror is starting for ...', the (I assume) zeroing begins. The Data% in the 'lvs -a' output for the LV increases until it reaches 100%. During this process 'iotop' shows about 800 MB/s of IO traffic, while network traffic is very low (e.g. kilobytes/s).
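
For anyone who wants to watch this happen, this is roughly what I use on the target node (just a sketch; I'm assuming the VG is also called 'thindata2' like the storage ID, adjust the names to your setup):

Code:
# Watch the thin LV allocation climb while the migration runs.
# 'thindata2' as VG name is an assumption, it may differ from the storage ID.
watch -n 2 'lvs -a -o lv_name,lv_size,data_percent thindata2'

# In a second shell, show only processes that are actually doing IO.
iotop -o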

After 'lvs -a' shows the LV is 100% filled, the process continues:

Code:
drive-scsi0: transferred: 50331648 bytes remaining: 64374177792 bytes total: 64424509440 bytes progression: 0.08 % busy: 0 ready: 0
drive-scsi0: transferred: 201326592 bytes remaining: 64223182848 bytes total: 64424509440 bytes progression: 0.31 % busy: 0 ready: 0

From here on it continues normally. Because the data now has to be transferred over the network, the IO load is lower and the machines that were experiencing high IO-wait times resume their work.

A couple of questions:
- The zeroing of devices doesn't seem to occur when moving disks (e.g. from storage0 to storage1 on the same node). Why the difference? The commands I'm comparing are sketched below.
- Can we prevent the (I assume) zeroing of devices? It causes timeouts in other VMs.
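
For clarity, these are the two operations I'm comparing (a sketch using the VM ID and node name from the log above; 'storage1' is just an example target storage):

Code:
# Moving a disk between two storages on the same node:
# no excessive IO load here, the thin LV stays thin.
qm move_disk 172 scsi0 storage1

# Live migration to another node including the local disk:
# this is where the target LV gets written full before the
# transfer counters show up in the task log.
qm migrate 172 nlgrq1pm-p001 --online --with-local-disks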
 
Is there nobody who can help me? I've tried disabling zeroing on the LVM thin pool with "lvchange -Zn vg/lv", but that doesn't resolve the issue.
Could it be the Proxmox tooling itself that zeroes the drives before the actual data transfer starts?
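
For reference, this is roughly what I tried on the target node (a sketch; 'thindata2' as VG name and 'data' as thin pool name are assumptions, the names will differ on other setups):

Code:
# Check whether zero-on-allocation is enabled for the thin pool
# (the 'zero' column should show this).
lvs -o lv_name,zero thindata2

# Disable zeroing of newly provisioned blocks on the thin pool.
lvchange -Zn thindata2/data

Even with -Zn set on the pool, the LV still gets filled completely before the actual transfer starts.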
 
