Live Migration / ZFS / Discard

Mar 24, 2024
Hello everyone.

A little bit of background
I'm quite new to the Proxmox world - I'm coming from the VMware world and have been implementing vSphere solutions as a systems engineer for over 15 years now, so I'm pretty familiar with virtualization. Due to the Broadcom fallout we've decided to abandon VMware and use Proxmox from now on. I've been playing with Proxmox VE for the past month and I'm pretty happy with it. Now we are looking at clustering and migration, and this is where I'm struggling. I've searched the interwebs and these forums, but I can't find an answer or explanation for what I'm seeing happen.

The System
We are running the following configuration:
- 2x HPE ProLiant DL380 Gen10 servers (one node with 1 CPU and 128 GB RAM, one node with 2 CPUs and 256 GB RAM)
- Both systems are using local ZFS RAID10 storage on 6x 600 GB HDDs. The disks are backed by a RAID controller in HBA mode.
- The two nodes are connected to the management/VM network with a 4x 1Gbit Linux bond and to the SAN network with 1x 10Gbit
- One node is on 8.1.4 (with a subscription), the other on 8.1.5 (without a subscription) - both report that no updates are available.

The Problem
I've got a test VM running on Node1 and want to (live) migrate it to Node2 and back again.
The disk attached to the VM uses the VirtIO SCSI single controller with "Write back" caching, and Discard and SSD emulation turned on. The disk is 100 GiB in size (thin provisioned, actual usage 33 GiB).
When I migrate the VM the process takes 43 minutes. The target node uses 80%-95% CPU for a long time while no network traffic is being generated, sitting at "drive mirror is starting for drive-scsi0".
Then after some time the network traffic starts (300-400 MiB/s). When the transfer hits the 33 GB mark (after which the VM disk is empty), data is still being transferred over the network, but the target disk on the target node stops growing (I would expect the transfer to be much faster after the actual data has been sent, but that doesn't happen).
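
For reference, this is roughly what the controller and disk lines look like in the VM config - the VMID and storage name here are generic placeholders, not our real ones:

  # /etc/pve/qemu-server/100.conf (excerpt)
  scsihw: virtio-scsi-single
  scsi0: local-zfs:vm-100-disk-0,cache=writeback,discard=on,size=100G,ssd=1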

Now to the conundrum I can't get my head around
If I stop the VM, disable "Discard" and "SSD Emulation", start the VM and do the migration again, the process takes 4 minutes: the transfer starts instantly and does not wait at "drive mirror is starting for drive-scsi0". What still happens, though, is that the transfer continues into the "empty space" and generates network traffic - why is data being transferred for an "empty" disk?

What's going on? Why does the discard flag make the live migration take that much longer and put that much stress on the target CPU? I've read that migrating a VM with discard on will "pre-fill the target disk with zeroes", but to my understanding this should not happen with ZFS? Still, what I'm seeing sure feels like the host is filling the disk with zeroes before starting the migration.
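
If it helps to narrow this down, this is roughly how I've been watching the target side during a migration (pool and volume names are placeholders from my test setup):

  # check whether compression is enabled on the target volume
  zfs get compression,compressratio rpool/data/vm-100-disk-0
  # watch the space used by the target volume while the drive mirror runs
  watch -n 5 'zfs list -o name,used,referenced,volsize rpool/data/vm-100-disk-0'
  # with compression=lz4/zstd, blocks of zeroes should compress away to almost nothing,
  # so "used" staying flat while network traffic continues would fit zeroes being sent over the wire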

Can anyone shed some light on this issue for me please? We'd like to use discard in production but also want to use live migration quite a bit. The best workaround so far is to shut down the VM, remove the discard flag, start the VM, migrate the VM, stop the VM, enable discard, start the VM... which is not ideal for production servers.
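
Scripted out, that workaround looks something like this (VMID, storage and node names are placeholders):

  # disable discard / SSD emulation on the disk (we do this with the VM stopped)
  qm shutdown 100
  qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback
  qm start 100
  # live-migrate including the local disk
  qm migrate 100 node2 --online --with-local-disks
  # re-enable discard / SSD emulation afterwards
  qm shutdown 100
  qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback,discard=on,ssd=1
  qm start 100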

I'm happy to share log files with you guys if that helps.

Thanks for any insight into this problem :)

Cheers

Daniel

(edited for typos)
 
I cannot comment on the issue at hand, because normally you won't do it that way and most people will not have experience doing it.
I want to show you ways that are MUCH faster:
  • use a shared storage, migration will be instant without copying files
  • use ZFS replication and do a switch in a second after the asynchronous transfer is finished
Everything else will be very slow and not a setup I would want to use in production.
 
Hi :)
Thanks for the reply. Shared storage would be ideal, yes - but that's out of our budget range atm. Could CEPH be an alternative?

Can you elaborate on ZFS replication? How would that work? I've never used it before.

Thanks!
 
Thanks for the reply. Shared storage would be ideal, yes - but that's out of our budget range atm. Could CEPH be an alternative?
That is a distributed shared storage and totally fine for this (if you have at least 3 nodes).

The two nodes are connected to the management/VM network with a 4x 1Gbit Linux bond and to the SAN network with 1x 10Gbit
What's that SAN then? A SAN is dedicated shared storage that can be used with PVE.


Can you elaborate on ZFS replication? How would that work? I've never used it before.
Sure, see the storage replication chapter in the reference documentation.
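
A minimal sketch, assuming VM 100 and a target node called node2 (the docs have the details):

  # create a replication job for VM 100 to node2, running every 15 minutes
  pvesr create-local-job 100-0 node2 --schedule '*/15'
  # check the state of the replication jobs
  pvesr status

Once the disks are replicated to the target node, a migration only has to transfer the delta since the last sync, which is why the switch-over is so fast.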
 
That is a distributed shared storage and totally fine for this (if you have at least 3 nodes).
I'll give that a good look then. We'll have 3 nodes once we get rid of all the ESXi hosts.

What's that SAN then? A SAN is dedicated shared storage that can be used with PVE.
This is "just" e dedicated 10Gbit Network for Cluster and Storage traffic - so the PVEs will have an exclusive 10Gbit link to talk to each other and transfer data

Cheers! I'm looking at replication as we speak and this might just be the answer to all I've been doing "wrong" with migration! Thanks for pointing me in the right direction!
 
I'll give that a good look then. We'll have 3 nodes once we get rid of all the ESXi hosts.
Then CEPH is the way to go and you don't need to invest time in ZFS replication, which is mainly for a kind of standby system and mostly used in a two-independent-node scenario. CEPH is a hyperconverged cluster and exactly what you need. If I understand you correctly, you may have a chicken-and-egg problem while migrating away from the machines that will become the PVE cluster, so you may still need to rely on ZFS at least for the duration of the migration.
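
Once all three nodes are in the cluster, the rough outline looks like this (network and disk names are just examples, the Ceph chapter in the docs has the details):

  # on every node: install the Ceph packages
  pveceph install
  # on the first node: initialise Ceph on the dedicated 10Gbit network
  pveceph init --network 10.10.10.0/24
  # on each node: create a monitor and add the local disks as OSDs
  pveceph mon create
  pveceph osd create /dev/sdb
  # finally: create a pool and add it as VM storage
  pveceph pool create vmpool --add_storages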
 
