Hello everyone.
A little bit of background
I'm quite new to the Proxmox world - I'm coming from the VMware world and have been implementing vSphere solutions as a systems engineer for over 15 years now, so I'm pretty familiar with virtualization. Due to the Broadcom fallout we've decided to abandon VMware and use Proxmox from now on. I've been playing with Proxmox VE for the past month and I'm pretty happy with it. Now we are looking at clustering and migration, and this is where I'm struggling. I've searched the interwebs and these forums, but I can't find an answer or explanation for what I'm seeing happening.
The System
We are running the following configuration:
- 2x HPE ProLiant DL380 Gen10 servers (one node with 1 CPU and 128 GB RAM, one node with 2 CPUs and 256 GB RAM)
- Both systems use local ZFS RAID10 storage on 6x 600 GB HDDs. The disks sit behind a RAID controller in HBA mode (pool layout sketch right after this list).
- The two nodes are connected to the management/VM network with a 4x 1 GbE Linux bond and to the SAN network with 1x 10 GbE.
- One node is on 8.1.4 (with a subscription), the other on 8.1.5 (without a subscription) - both report that no updates are available.
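For reference, here's roughly what the pool looks like on each node - a stripe of three mirrors. Pool and device names below are examples, not copied from my systems:

  # zpool status rpool
    pool: rpool
   state: ONLINE
  config:
          NAME        STATE     READ WRITE CKSUM
          rpool       ONLINE       0     0     0
            mirror-0  ONLINE       0     0     0
              sda     ONLINE       0     0     0
              sdb     ONLINE       0     0     0
            mirror-1  ONLINE       0     0     0
              sdc     ONLINE       0     0     0
              sdd     ONLINE       0     0     0
            mirror-2  ONLINE       0     0     0
              sde     ONLINE       0     0     0
              sdf     ONLINE       0     0     0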
The Problem
I've got a test VM running on Node1 and want to live-migrate it to Node2 and back again.
The disk attached to the VM uses the VirtIO SCSI single controller with "Write back" caching, and has Discard and SSD emulation turned on. The disk is 100 GiB (thin provisioned - actual usage 33 GiB).
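For completeness, the relevant lines from the VM config look like this (VMID 100 and the storage name local-zfs are placeholders, not my real values):

  # qm config 100   (trimmed to the relevant lines)
  scsi0: local-zfs:vm-100-disk-0,cache=writeback,discard=on,size=100G,ssd=1
  scsihw: virtio-scsi-single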
When I migrate the VM, the process takes 43 minutes. The target node sits at 80-95% CPU for a long time while no network traffic is being generated, stuck at "drive mirror is starting for drive-scsi0".
Then after some time the network traffic starts (300-400 MiB/s). When the transfer hits the 33 GB mark (beyond which the VM disk is empty), data is still being transferred over the network, but the target disk on the target node stops growing (I would expect the transfer to be much faster once the actual data has been sent, but that doesn't happen).
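In case it helps, this is how I watch the actual allocation of the zvol on each side during the migration (the dataset path is from my setup and the output is approximate, so take the exact numbers with a grain of salt):

  # zfs list -o name,volsize,used,refreservation rpool/data/vm-100-disk-0
  NAME                      VOLSIZE  USED  REFRESERV
  rpool/data/vm-100-disk-0     100G  33.1G      none

refreservation=none confirms the zvol is sparse, and USED on the target is what stops growing at roughly 33 G while the network stays busy. My working guess - which may well be wrong - is that the mirror job reads and ships the zeroed blocks anyway, and ZFS on the target simply compresses them away, which would explain the flat disk usage but not the huge time difference below.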
Now to the conundrum I can't get my head around:
If I stop the VM, disable "Discard" and "SSD emulation", start the VM and do the migration again, the process takes 4 minutes: the transfer starts instantly and does not hang at "drive mirror is starting for drive-scsi0". What still happens, though, is that the transfer continues into the "empty space" and generates network traffic - why is data being transferred for an "empty" disk?
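To rule out the guest simply not trimming, this is what I run before migrating (inside the guest first, then on the source host; names are placeholders again):

  # inside the guest: trim all mounted filesystems and report what was freed
  fstrim -av
  # on the source host: USED should be down to roughly the real data footprint
  zfs list -o name,used rpool/data/vm-100-disk-0

So as far as I can tell, the "empty" 67 GiB really is unallocated on the source before the migration starts.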
What's going on? Why does the discard flag make the live migration take that much longer and put that much stress on the target CPU? I've read that migrating a VM with discard on will "pre-fill the target disk with zeroes", but to my understanding this should not happen with ZFS. Yet what I'm seeing sure feels like the host fills the disk with zeroes before starting the migration.
Can anyone shed some light on this issue for me, please? We'd like to use discard in production, but we also want to use live migration quite a bit. The best workaround so far is to shut down the VM, remove the discard flag, start the VM, migrate the VM, stop the VM, re-enable discard, and start the VM again (scripted after this paragraph) - which is not ideal for production servers.
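For completeness, here's that workaround as I'd script it. VMID, storage and node names are examples; the qm set lines simply rewrite the scsi0 option string with discard/ssd removed and later restored:

  # on the source node
  qm shutdown 100
  qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback,size=100G
  qm start 100
  qm migrate 100 node2 --online
  # afterwards, on the target node
  qm shutdown 100
  qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=writeback,discard=on,size=100G,ssd=1
  qm start 100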
I'm happy to share log files with you guys if that helps.
Thanks for any insight into this problem
Cheers
Daniel
(edited for typos)