Import from ESXi Extremely Slow

I'm trying to work out a way to get qcow2 imports working with compression; it would be a significant time saving in transit with sparse disks. The problem is that the qcow2 conversion process isn't reading data out of the buffer fast enough, which slows the whole import down.

RAW is getting the expected 250 MB/s over the wire, which works out to about 500 MB/s effective after decompression (due to lots of sparse space). I'm not exactly sure why I can't get the compressed qcow2 pipe to hit 500 MB/s though.
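For reference, the raw+compression path that does work is essentially a netcat pipe with parallel (de)compression on each end. A rough sketch, with placeholder host names, datastore path and target volume (and note the busybox nc/dd on ESXi take slightly different flags than the Debian versions):

# Proxmox side: listen on a port, decompress, write the raw image to the target volume
nc -l -p 3333 | unpigz -c | dd of=/dev/pve/vm-100-disk-0 bs=1M

# ESXi side: stream the flat VMDK through the compressor into a single TCP connection
# (ESXi has no pigz by default; assume a suitable compressor binary is available there)
dd if=/vmfs/volumes/datastore1/myvm/myvm-flat.vmdk bs=1M | pigz -c -p 8 | nc proxmox-node 3333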

So currently what works:
✅ RAW without compression
✅ RAW with compression
✅ qcow2 without compression

I'll work on it more on Monday.

@kaliszad brought something up to me the other day about how Vates recently added VDDK support to their import process for XCP-ng:

https://github.com/vatesfr/xen-orchestra/pull/8840
https://xen-orchestra.com/blog/xen-orchestra-5-110/

In the best case, when using VMs with many snapshots or mostly empty disks, migrations can be up to 100 times faster. In our high-performance lab, we measured around 150 MB/s per disk and up to 500 MB/s total, which means an infrastructure with 10 TB of data could be migrated in a single day, with less than five minutes of downtime per VM.
Interestingly, they are hitting the same limitation as the current Proxmox import tool at ~150 MB/s per disk, and they are also hitting the same 500 MB/s ceiling that I'm seeing with a single TCP stream. I'm not even sure VDDK could be implemented in a sane way on our side, because of the qcow2 issue again.

The main issue is that the qcow2 conversion needs to (more or less) treat the source as a whole file rather than a stream. That's where my FUSE trick comes in: it makes the qemu-img dd process see a whole file instead of data coming in on stdout. The FUSE layer allows the small seeks the qcow2 conversion needs by keeping a small chunk of data in a buffer. But that buffer is having a fit with compression. I'm not sure if it's because the decompression hits the large sparse area towards the end of the disk and starts hammering the buffer or something else, but the FUSE layer isn't liking it lol.
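To make the seek problem concrete, here's roughly the difference between the two paths (the paths, mount point and device are made up for illustration; this is not the actual tool's layout):

# raw target: purely sequential writes, so feeding it from a pipe works fine
ssh root@esxi-host "dd if=/vmfs/volumes/ds1/myvm/myvm-flat.vmdk bs=1M" \
  | dd of=/dev/pve/vm-100-disk-0 bs=1M

# qcow2 target: the converter wants a whole, seekable source file, which a pipe isn't.
# The FUSE shim exposes the incoming stream as a regular file and keeps a small
# window of data buffered so the converter's short seeks can be satisfied:
qemu-img dd -f raw -O qcow2 bs=1M \
  if=/mnt/stream-shim/myvm-flat.vmdk of=/var/lib/vz/images/100/vm-100-disk-0.qcow2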

One option would be to simply not allow compression when qcow2 is set as the output format. I'll need to do some speed tests to see if that's even worth implementing; if it's the same speed or slower than the existing HTTP/2 method, it's not much use.

NGL, very frustrating lol. I need to do a benchmark to see how fast an import is when you bypass ESXi altogether, attach the VMFS volume from the storage directly to Proxmox using vmfs6-tools, and then run an import that way.
 
More coding today, so far so good.

[attachment 1761566825323.png: benchmark results]

So at the very least, it's almost 2x faster than the built-in import method for qcow2, and about 4x faster with raw+compression.

I'm going to work on documentation and some other optimizations, plus some features to make it easier to use. At the moment you cannot use this from the GUI, and I think the only reasonable way to make it available there is to have the program check the following and, if all are true, use the netcat approach:

1. Are there valid SSH keys for the ESXi host on the Proxmox node?
2. Is the ESXi firewall off, or does it allow a specific range of ports for netcat?
3. Does the Proxmox node have "pigz" installed?

If all pass, I can have it attempt the import from the GUI, making it as easy-peasy as possible. The only issue I can see is that by default the core count for compression is set to 8 on the ESXi side; any fewer and the performance was kinda meh, but if you kick off too many imports at once it's possible to hit an OOM error and crash the import process. pigz also seems to have a limitation where it only uses the first physical socket, so even if the machine has 128 cores across 2 sockets, it can only use the first 64. I could, in theory, have it check whether there are still cores available for compression and either go ahead, reduce the compression core count, or simply have the import wait until some cores free up.

That's a little much to be doing in the background without the user knowing, so I'm hesitant to do that, but it is an option to go with.
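The three preflight checks themselves are cheap to script, though. A rough sketch (ESXI_HOST is a placeholder, and the esxcli firewall query is just one way to check that part):

ESXI_HOST=root@esxi-host    # placeholder

# 1. do we have working key-based SSH from the Proxmox node to the ESXi host?
ssh -o BatchMode=yes -o ConnectTimeout=5 "$ESXI_HOST" true || echo "no working SSH key"

# 2. is the ESXi firewall disabled? (otherwise a ruleset for the netcat port range is needed)
ssh -o BatchMode=yes "$ESXI_HOST" "esxcli network firewall get" | grep -qi 'enabled: *false' \
  || echo "ESXi firewall is enabled"

# 3. is pigz installed on the Proxmox node?
command -v pigz >/dev/null || echo "pigz not installed (apt install pigz)"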

Edit:

Here's a more accurate table with the HTTP/2 mode included:
[attachment 1761570025903.png: benchmark table including the HTTP/2 mode]
 
I came to this thread because I am able to achieve an import speed of 80 GB in 16 minutes.
That is too slow for a decent migration; I am using the import tool over a dedicated 100 Gbit network card in Proxmox.

My source (ESX) and destination (Proxmox) have the same hardware.
Each server consists of:
2 x AMD EPYC 9554
2 x 100Gbit AOC-S100GC-I2C-O on the same switch (proxmox and vmware)
2 x Crucial Micron 7450 (raid 1 on esx, zfs mirror on proxmox)

Both are connected to Pure Storage C and X over iSCSI; Proxmox uses shared LVM (raw) and VMware uses VMFS on iSCSI.
I tried the following
ESX pure-C to proxmox pure-C
ESX pure-C to proxmox pure-x
ESX pure-x to proxmox pure-x
ESX pure-x to proxmox IBM FlashSystem

All have the same result, as do the following:
ESX raid1 nvme to proxmox zfs mirror nvme
ESX raid1 nvme to proxmox Pure-x
ESX raid1 nvme to proxmox IBM FlashSystem

The ESX and Proxmox hosts do not show any increase in load, just a little CPU usage;
on both hosts it stays under 5%.
I did a speed test between 2 Proxmox nodes with the same hardware as above.
On both nodes I installed a Debian trixie VM, set up multiqueue on the NIC, and was able to reach 198 Gbit/s (LACP).
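(For reference, that kind of test can be reproduced with something like iperf3 and parallel streams; the host name and stream count below are arbitrary:)

# on node/VM A
iperf3 -s
# on node/VM B: multiple parallel streams so multiqueue and the LACP hashing spread the load
iperf3 -c node-a -P 8 -t 30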

On the VMware side I also tested with the following values both changed and at their defaults, always rebooting the host afterwards, but it does not make a difference:
Config.HostAgent.vmacore.soap.maxSessionCount == 0
Config.HostAgent.vmacore.soap.sessionTimeout == 0

I also updated to the latest version of Proxmox, 9.0.11 with Linux 6.14.11-4-pve; same results.
I also consolidated each VM I tried migrating, to be absolutely sure none of them still contain snapshots.

I am going to try a backup and restore with Veeam to check if that goes any faster.
On the ESX host I see around 1.2 Gbit of combined traffic, but that is up and down together, and half of it is iSCSI to the source Pure.
 
My source (ESX) and destination (Proxmox) have the same hardware.
Each server consists of:
2 x AMD EPYC 9554
2 x 100Gbit AOC-S100GC-I2C-O on the same switch (proxmox and vmware)
2 x Crucial Micron 7450 (raid 1 on esx, zfs mirror on proxmox)

Both are connected to Pure Storage C and X over iSCSI; Proxmox uses shared LVM (raw) and VMware uses VMFS on iSCSI.
I tried the following
ESX pure-C to proxmox pure-C
ESX pure-C to proxmox pure-x
ESX pure-x to proxmox pure-x
ESX pure-x to proxmox IBM FlashSystem

This is pretty much the EXACT setup we are using.

I'm about to test this setup, but maybe you can try this before I get to it:

Try attaching the LUN that ESXi is using to the Proxmox node and then use vmfs6-tools to mount the volume.
https://www.nakivo.com/blog/mount-vmfs-datastore-in-linux-windows-esxi/

From there try to import using the qm import tool in CLI.

In theory, it should be as fast as the network and storage can go, provided the qemu import can keep up.
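Roughly, the steps would be (device node, VMID and storage name are placeholders; vmfs6-fuse gives read-only access, and the LUN obviously needs to be visible to the Proxmox node over iSCSI/FC first):

apt install vmfs6-tools
mkdir -p /mnt/vmfs
vmfs6-fuse /dev/sdX1 /mnt/vmfs    # read-only FUSE mount of the VMFS datastore

# import straight from the mounted datastore into a Proxmox storage
# (older PVE versions call this 'qm importdisk')
qm disk import 100 /mnt/vmfs/myvm/myvm.vmdk local-lvm --format raw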
 
This is pretty much the EXACT setup we are using.

I'm about to test this setup, but maybe you can try this before I get to it:

Try attaching the LUN that ESXi is using to the Proxmox node and then use vmfs6-tools to mount the volume.
https://www.nakivo.com/blog/mount-vmfs-datastore-in-linux-windows-esxi/

From there try to import using the qm import tool in CLI.

In theory, it should be as fast as the network and storage can go, provided the qemu import can keep up.
I will try this later this week, with a new LUN rather than the production LUN.
I also ran a restore from Veeam; it runs at 330M in the web interface (not sure what measurement that is), but the import runs at 80M.
 
I will try this later this week, with a new LUN rather than the production LUN.
I also ran a restore from Veeam; it runs at 330M in the web interface (not sure what measurement that is), but the import runs at 80M.

The fastest speed we have hit is 140 MB/s, measured at the NIC. With my modified binary you should be able to hit 500 MB/s, which seems to be a cap on the ESXi side: there appears to be some arbitrary limit in ESXi that holds a single TCP stream to 500 MB/s, which is exactly the bandwidth of 4x 1 GbE connections.

But, if you can do this mount method, you're bypassing ESXi entirely and not dealing with it.

From what I can see, the big flaw with the built-in Proxmox tool is that it uses HTTP/2, which somewhere inside ESXi is limited to roughly 30 MB/s per API call, and it seems to run 4 calls at the same time. You can increase the hardcoded call count, but it really doesn't do any better. If you bypass the HTTP method and use my modified netcat+dd method, it maxes out at 500 MB/s - again, another limitation somewhere in ESXi. Posts across the internet have noted that some TCP streams are simply capped in ESXi.
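If you want to see that per-stream ceiling in isolation (no disks involved), a crude test is to push zeros through a single TCP connection and watch the rate on the receiving end; the busybox nc on ESXi takes slightly different flags than Debian's, so adjust accordingly:

# Proxmox node: listen and discard, printing throughput
nc -l -p 3333 | dd of=/dev/null bs=1M status=progress

# ESXi host: one TCP stream of zeros
dd if=/dev/zero bs=1M count=20000 | nc proxmox-node 3333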

Again, in theory, with the mount method and provided the qemu import can keep up, you should be getting above 500 MB/s.
 
I will try this later this week, with a new LUN rather than the production LUN.
I also ran a restore from Veeam; it runs at 330M in the web interface (not sure what measurement that is), but the import runs at 80M.

Ultimately we decided to go with the Veeam restore route for our migration as well. There are some unfortunate downsides (needing to change the hardware after the migration, no migration progress from the Proxmox side), but we decided that the extra work would still put us in the black on the migration overall, simply because of how much faster the Veeam restore was.
 
There are some unfortunate downsides (needing to change the hardware after the migration
You'd still have to do that anyway even with the built-in Proxmox tool. You have to import Windows VMs as SATA and then add a separate SCSI disk for Windows to activate the driver, then shut down the VM, detach the primary disk and re-attach it as SCSI, then remove the disk you used to activate the driver.
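For anyone doing that by hand, the disk shuffle looks roughly like this with qm (VMID 100 and storage "local-lvm" are placeholders):

# attach the imported disk as SATA so Windows boots without the VirtIO SCSI driver
qm set 100 --sata0 local-lvm:vm-100-disk-0

# add a small throw-away SCSI disk so Windows detects the controller and binds the driver
qm set 100 --scsihw virtio-scsi-single --scsi1 local-lvm:1

# boot Windows once, confirm the driver loads, shut the VM down, then swap the disks around
qm set 100 --delete sata0,scsi1
qm set 100 --scsi0 local-lvm:vm-100-disk-0
# (the throw-away volume is left behind as an unusedX entry and can be removed afterwards)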
 
You'd still have to do that anyway even with the built-in Proxmox tool. You have to import Windows VMs as SATA and then add a separate SCSI disk for Windows to activate the driver, then shut down the VM, detach the primary disk and re-attach it as SCSI, then remove the disk you used to activate the driver.
You can prepare the VM before migration to skip these steps and have the drivers ready immediately. See here and here.