Import from ESXi Extremely Slow

Trying to engineer a way for the qcow2 imports to work with compression; it'd be a significant time saving in transit with sparse disks. But I'm having a problem with the qcow2 conversion process not reading the data out of the buffer fast enough, which slows the whole import down.

RAW is getting the expected 250MB/s over the line, which works out to 500MB/s after decompression (due to lots of sparse space). I'm not exactly sure why I can't get the compressed qcow2 pipe to hit 500MB/s though.
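
For reference, the RAW + compression path is roughly this shape; the host, port, core count, and paths are just examples, and it assumes a pigz binary (or a static build of it) is usable on the ESXi side:

```bash
# ESXi side: stream the flat VMDK, compress across 8 cores, ship it over a raw TCP socket
dd if=/vmfs/volumes/datastore1/myvm/myvm-flat.vmdk bs=1M \
  | pigz -1 -c -p 8 \
  | nc proxmox-node 3333

# Proxmox side: listen, decompress, write to the target volume
# (exact nc flags depend on which netcat flavor is installed)
nc -l -p 3333 \
  | pigz -dc \
  | dd of=/dev/zvol/rpool/data/vm-100-disk-0 bs=1M conv=sparse   # conv=sparse skips the all-zero blocks
```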

So currently what works:
✅ RAW without compression
✅ RAW with compression
✅ qcow2 without compression

I'll work on it more on Monday.

@kaliszad brought something up to me the other day about how Vates recently implemented VDDK in their import process for XCP-ng:

https://github.com/vatesfr/xen-orchestra/pull/8840
https://xen-orchestra.com/blog/xen-orchestra-5-110/

"In the best case, when using VMs with many snapshots or mostly empty disks, migrations can be up to 100 times faster. In our high-performance lab, we measured around 150 MB/s per disk and up to 500 MB/s total, which means an infrastructure with 10 TB of data could be migrated in a single day, with less than five minutes of downtime per VM."
Interestingly, they are hitting the same limitation as the current Proxmox import tool at ~150MB/s per disk. They're also hitting the same 500MB/s limit that I'm seeing with a single TCP stream. Not even sure if this could be implemented in a sane way here, due to the qcow2 thing again.

The main issue is that the qcow2 conversion kinda/sorta needs to read the whole file, not a stream, while it's converting. That's where my FUSE trick comes in: it makes the qemu-img dd process see a whole file instead of a stream piped in. The FUSE layer allows the small seeks that the qcow2 conversion needs by keeping a small chunk of data in a buffer. But that buffer is having a fit with the compression. I'm not sure if it's because the decompress side gets to the large sparse area towards the end of the disk and starts hammering the buffer or what, but the FUSE isn't liking it lol.
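
So for qcow2 output, the receive side ends up pointing qemu-img at the FUSE mount instead of straight at the pipe. Roughly like this, with illustrative paths (the FUSE helper is part of my tool, not something stock):

```bash
# qemu-img has to be able to open and seek in its source while it builds the
# qcow2, so it reads the incoming stream through the FUSE-backed file:
qemu-img dd -f raw -O qcow2 bs=1M \
  if=/mnt/esxi-import/vm-100-disk-0.raw \
  of=/var/lib/vz/images/100/vm-100-disk-0.qcow2
```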

One option would be to just not allow compression when qcow2 is set as the output format. I'll need to do some speed tests to see if that's even worth implementing; if it's the same speed or slower than the existing HTTP/2.0 method, it's not much use.

NGL, very frustrating lol. I need to do a benchmark to see how fast an import happens when you bypass ESXi altogether and just attach the VMFS from the storage directly in Proxmox using the vmfs6 tools, then run an import that way.
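
Roughly what I have in mind for that test; the device and paths are examples, and the datastore LUN obviously has to be presented to the Proxmox node first (FC/iSCSI/SAS):

```bash
# mount the VMFS datastore read-only on the Proxmox node
apt install vmfs6-tools
mkdir -p /mnt/vmfs
vmfs6-fuse /dev/sdb1 /mnt/vmfs

# then import straight off the datastore, no ESXi in the data path
qm importdisk 100 /mnt/vmfs/myvm/myvm.vmdk local-zfs
```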
 
More coding today, so far so good.

[attached screenshot: benchmark results]

So at the very least, it's almost 2x faster than the built-in import method for qcow2 and about 4 times faster with raw+compression.

I'm going to work on documentation and some other optimizations, plus some features to make it easier to use. At the moment you cannot use this from the GUI, and I think the only way I can reasonably make it available there is by having the program check the following and, if all are true, use the netcat approach (rough sketch of the checks after the list):

1. Are there valid SSH keys from the Proxmox node to the ESXi host?
2. Is the firewall on the ESXi host off, or does it allow a specific range of ports to be used for netcat?
3. Does Proxmox have "pigz" installed?
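
Something along these lines for those checks; the host is an example, and the firewall check is simplified (the real version would also need to verify an allowed port range rather than just whether the firewall is enabled):

```bash
ESXI=root@esxi01.example.com   # example host

# 1. key-based SSH from the Proxmox node to the ESXi host works
ssh -o BatchMode=yes -o ConnectTimeout=5 "$ESXI" true \
  || { echo "no working SSH key to the ESXi host"; exit 1; }

# 2. ESXi firewall is disabled (otherwise a rule covering the netcat port range is needed)
ssh "$ESXI" "esxcli network firewall get" | grep -qi 'Enabled: *false' \
  || echo "ESXi firewall is enabled - transfer ports need to be allowed"

# 3. pigz is available on the Proxmox side for decompression
command -v pigz >/dev/null 2>&1 || { echo "pigz is not installed on this node"; exit 1; }
```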

If all pass, I can have it attempt the import from the GUI, making it as easy peasy as possible. The only issue I can think of is that the core count for compression defaults to 8 on the ESXi side; any less and the performance was kinda meh, but if you kick off too many imports at once it's possible to run into an OOM error and crash the import process. It also seems pigz has a limitation where it only uses the first physical socket, so even if you have 128 cores on the machine across 2 sockets, it can only use the first 64. I could, in theory, have it check whether there are enough free cores for compression and, if so, go ahead; otherwise reduce the core count for compression or simply have the import wait until some cores free up.

That's a little much to be doing in the background without the user knowing, so I'm hesitant to do that, but it is an option to go with.

Edit:

Here's a more accurate table with the HTTP/2.0 mode included:
[attached screenshot: benchmark table including HTTP/2.0 mode]
 