Import from ESXi Extremely Slow

I've recently been experimenting with the ESXi import process in preparation for a migration from VMware to Proxmox. I'm having an issue where the disk import runs extremely slowly. I've got a 10 Gb link between the ESXi host and Proxmox (verified with `iperf3` tests between the boxes on the same interface used for the import), but the import runs at around 350 Mbps. I've read elsewhere online that setting `Config.HostAgent.vmacore.soap.maxSessionCount` to `0` on the ESXi host should help with API throttling, but I see the same result after changing that setting.
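In case it helps, this is roughly how I verified the link and applied the setting. Treat the `esxcli` line as a sketch; the option path is from memory and I actually changed it through the host UI.

code_language.bash:
# on the Proxmox host: start an iperf3 server
iperf3 -s
# on the ESXi host: test against the interface used for the import, with 4 parallel streams
iperf3 -c <pve-ip> -P 4
# the API throttling setting; option path not verified, double-check before using
esxcli system settings advanced set -o /Config/HostAgent/vmacore/soap/maxSessionCount -i 0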

Does anyone have any ideas about why the transfers from ESXi are so slow? And what should we be expecting in terms of speed when importing VMs over a 10 Gb link?

I'm running Proxmox 9.0.10 and ESXi 7.0u3.
 
Did you take a look at the CPU usage of the import process in `top`? Maybe you're hitting a limit there.
 
From what I can tell, I'm not hitting CPU limits. On the Proxmox side the import tool is using about 11% of a CPU core, and on the ESXi side I don't see anything using a full core.
 
We have observed the same thing here. I believe it's a limit of the ESXi API, though, as I'm pretty sure the import process uses the ESXi API to pull down the chunks of data. We have dual 25GbE between the machines, and an import of a 60GB VM takes about 30 minutes. However, the throughput doesn't scale with VM size: on larger VMs, instead of 120GB/hr it drops to more like 30GB/hr. It took about 3 days for a 7TB VM to import.

I'm doing some tests right now to verify whether that is indeed the issue. I'm going to attach the storage that ESXi uses to the PVE host, use `qm importdisk`, and compare the speed against the VMware import tool.
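The test plan, roughly (sketch only: storage name, VMID and paths are placeholders, and it assumes the ESXi datastore is NFS-backed):

code_language.bash:
# make the NFS share behind the ESXi datastore available to PVE as well
pvesm add nfs esxi-nfs --server <nfs-server-ip> --export /export/esxi-datastore --content images
# import a disk straight from that share into an existing VM
qm importdisk 117 /mnt/pve/esxi-nfs/myvm/myvm.vmdk local-lvm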
 
I've seen similar results in my limited testing. I'm able to pull the machines across the wire much faster using `scp`, but then you lose out on the ESXi importer magic. You can create the VMs and import the disks by hand, but I was hoping to avoid that since it adds a lot more opportunity to screw things up, especially when migrating hundreds of VMs.
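For reference, the by-hand route looks roughly like this (VMID, names and paths are placeholders, and the exact volume name after the import can differ depending on the storage type):

code_language.bash:
# copy the disk (descriptor + flat file) off the ESXi host
scp root@esxi:/vmfs/volumes/datastore1/myvm/myvm*.vmdk /tmp/myvm/
# create an empty VM shell and import the copied disk into it
qm create 118 --name myvm --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
qm importdisk 118 /tmp/myvm/myvm.vmdk local-lvm
# attach the imported (initially "unused") disk and make it bootable
qm set 118 --scsi0 local-lvm:vm-118-disk-0 --boot order=scsi0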
 
Did a quick test on a 1GbE connection:
qm import: 8m 50s
esxi import: 12m 34s

Once I get the environment set up, I'll test it on a 2x25GbE -> 2x25GbE connection.

But the import is definitely slower through the import tool.

No load on the PVE when this is importing either.
 
For reference, we're doing a 2x10Gb -> 2x10Gb transfer and seeing roughly 50 GiB imported in 20 minutes.

EDIT: When we do the same import via Veeam restore we're able to import the disk in ~6 minutes.
 
Okay, I'll preface this by saying that I don't know Rust, but if I'm reading this correctly, it looks like the ESXi import can only make 4 calls at once when pulling data.

This is what I think is happening:

It kicks off the make_request function
code_language.rust:
    async fn make_request<F>(&self, mut make_req: F) -> Result<Response<Body>, Error>
    where
        F: FnMut() -> Result<http::request::Builder, Error>,
    {
        let mut permit = self.connection_limit.acquire().await;

      ....

In there, where it says `self.connection_limit.acquire`, I believe it's pulling the limit from `ConnectionLimit`, here:

code_language.rust:
impl ConnectionLimit {
    fn new() -> Self {
        Self {
            requests: tokio::sync::Semaphore::new(4),
            retry: tokio::sync::Mutex::new(()),
        }
    }

    /// This is supposed to cover an entire request including its retries.
    async fn acquire(&self) -> SemaphorePermit<'_> {
        // acquire can only fail when the semaphore is closed...
        self.requests
            .acquire()
            .await
            .expect("failed to acquire semaphore")
    }

Which, if I'm reading it correctly, is hard-coded to 4.

This is from the pve-esxi-import-tools repo:
https://git.proxmox.com/?p=pve-esxi-import-tools.git;a=summary

And I'm guessing the download_do function picks the next chunk of data to be pulled and requests it:

code_language.rust:
async fn download_do(
        &self,
        query: &str,
        range: Option<Range<u64>>,
    ) -> Result<(Bytes, Option<u64>), Error> {
        let (parts, body) = self
            .make_request(|| {
                let mut req = Request::get(query);

                if let Some(range) = &range {
                    req = req.header(
                        "range",
                        &format!("bytes={}-{}", range.start, range.end.saturating_sub(1)),
                    )
                }

                Ok(req)
            })
            .await?
            .into_parts();

They may have picked a really low number for the import just to be on the safe side, similar to how the backup jobs were originally being done to PBS a few months ago.

Again, if I'm reading this right, a good test would be to change the 4 to 6 or 8 and see whether the speed scales with the increased number of concurrent calls.
 
I remember some users had similar issues if they had any snapshots in ESXi. Do you happen to have any snapshots? If so, could you try removing them?

You could also try one of the other migration methods, e.g. this one for minimal downtime:
https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE#Attach_Disk_&_Move_Disk_(minimal_downtime)
 
Sorry, I forgot to mention this in my original post: no, there are no snapshots on the machine. We do have some other options, but I'll admit that at this point my curiosity has gotten the better of me and, options aside, I'd like to understand what's going on here and why it's so slow.
 
I'm going to see if I can figure out how to get it to compile (I'm also not a Rust developer) with that semaphore value bumped to 8, and see if it has any impact on throughput. This is a great find, thanks for doing the legwork on this.
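If anyone wants to follow along, this is roughly what I'm planning. It's a sketch only: I haven't verified the clone URL, the file layout, or whether a plain cargo build matches the Proxmox packaging.

code_language.bash:
# rough sketch; repo layout and build steps not verified
git clone https://git.proxmox.com/git/pve-esxi-import-tools.git
cd pve-esxi-import-tools
grep -rn "Semaphore::new(4)" .   # locate the hard-coded connection limit
# bump the value (e.g. 4 -> 8) in the file grep points at, then rebuild
cargo build --release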
 
Tested with an NFS-backed Linux VM vmdk on a T430 (32 threads):
Creating a VM with 2 cores, 2GB RAM and a 0.001G disk: 2s.
`qemu-img convert -t writeback -f vmdk -O raw linux-vm.vmdk vm-117-disk-0.raw`: created the 30G image in 4min 24s on NFS (~130% of one core).
`mv` of the result into nfs-path/images/117/: 2s (or just point qemu-img at the target path directly).
You could easily run a lot of these in parallel with a script, until you hit the I/O limit on the ESXi side, the network limit to/from ESXi, or the I/O (or even CPU) limit of the local or remote PVE NFS storage. After an image is converted (while the others keep running in parallel), you could move the VM config into place on the PVE host under /etc/pve/nodes/<node>/qemu-server/ and just start it, or fine-tune CPU/memory before starting. I wouldn't do that without a script though, something along the lines of the sketch below.
:)
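A rough sketch of what such a script could look like (paths, VM names and the level of parallelism are placeholders):

code_language.bash:
# convert a handful of VMDKs in parallel, then wait for all of them to finish
SRC=/mnt/esxi-nfs; DST=/mnt/pve-nfs/images
for vm in vm-a vm-b vm-c vm-d; do
    qemu-img convert -t writeback -f vmdk -O raw \
        "$SRC/$vm/$vm.vmdk" "$DST/$vm.raw" &
done
wait   # all background conversions done; VM configs can be dropped into place and started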