Import from ESXi Extremely Slow

Let me know if there's anything else I can do to provide more useful information. I'm going to keep fiddling around and doing some testing next week (starting Tuesday, it's Canadian Thanksgiving) to see if I can make it do... something different.

Thanks again for all the effort you're putting into this.

I'm working on a different approach; hopefully I can have it done by tomorrow. I created another branch on the GitHub page called "direct-send". This version opens 8 SSH connections to the ESXi host and pulls the data down using dd, with FUSE providing a single path to pull from. In testing it's getting about 2x what HTTP does over 1GbE (again, I haven't got 25GbE working yet).

https://github.com/PwrBank/pve-esxi-import-tools/tree/direct-send

I'm still working on getting the GUI part working, though. But in theory this should completely bypass the HTTP limit. The caveat is that the ESXi host will need SSH enabled, which I imagine is a fair trade-off if it does actually increase performance.
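Roughly, the idea looks like this (a simplified sketch only, not the actual direct-send code; the host, paths, and disk size are placeholders):

Bash:
#!/bin/bash
# Simplified illustration of the chunked ssh+dd idea, NOT the real direct-send code.
# Assumes key-based SSH to the ESXi host; pass the flat VMDK size in MiB as $1.
HOST=root@esxi.example.com                        # placeholder host
SRC='/vmfs/volumes/datastore1/VM/VM-flat.vmdk'    # placeholder source path
DST=VM-flat.vmdk
SIZE_MB=$1        # total size of the flat VMDK in MiB
STREAMS=8         # parallel SSH connections
CHUNK=128         # MiB read per dd call

off=0
while [ "$off" -lt "$SIZE_MB" ]; do
    for i in $(seq 1 "$STREAMS"); do
        o=$(( off + (i - 1) * CHUNK ))
        [ "$o" -ge "$SIZE_MB" ] && break
        # each stream reads one chunk at its own offset and writes it in place locally
        ssh -o BatchMode=yes "$HOST" \
            "dd if='$SRC' bs=1048576 skip=$o count=$CHUNK 2>/dev/null" \
          | dd of="$DST" bs=1048576 seek="$o" conv=notrunc 2>/dev/null &
    done
    wait
    off=$(( off + STREAMS * CHUNK ))
done

In the actual branch the reassembly happens behind FUSE, so the importer just sees a single file path to read from.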

I'll keep you informed. Thanks for testing these out as well, @spencerh!
 
@spencerh
Give the new release a try
https://github.com/PwrBank/pve-esxi-import-tools/tree/direct-send

You just need to set up SSH keys from the PVE node to the ESXi hosts; it should automatically try to create 16 SSH connections and pull the VM down from the ESXi host using dd.
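For reference, the key setup from the PVE node looks roughly like this (just a sketch; recent ESXi versions read root's keys from /etc/ssh/keys-root/authorized_keys rather than ~/.ssh/authorized_keys, so a plain ssh-copy-id may not put the key where sshd looks; adjust the host and paths for your setup):

Bash:
# On the PVE node: generate a key if one doesn't exist yet
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""

# Append the public key to root's authorized_keys on the ESXi host
# (recent ESXi versions read /etc/ssh/keys-root/authorized_keys, not ~/.ssh)
cat ~/.ssh/id_ed25519.pub | ssh root@esxi.example.com \
    'cat >> /etc/ssh/keys-root/authorized_keys'

# Confirm non-interactive login works; BatchMode fails instead of prompting for a password
ssh -o BatchMode=yes root@esxi.example.com hostname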

You can check the logs to see if it is using ssh+dd or falling back to HTTP mode:

Bash:
journalctl -t esxi-folder-fuse --since "5 minutes ago" | grep -E "SSH connection|HTTP"

Let me know how that goes!
 
OK, so I did a bit of playing around with the 1.1.2 release this afternoon. Here are some questions and findings, in no particular order:
  1. Does it fall back to HTTP whenever a password is provided? I can't figure out a way to get the GUI to let me add the ESX storage without providing a password. When I attach the storage through the UI I see the following process running, which implies to me that it's using a password:

    /usr/libexec/pve-esxi-import-tools/esxi-folder-fuse --skip-cert-verification --change-user nobody --change-group nogroup -o allow_other --ready-fd 10 --user root --password-file /etc/pve/priv/storage/MY-ESX.pw my-esx.example.com /run/pve/import/esxi/MY-ESX/manifest.json /run/pve/import/esxi/MY-ESX/mnt

    I assume from my testing results that it's still using SSH even when the password file is provided (assuming it can reach it over SSH) but I wanted to double-check since I wasn't 100% sure.

  2. I tried running the mount from the CLI directly, but I'm a bit confused about the manifest.json. Based on the command the GUI executes to mount the directory, it points at the manifest.json inside the mount. However, when I tried running:

    /usr/libexec/pve-esxi-import-tools/esxi-folder-fuse --skip-cert-verification --change-user nobody --change-group nogroup -o allow_other --user root my-esx.example.com /run/pve/import/esxi/MY-ESX/manifest.json /run/pve/import/esxi/MY-ESX/mnt

    it throws an error saying Error: failed to read manifest. That obviously makes some sense, since it can't read the manifest from a mount point that isn't mounted yet.

  3. I next tried mounting the storage through the UI to gain access to the manifest.json and then mounting the storage again to a second mount point:
    /usr/libexec/pve-esxi-import-tools/esxi-folder-fuse --skip-cert-verification --change-user nobody --change-group nogroup -o allow_other --user root my-esx.example.com /run/pve/import/esxi/MY-ESX/manifest.json /tmp/mnt

    This worked, but when I explored the datastores that were exposed through the new mountpoint they all appeared to be empty:

    Code:
    root@pve:~# ls /tmp/mnt/ha-datacenter/*
    /tmp/mnt/ha-datacenter/fs72_misc_vol:
    
    /tmp/mnt/ha-datacenter/fs72_test_datastore_01:
    
    /tmp/mnt/ha-datacenter/fs72_test_datastore_02:
    
    /tmp/mnt/ha-datacenter/fs72_test_datastore_03:

    This also appeared to be the case with the normally mounted storage, so maybe I'm just missing something about how this works:

    Code:
    root@hertz:~# ls /run/pve/import/esxi/MY-ESX/mnt/ha-datacenter/*
    /run/pve/import/esxi/MY-ESX/mnt/ha-datacenter/fs72_misc_vol:
    
    /run/pve/import/esxi/MY-ESX/mnt/ha-datacenter/fs72_test_datastore_01:
    
    /run/pve/import/esxi/MY-ESX/mnt/ha-datacenter/fs72_test_datastore_02:
    
    /run/pve/import/esxi/MY-ESX/mnt/ha-datacenter/fs72_test_datastore_03:

  4. Getting into my actual import tests, I'm still seeing essentially the same performance. I can see the SSH connections transferring data at ~20 MB/s, but I never see more than 2-3 connections simultaneously:

    Screenshot 2025-10-14 at 4.48.33 PM.png

    When I run watch 'pgrep ssh | xargs ps -ww' I can see the dd processes being executed over SSH, but I never see more than one (a small counting one-liner for keeping an eye on this is included after this list):

    ssh -o BatchMode=yes -o StrictHostKeyChecking=no root@my-esx.example.com dd if='/vmfs/volumes/fs72_test_datastore_03/TESTVM/TESTVM_1-flat.vmdk' bs=1048576 skip=19328 count=128 2>/dev/null
  5. I did a test using the SSH command directly to see what throughput I could get with nothing in between. I ran

    ssh -o BatchMode=yes -o StrictHostKeyChecking=no root@my-esx.example.com dd if='/vmfs/volumes/fs72_test_datastore_03/TESTVM/TESTVM_1-flat.vmdk' bs=1048576 | pv > /dev/null

    and was seeing speeds around 40 MiB/s, still far below my expected ~1000 MiB/s. I'm going to do some more experimentation on the ESX side tomorrow to see if something is misconfigured there that's causing the slower-than-expected speeds.

  6. I do see the expected messages in the logs:

    Code:
    root@pve:~# journalctl -t esxi-folder-fuse --since "30 minutes ago" -f | grep -E "SSH connection|HTTP"
    Oct 14 16:26:24 pve esxi-folder-fuse[29081]: SSH connection successful - using SSH streaming mode

  7. Just to make sure I'm not crazy, here's my iPerf3 test showing the connection running at ~10Gb/s:

    Proxmox (acting as client):
    Code:
    root@pve:~# iperf3 -c 192.168.0.123 -t 10 -i 5 -f g -p 8000
    Connecting to host 192.168.0.123, port 8000
    [  5] local 192.168.0.150 port 41206 connected to 192.168.0.123 port 8000
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-5.00   sec  5.19 GBytes  8.91 Gbits/sec  1357   2.00 MBytes
    [  5]   5.00-10.00  sec  5.39 GBytes  9.25 Gbits/sec    0   2.00 MBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  10.6 GBytes  9.08 Gbits/sec  1357            sender
    [  5]   0.00-10.00  sec  10.6 GBytes  9.08 Gbits/sec                  receiver
    
    iperf Done.

    ESX (acting as server):
    Code:
    [root@my-esx:~] /usr/lib/vmware/vsan/bin/iperf3.copy -s -B 192.168.0.123 -p 8000
    -----------------------------------------------------------
    Server listening on 8000 (test #1)
    -----------------------------------------------------------
    Accepted connection from 192.168.0.150, port 41198
    [  5] local 192.168.0.123 port 8000 connected to 192.168.0.150 port 41206
    iperf3: getsockopt - Function not implemented
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-1.00   sec   877 MBytes  7.35 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   1.00-2.00   sec  1.09 GBytes  9.35 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   2.00-3.00   sec  1.07 GBytes  9.24 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   3.00-4.00   sec  1.09 GBytes  9.35 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   4.00-5.00   sec  1.08 GBytes  9.25 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   5.00-6.00   sec  1.07 GBytes  9.24 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   6.00-7.00   sec  1.07 GBytes  9.24 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   7.00-8.00   sec  1.08 GBytes  9.24 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   8.00-9.00   sec  1.08 GBytes  9.30 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]   9.00-10.00  sec  1.07 GBytes  9.24 Gbits/sec
    iperf3: getsockopt - Function not implemented
    [  5]  10.00-10.00  sec  4.12 MBytes  8.85 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-10.00  sec  10.6 GBytes  9.08 Gbits/sec                  receiver
    -----------------------------------------------------------
    Server listening on 8000 (test #2)
    -----------------------------------------------------------
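
Side note on item 4: here's the small counting one-liner I mentioned (just a sketch; it counts local ssh processes whose command line includes a remote dd):

Bash:
# Once per second, count local ssh processes that are running a remote "dd if=",
# i.e. how many parallel streams are active right now.
# The [s] trick keeps the watch/pgrep command itself from matching the pattern.
watch -n1 'pgrep -fc "[s]sh .*dd if="'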

Based on this testing and comparing it with your test results, I'm increasingly inclined to think that something about my ESX server in particular is slowing down this transfer, rather than the bottleneck being the importer itself. I'm going to see what I can find tomorrow (I did spot a few promising Google results before the end of my day) that might improve the SSH performance coming out of ESX.
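
One of the first things I plan to check is whether the SSH cipher (and the CPU on the ESX side doing the encryption) is the limiting factor. Something along these lines, assuming the ESXi sshd accepts these ciphers:

Bash:
# List the ciphers my client supports, then time the same raw pull with a few of
# them to see whether cipher/CPU overhead explains the ~40 MiB/s ceiling.
# (Which ciphers the ESXi sshd actually accepts depends on its configuration.)
ssh -Q cipher

for c in aes128-ctr aes128-gcm@openssh.com chacha20-poly1305@openssh.com; do
    echo "== $c =="
    ssh -c "$c" -o BatchMode=yes -o StrictHostKeyChecking=no root@my-esx.example.com \
        "dd if='/vmfs/volumes/fs72_test_datastore_03/TESTVM/TESTVM_1-flat.vmdk' bs=1048576 count=2048 2>/dev/null" \
      | pv > /dev/null
done

If every cipher lands around the same ~40 MiB/s, the bottleneck is probably elsewhere (storage, TCP settings, or sshd itself on the ESX side) rather than encryption overhead.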