Improve VM restore speed - ZFS datastore over NFS

Mar 27, 2021
Greetings to all,

I need some advice, please...

We are trying to set up a second (DR) site on the cheap.
We have a TrueNAS Core server with a 40TB HDD pool (RAIDZ2). The server has 32 cores and 64GB RAM.
The PBS datastore lives on the TrueNAS server and is mounted over NFS.
We synced the backups from the primary PBS with a sync job over VPN, which went decently well (~20MB/s on the storage).
The TrueNAS server performance is solid; we easily get 250MB/s writes on the pool over NFS.

The problem comes when we try to restore a VM... It literally takes more than 30 minutes for a 30GB VM disk.
I cannot imagine how many days/weeks it would take to restore all VMs :confused:
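
To put that in perspective, a rough projection can be sketched like this. It simply scales the one observed restore (30GB in about 30 minutes) linearly; the 10000GB total is a made-up figure for illustration, not our real backup size, and `restore_hours` is just a helper name chosen here.

```shell
# Project a total restore time by scaling one observed restore linearly.
# Arguments: total size (GB), sample size (GB), sample duration (minutes).
restore_hours() {
    awk -v total="$1" -v sample="$2" -v minutes="$3" \
        'BEGIN { printf "%.1f\n", total * minutes / sample / 60 }'
}

# Hypothetical 10000GB of VM disks at the observed 30GB per 30 minutes:
restore_hours 10000 30 30   # prints 166.7 (hours, roughly a week)
```

This ignores parallel restores and caching effects, so it is only an order-of-magnitude estimate.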

The research we did so far suggests that TrueNAS struggles to deliver the backup chunks quickly, e.g. that it needs a dedicated metadata device.
A metadata device, however, appears to be tricky/risky, since losing it can damage the whole pool (a mirror is recommended).
On the other hand, others are implementing L2ARC with "secondarycache=metadata", which appears to deliver the same result while reducing the risk to the pool.
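
For reference, the L2ARC variant would look roughly like the sketch below. The pool name `tank`, the dataset `tank/pbs` and the device `da6` are placeholders, not our real names, and the `run` wrapper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (as root on the ZFS host).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: pool, dataset backing the NFS export, SSD device (all placeholders).
setup_l2arc_metadata() {
    run zpool add "$1" cache "$3"                  # attach the SSD as an L2ARC (cache) vdev
    run zfs set secondarycache=metadata "$2"       # let L2ARC hold metadata only
    run zfs get primarycache,secondarycache "$2"   # confirm the resulting settings
}

setup_l2arc_metadata tank tank/pbs da6
```

A cache vdev can be taken out again with `zpool remove`, which is part of why this route looks less risky. Note that L2ARC only fills gradually as data is evicted from ARC, so any effect may take a while to show.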

Has anyone run into this issue, and what solution did you implement?

Many thanks,
Dimitar
 
Has no one dealt with such an issue before?
I am sure there are plenty of people who have their datastores on an NFS share and may have run into this.
An NFS share on a ZFS pool may be a bit exotic in the community, though...
 
And just to add for completeness...
Our primary PBS server runs on dedicated hardware - 40cores, 64GB RAM, 40TB HDD ZFS pool.
The ZFS cache (ARC) has happily taken a good 32+GB of the RAM and hence performs really well where restores are concerned.
 
I have received zero responses - I am either unique or stupid o_O

More testing happened... we added a 500GB SSD as a cache (L2ARC) vdev, with no noticeable improvement.
Moreover, here is the arcstat output during a restore:

1637173686141.png

Metadata misses occur but are insignificant, while the prefetch misses are at 100%.
I am struggling to find the best way to address this :(

Anyone with bigger brains?
 
I'm also running my PBS datastore on a TrueNAS ZFS Pool on HDDs over NFS. Performance is fine here.
I just have no idea how to help. Did you verify that the ZFS read performance on your TrueNAS is really the problem, and not the network or the write performance of your PVE's VM storage disks? Maybe you can do some fio/iperf tests to narrow down the bottleneck.
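
Something along these lines might do as a starting point. It is only a sketch: the IP address and file paths are placeholders, the block sizes are my guess at what is representative, and the `run` wrapper just prints the commands unless `DRY_RUN=0` is set. The last fio job writes a 4G test file, so point it at scratch space.

```shell
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: TrueNAS host, test file on the NFS mount, test file on the PVE VM storage.
bench_path() {
    run iperf3 -c "$1" -t 10                                                              # raw network
    run fio --name=seqread --filename="$2" --rw=read --bs=1M --size=4G --direct=1         # NFS sequential read
    run fio --name=randread --filename="$2" --rw=randread --bs=4M --size=4G --direct=1    # NFS random read
    run fio --name=localwrite --filename="$3" --rw=write --bs=1M --size=4G --direct=1     # restore target write
}

bench_path 192.0.2.10 /mnt/nfs/fio.test /var/tmp/fio.test
```

If the network and the local write numbers are both well above the restore speed, the NFS/ZFS read side is the likely suspect.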
 
A proxmox-backup-client benchmark ... from the PVE system where restoring is slow, and from the second PBS system to itself, might also give a hint as to where the performance bottleneck is.
 
Thanks for the responses guys!

Full disclosure on the setup:
- we are virtualizing both TrueNAS and PBS on top of Proxmox
- the host is a Dell R730xd with dual E5-2698 v4, 128GB RAM, and an HBA330 adapter to which all the HDDs (6x12TB Toshiba MG07SCA12TE) are connected
- the HBA card is passed through (PCI passthrough) to the TrueNAS VM, which has 32 cores, 96GB RAM and an additional 480GB SSD as cache (L2ARC) drive
- the PBS VM has 16 cores and 8GB RAM

We suspected that virtualization might be causing the problem, so we installed TrueNAS directly on the host (bare metal).
In addition, we recreated the pool as RAID10 (3 mirrored vdevs) with a cache device.
At the moment PBS is installed as a VM on another PVE host we have in that DC; backups are currently synchronizing...

Some test results:

iperf to the TrueNAS host
Code:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.01 GBytes  8.70 Gbits/sec  702   2.21 MBytes
[  5]   1.00-2.00   sec  1.04 GBytes  8.90 Gbits/sec  1685   1.03 MBytes
[  5]   2.00-3.00   sec  1.03 GBytes  8.84 Gbits/sec  1464    566 KBytes
[  5]   3.00-4.00   sec  1.04 GBytes  8.91 Gbits/sec  451    949 KBytes
[  5]   4.00-5.00   sec  1.05 GBytes  9.06 Gbits/sec  718    949 KBytes
[  5]   5.00-6.00   sec  1.06 GBytes  9.09 Gbits/sec  388    892 KBytes
[  5]   6.00-7.00   sec  1.05 GBytes  9.00 Gbits/sec  256   1.22 MBytes
[  5]   7.00-8.00   sec  1.06 GBytes  9.09 Gbits/sec  521   1001 KBytes
[  5]   8.00-9.00   sec  1.05 GBytes  9.06 Gbits/sec  568    895 KBytes
[  5]   9.00-10.00  sec   828 MBytes  6.94 Gbits/sec  1695    994 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.2 GBytes  8.76 Gbits/sec  8448             sender
[  5]   0.00-10.00  sec  10.2 GBytes  8.76 Gbits/sec                  receiver

NFS performance using dd
Code:
root@benchbox:~# echo 3 > /proc/sys/vm/drop_caches
root@benchbox:~# dd if=/dev/zero of=/mnt/nfs/testfile bs=16k count=800k
819200+0 records in
819200+0 records out
13421772800 bytes (13 GB, 12 GiB) copied, 20.8376 s, 644 MB/s
root@benchbox:~# echo 3 > /proc/sys/vm/drop_caches
root@benchbox:~# dd if=/mnt/nfs/testfile of=/dev/null bs=16k
819200+0 records in
819200+0 records out
13421772800 bytes (13 GB, 12 GiB) copied, 77.2008 s, 174 MB/s

I expected the write boost from the RAID10 pool, but I am surprised by the read speeds :confused:
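
One caveat with the dd test: it streams a single large file, while a restore (as far as I understand PBS) reads many separate chunk files, typically up to 4 MiB each for VM images, from the `.chunks` directory of the datastore. A minimal sketch of a test closer to that pattern is below; the datastore path is a placeholder, `read_random_files` is just a helper name chosen here, and it relies on GNU `shuf`.

```shell
# Read N randomly chosen files below DIR; print "<bytes read> <elapsed seconds>".
read_random_files() {
    dir=$1
    n=$2
    start=$(date +%s)
    bytes=$(find "$dir" -type f | shuf -n "$n" | xargs cat | wc -c | tr -d ' ')
    end=$(date +%s)
    echo "$bytes $((end - start))"
}

# Example (placeholder path), after dropping the client page cache as above:
# read_random_files /mnt/nfs/datastore/.chunks 2000
```

Walking the whole `.chunks` tree with `find` over NFS can itself take a while, and the timing only has one-second resolution, so use a few thousand files to get a meaningful number.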

I will run the backup benchmark that Fabian suggested once the backup sync is completed and share accordingly.

thanks!
 
First, the result from the PVE host; this is where the restore tests were done previously
Code:
root@pve1:~# proxmox-backup-client benchmark
SHA256 speed: 227.92 MB/s
Compression speed: 369.36 MB/s
Decompress speed: 1143.15 MB/s
AES256/GCM speed: 2400.57 MB/s
Verify speed: 319.00 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ not tested         │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 227.92 MB/s (11%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 369.36 MB/s (49%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 1143.15 MB/s (95%) │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 319.00 MB/s (42%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 2400.57 MB/s (66%) │
└───────────────────────────────────┴────────────────────┘

This is from the PBS VM itself (it runs on the PVE host above, with 8 cores and 8GB RAM)
Code:
root@pbs:~# proxmox-backup-client benchmark
SHA256 speed: 171.80 MB/s
Compression speed: 325.27 MB/s
Decompress speed: 965.98 MB/s
AES256/GCM speed: 136.92 MB/s
Verify speed: 230.71 MB/s
┌───────────────────────────────────┬───────────────────┐
│ Name                              │ Value             │
╞═══════════════════════════════════╪═══════════════════╡
│ TLS (maximal backup upload speed) │ not tested        │
├───────────────────────────────────┼───────────────────┤
│ SHA256 checksum computation speed │ 171.80 MB/s (8%)  │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 compression speed    │ 325.27 MB/s (43%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 decompression speed  │ 965.98 MB/s (81%) │
├───────────────────────────────────┼───────────────────┤
│ Chunk verification speed          │ 230.71 MB/s (30%) │
├───────────────────────────────────┼───────────────────┤
│ AES256 GCM encryption speed       │ 136.92 MB/s (4%)  │
└───────────────────────────────────┴───────────────────┘

For reference, the same result from one of our prod PVE hosts
Code:
root@pve12:~# proxmox-backup-client benchmark
SHA256 speed: 459.73 MB/s
Compression speed: 501.08 MB/s
Decompress speed: 1281.94 MB/s
AES256/GCM speed: 2056.84 MB/s
Verify speed: 335.78 MB/s
┌───────────────────────────────────┬─────────────────────┐
│ Name                              │ Value               │
╞═══════════════════════════════════╪═════════════════════╡
│ TLS (maximal backup upload speed) │ not tested          │
├───────────────────────────────────┼─────────────────────┤
│ SHA256 checksum computation speed │ 459.73 MB/s (23%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 compression speed    │ 501.08 MB/s (67%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 decompression speed  │ 1281.94 MB/s (107%) │
├───────────────────────────────────┼─────────────────────┤
│ Chunk verification speed          │ 335.78 MB/s (44%)   │
├───────────────────────────────────┼─────────────────────┤
│ AES256 GCM encryption speed       │ 2056.84 MB/s (56%)  │
└───────────────────────────────────┴─────────────────────┘

Restore test in both environments (same VM is restored, 32GB disk)

DR (slow) environment (speed doubled with the raid10 but still far off)
Code:
progress 97% (read 33332133888 bytes, zeroes = 12% (4135583744 bytes), duration 479 sec)
progress 98% (read 33676066816 bytes, zeroes = 12% (4139778048 bytes), duration 483 sec)
progress 99% (read 34019999744 bytes, zeroes = 12% (4139778048 bytes), duration 491 sec)
progress 100% (read 34359738368 bytes, zeroes = 12% (4139778048 bytes), duration 498 sec)
restore image complete (bytes=34359738368, duration=498.96s, speed=65.67MB/s)
rescan volumes...
TASK OK

Prod environment (raidz2)
Code:
progress 97% (read 33332133888 bytes, zeroes = 12% (4135583744 bytes), duration 253 sec)
progress 98% (read 33676066816 bytes, zeroes = 12% (4139778048 bytes), duration 256 sec)
progress 99% (read 34019999744 bytes, zeroes = 12% (4139778048 bytes), duration 259 sec)
progress 100% (read 34359738368 bytes, zeroes = 12% (4139778048 bytes), duration 262 sec)
restore image complete (bytes=34359738368, duration=262.01s, speed=125.06MB/s)
rescan volumes...
TASK OK
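
As a sanity check on the units, the reported speeds can be reproduced from the byte counts and durations in the two logs. The sketch below (with `mib_per_s` as a helper name chosen here) shows that the "MB/s" figure is really MiB/s, i.e. bytes divided by 1048576.

```shell
# Throughput in MiB/s from a byte count and a duration in seconds.
mib_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1048576 }'
}

mib_per_s 34359738368 498.96   # DR:   prints 65.67
mib_per_s 34359738368 262.01   # prod: prints 125.06
```

So for the same 32GiB image, prod restores roughly 1.9x faster than DR.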
 
This is starting to become a TrueNAS performance comparison of two similar servers...
I have prepared a document describing the setups and tests - here (too big to upload to the forum)
I welcome your review and comments!
Any other tests I can perform?

Thanks for your time
 
I did another test today: in short, I removed the RAID10 setup on the problematic server and re-created the pool as RAIDZ2 without a cache drive, to make sure I am comparing apples to apples.
The results are posted in the link above.

I have no idea why the R730 host delivers such bad read speeds.
Fluctuation is expected due to ARC, but half the speed is unexpected.
The hardware is almost identical, and the network speed and latency are fine.

1637517112656.png


I am thinking of a hardware problem, maybe starting with swapping the HBA330 card...
I tested all disks with fio when the system was delivered, and all of them gave the same result.

Appreciate ideas!
thanks
 
Did you try placing the HBA in another PCIe slot?

If that is a multi-socket board, maybe that PCIe slot is connected to the wrong CPU, and the link between the CPUs might be a bottleneck.

Or maybe that PCIe slot is connected to the chipset instead of directly to the CPU.
 