Improve VM restore speed - ZFS datastore over NFS

Mar 27, 2021
Greetings to all,

Need some advice please...

We are trying to set up a second (DR) site on the cheap.
We have a TrueNAS Core server with a 40TB HDD (Z2) pool. The server has 32 cores and 64GB RAM.
The PBS datastore is on the TrueNAS, connected over NFS.
We have synced (sync job) the backups from the primary PBS (over VPN) which went decently well (~20MB/s on the storage).
The TrueNAS server performance is solid; we are easily getting 250MB/s writes on the pool over NFS.

The problem comes when we try to restore a VM... It literally takes more than 30 mins for a 30GB VM disk.
Cannot imagine how many days/weeks it will take to restore all VMs :confused:

The research we did so far suggests that TrueNAS struggles to deliver the backup chunks, e.g. it needs a special metadata vdev.
The metadata vdev however appears to be tricky/risky, since losing it damages the whole pool (a mirror is recommended).
On the other hand, others are implementing L2ARC with "secondarycache=metadata", which appears to deliver the same result while reducing the risk to the pool.
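As I understand it, the L2ARC variant boils down to something like this on the ZFS side (pool, dataset and device names below are placeholders; on TrueNAS the cache vdev would normally be added through the GUI):

Code:
# add an SSD as L2ARC (cache) vdev - device name is a placeholder
zpool add tank cache /dev/da6

# keep only metadata in the L2ARC for the PBS datastore dataset
zfs set secondarycache=metadata tank/pbs-datastore

# verify the setting and the new cache vdev
zfs get secondarycache tank/pbs-datastore
zpool status tank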

Has anyone run into this issue, and what solution did you implement?

Many thanks,
Dimitar
 
No one has dealt with such an issue before?
I am sure there are plenty of people with their datastores on an NFS share who may have run into this.
NFS share on a ZFS pool may be a bit exotic in the community though...
 
And just to add for completeness...
Our primary PBS server runs on dedicated hardware - 40 cores, 64GB RAM, 40TB HDD ZFS pool.
The ZFS ARC has happily taken a good 32+GB of the RAM, so restores there perform really well.
 
I have received zero responses - I am either unique or stupid o_O

More testing happened... we added a 500GB SSD as a cache (L2ARC) vdev - no noticeable improvement.
Moreover, here is the arcstat during restore

[Screenshot: arcstat output during restore]

Metadata misses are present but insignificant; the prefetch misses are at 100%.
I am struggling to find the best way to address this :(
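For reference, the stats above came from arcstat; a field selection along these lines (assuming the arcstat that ships with OpenZFS/TrueNAS) shows the demand/prefetch/metadata miss split while a restore is running:

Code:
# sample every 5 seconds; dmis/pmis/mmis = demand/prefetch/metadata misses
arcstat -f time,read,miss,miss%,dmis,dm%,pmis,pm%,mmis,mm%,arcsz,c 5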

Anyone with bigger brains?
 
I'm also running my PBS datastore on a TrueNAS ZFS Pool on HDDs over NFS. Performance is fine here.
I just have no idea how to help. Did you verify that the ZFS read performance on your TrueNAS is really the problem, and not the network or the write performance of your PVE's VM storage disks? Maybe you can do some fio/iperf tests to narrow down the bottleneck.
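Something like the following fio run against the NFS-mounted datastore path would roughly mimic the chunk reads of a restore (the mountpoint is just an example; PBS stores VM data in chunks of up to 4MiB):

Code:
fio --name=chunkread --directory=/mnt/nfs --rw=randread --bs=4M \
    --size=16G --iodepth=16 --ioengine=libaio --direct=1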
 
A proxmox-backup-client benchmark ... from the PVE system where restoring is slow, and from the second PBS system to itself, might also give a hint where the performance bottleneck is.
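Pointing the benchmark at the repository also measures the TLS upload speed, e.g. (the repository string is just an example):

Code:
proxmox-backup-client benchmark --repository root@pam@pbs.example.com:datastore1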
 
Thanks for the responses guys!

Full disclosure on the setup:
- we are virtualizing both TrueNAS and PBS on top of Proxmox
- the host is a Dell R730xd with dual E5-2698 v4, 128GB RAM, and an HBA330 adapter to which all the HDDs (6x 12TB Toshiba MG07SCA12TE) are connected
- the HBA card is passed through (PCI passthrough) to the TrueNAS VM, which has 32 cores and 96GB RAM, plus an additional 480GB SSD as a cache (L2ARC) drive
- the PBS VM has 16 cores and 8GB RAM

We suspected that the virtualization might be causing the problem and installed TrueNAS on the host (bare metal).
In addition, we recreated the pool as RAID10 (3 mirrored vdevs) with a cache drive, as sketched below.
At the moment PBS is installed as a VM on another PVE host we have in that DC; backups are currently synchronizing...
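For reference, the striped-mirror ("RAID10") layout with a cache device corresponds to a zpool layout roughly like this (disk identifiers are placeholders; TrueNAS builds this through the GUI):

Code:
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    cache da6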

Some test results:

iperf to the TrueNAS host
Code:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.01 GBytes  8.70 Gbits/sec  702   2.21 MBytes
[  5]   1.00-2.00   sec  1.04 GBytes  8.90 Gbits/sec  1685   1.03 MBytes
[  5]   2.00-3.00   sec  1.03 GBytes  8.84 Gbits/sec  1464    566 KBytes
[  5]   3.00-4.00   sec  1.04 GBytes  8.91 Gbits/sec  451    949 KBytes
[  5]   4.00-5.00   sec  1.05 GBytes  9.06 Gbits/sec  718    949 KBytes
[  5]   5.00-6.00   sec  1.06 GBytes  9.09 Gbits/sec  388    892 KBytes
[  5]   6.00-7.00   sec  1.05 GBytes  9.00 Gbits/sec  256   1.22 MBytes
[  5]   7.00-8.00   sec  1.06 GBytes  9.09 Gbits/sec  521   1001 KBytes
[  5]   8.00-9.00   sec  1.05 GBytes  9.06 Gbits/sec  568    895 KBytes
[  5]   9.00-10.00  sec   828 MBytes  6.94 Gbits/sec  1695    994 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.2 GBytes  8.76 Gbits/sec  8448             sender
[  5]   0.00-10.00  sec  10.2 GBytes  8.76 Gbits/sec                  receiver

NFS performance using dd
Code:
root@benchbox:~# echo 3 > /proc/sys/vm/drop_caches
root@benchbox:~# dd if=/dev/zero of=/mnt/nfs/testfile bs=16k count=800k
819200+0 records in
819200+0 records out
13421772800 bytes (13 GB, 12 GiB) copied, 20.8376 s, 644 MB/s
root@benchbox:~# echo 3 > /proc/sys/vm/drop_caches
root@benchbox:~# dd if=/mnt/nfs/testfile of=/dev/null bs=16k
819200+0 records in
819200+0 records out
13421772800 bytes (13 GB, 12 GiB) copied, 77.2008 s, 174 MB/s

I expected the write boost due to the RAID10 pool, but I am surprised by the read speeds :confused:

I will run the backup benchmark that Fabian suggested once the backup sync is completed and share accordingly.

thanks!
 
First, the result from the PVE host; this is where the restore tests were done previously
Code:
root@pve1:~# proxmox-backup-client benchmark
SHA256 speed: 227.92 MB/s
Compression speed: 369.36 MB/s
Decompress speed: 1143.15 MB/s
AES256/GCM speed: 2400.57 MB/s
Verify speed: 319.00 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ not tested         │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 227.92 MB/s (11%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 369.36 MB/s (49%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 1143.15 MB/s (95%) │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 319.00 MB/s (42%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 2400.57 MB/s (66%) │
└───────────────────────────────────┴────────────────────┘

This is from the PBS VM itself (runs on the PVE host above, 8 cores, 8GB RAM)
Code:
root@pbs:~# proxmox-backup-client benchmark
SHA256 speed: 171.80 MB/s
Compression speed: 325.27 MB/s
Decompress speed: 965.98 MB/s
AES256/GCM speed: 136.92 MB/s
Verify speed: 230.71 MB/s
┌───────────────────────────────────┬───────────────────┐
│ Name                              │ Value             │
╞═══════════════════════════════════╪═══════════════════╡
│ TLS (maximal backup upload speed) │ not tested        │
├───────────────────────────────────┼───────────────────┤
│ SHA256 checksum computation speed │ 171.80 MB/s (8%)  │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 compression speed    │ 325.27 MB/s (43%) │
├───────────────────────────────────┼───────────────────┤
│ ZStd level 1 decompression speed  │ 965.98 MB/s (81%) │
├───────────────────────────────────┼───────────────────┤
│ Chunk verification speed          │ 230.71 MB/s (30%) │
├───────────────────────────────────┼───────────────────┤
│ AES256 GCM encryption speed       │ 136.92 MB/s (4%)  │
└───────────────────────────────────┴───────────────────┘

For reference, the same result from one of our prod PVE hosts
Code:
root@pve12:~# proxmox-backup-client benchmark
SHA256 speed: 459.73 MB/s
Compression speed: 501.08 MB/s
Decompress speed: 1281.94 MB/s
AES256/GCM speed: 2056.84 MB/s
Verify speed: 335.78 MB/s
┌───────────────────────────────────┬─────────────────────┐
│ Name                              │ Value               │
╞═══════════════════════════════════╪═════════════════════╡
│ TLS (maximal backup upload speed) │ not tested          │
├───────────────────────────────────┼─────────────────────┤
│ SHA256 checksum computation speed │ 459.73 MB/s (23%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 compression speed    │ 501.08 MB/s (67%)   │
├───────────────────────────────────┼─────────────────────┤
│ ZStd level 1 decompression speed  │ 1281.94 MB/s (107%) │
├───────────────────────────────────┼─────────────────────┤
│ Chunk verification speed          │ 335.78 MB/s (44%)   │
├───────────────────────────────────┼─────────────────────┤
│ AES256 GCM encryption speed       │ 2056.84 MB/s (56%)  │
└───────────────────────────────────┴─────────────────────┘

Restore test in both environments (same VM is restored, 32GB disk)

DR (slow) environment (speed doubled with the RAID10 pool, but still far off)
Code:
progress 97% (read 33332133888 bytes, zeroes = 12% (4135583744 bytes), duration 479 sec)
progress 98% (read 33676066816 bytes, zeroes = 12% (4139778048 bytes), duration 483 sec)
progress 99% (read 34019999744 bytes, zeroes = 12% (4139778048 bytes), duration 491 sec)
progress 100% (read 34359738368 bytes, zeroes = 12% (4139778048 bytes), duration 498 sec)
restore image complete (bytes=34359738368, duration=498.96s, speed=65.67MB/s)
rescan volumes...
TASK OK

Prod environment (raidz2)
Code:
progress 97% (read 33332133888 bytes, zeroes = 12% (4135583744 bytes), duration 253 sec)
progress 98% (read 33676066816 bytes, zeroes = 12% (4139778048 bytes), duration 256 sec)
progress 99% (read 34019999744 bytes, zeroes = 12% (4139778048 bytes), duration 259 sec)
progress 100% (read 34359738368 bytes, zeroes = 12% (4139778048 bytes), duration 262 sec)
restore image complete (bytes=34359738368, duration=262.01s, speed=125.06MB/s)
rescan volumes...
TASK OK
 
This is starting to become a TrueNAS performance comparison between two similar servers...
I have prepared a document describing the setups and tests - here (too big to upload to the forum).
I welcome your review and comments!
Any other tests I can perform?

Thanks for your time
 
I have done another test today - in short, I removed the RAID10 setup on the problematic server and re-created the pool as RAIDZ2 without a cache drive. This is to ensure I am comparing apples to apples.
The results are posted in the link above.

I have no idea why the R730 host delivers such bad read speeds.
Some fluctuation is expected due to the ARC, but half the speed is unexpected.
The hardware is almost identical, and the network speed and latency are fine.

[Screenshot: read speed comparison between the two servers]


I am thinking it is a hardware problem; maybe I'll start by exchanging the HBA330 card...
I have tested all disks when the system was delivered, all of them had the same result with fio.
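A read-only sequential check per drive with fio would look roughly like this (device name is a placeholder):

Code:
fio --name=seqread --filename=/dev/da0 --rw=read --bs=1M \
    --direct=1 --runtime=60 --time_based --readonly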

Appreciate ideas!
thanks
 
Did you try placing the HBA in another PCIe slot?

If that is a multi-socket board, maybe that PCIe slot is connected to the wrong CPU and the link between the CPUs might be a bottleneck.

Or maybe that PCIe slot is connected to the chipset instead of directly to the CPU.
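One way to check that (on a Linux host; the PCI address below is a placeholder) is to look at what sysfs and lspci report for the HBA:

Code:
# find the HBA's PCI address
lspci | grep -i sas

# NUMA node the slot is attached to (-1 means no NUMA affinity reported)
cat /sys/bus/pci/devices/0000:03:00.0/numa_node

# negotiated PCIe link speed/width vs. the card's capability
lspci -s 03:00.0 -vv | grep -i lnk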
 
