Min/Maxing EBS Volumes on an AWS backup server

Dec 8, 2021
In a previous post, I discussed how an ST1 volume backs our remote backups on EC2. We have rapidly reached the maximum size of an ST1 EBS volume, but ZFS has entered to save the day with v2.3 and the ability to quickly and easily add disks to raidz vdevs.
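For anyone who hasn't tried it yet, RAID-Z expansion in OpenZFS 2.3 boils down to a single command. This is only a sketch: the pool, vdev, and device names are placeholders, not our actual layout.

```bash
# Attach an additional EBS volume to an existing raidz vdev (requires OpenZFS 2.3+).
# "backups" and "raidz1-0" are placeholder names; check `zpool status` for yours,
# and the NVMe device name depends on the instance and attachment order.
zpool attach backups raidz1-0 /dev/nvme4n1

# Expansion runs in the background; progress is reported by:
zpool status backups
```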

On paper, an array of four 8 TB ST1 volumes should vastly outperform our existing 16 TB ST1 volume. But that raises the question: what about seven SC1 volumes? Has anyone tried this configuration, and what were your experiences? Populating the list of backups seems to depend on storage speed. Has that been an issue for anyone?
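To make the comparison concrete, the two layouts under consideration would be built roughly like this. The pool name, raidz level, and device names are assumptions on my part, not our production config.

```bash
# Option A: 4 x 8 TiB ST1 volumes in a single raidz vdev
zpool create backups raidz1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1

# Option B: 7 x 4 TiB SC1 volumes in a single raidz vdev
zpool create backups raidz1 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 \
    /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1
```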

It is likely we will move to S3 when that feature leaves preview. If you would like to share your experience with that, feel free, but I'm really looking for input from you penny pinchers. Is the performance hit worth $1,000 a month?
 
For some extra data, I had AI throw this together for me:

Clearly this is the best-case scenario (sequential reads and writes) and does not take into account overhead from ZFS.

Note: the maximum EBS throughput for this instance is 1,250 MB/s (≈1,192 MiB/s).
| Option | Per-volume base / burst (MiB/s) | Array baseline (MiB/s) | Array burst (MiB/s) | Theoretical burst duration | Effective baseline (MiB/s) | Effective burst (MiB/s) | Effective burst duration |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4 × 8 TiB ST1 | 320 / 500 (each) | 1,280 | 2,000 | 9.29 h (≈9:17) | 1,192.09 (instance cap) | 1,192.09 (instance cap) | 0:00 (no headroom; the instance is the bottleneck) |
| 7 × 4 TiB ST1 | 160 / 500 (each) | 1,120 | 3,500 | 3.43 h (≈3:26) | 1,120 | 1,192.09 | 113.1 h (≈113:06) |
| 4 × 8 TiB SC1 | 96 / 250 (each) | 384 | 1,000 | 15.13 h (≈15:08) | 384 | 1,000 | 15.13 h (≈15:08) |
| 7 × 4 TiB SC1 | 48 / 250 (each) | 336 | 1,750 | 23.79 h (≈23:47) | 336 | 1,192.09 | 33.67 h (≈33:40) |
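For anyone checking the math, the per-volume baselines come straight from the published per-TiB rates (40 MiB/s per TiB for ST1, 12 MiB/s per TiB for SC1), and the array baseline is just that multiplied by the volume count:

```bash
# Per-volume baseline = per-TiB rate * volume size (TiB); array baseline = per-volume * count.
echo "8 TiB ST1: $((40 * 8)) MiB/s each, x4 array = $((40 * 8 * 4)) MiB/s"
echo "4 TiB ST1: $((40 * 4)) MiB/s each, x7 array = $((40 * 4 * 7)) MiB/s"
echo "8 TiB SC1: $((12 * 8)) MiB/s each, x4 array = $((12 * 8 * 4)) MiB/s"
echo "4 TiB SC1: $((12 * 4)) MiB/s each, x7 array = $((12 * 4 * 7)) MiB/s"
```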
 

UPDATE 1

TL;DR: My benchmarking was mixed and ultimately a waste. All of my benchmarking showed read and write speeds just above the average usage reported by the old PBS server (a single 16 TB ST1 volume). The data transfer is going well; I will update with the final results in a few months.

Data Collection and Methodology:

I used "Flexible I/O tester" (fio) with 1GB files and eight threads. This resulted in about 85% CPU usage and generated 8GB files to test with. Average usage was verified against GUI stats and AWS monitoring. Only 4k Random reads performed below network speeds, for reference, that would be 119MiB/s. Most of our PBS chunks are around 150 MB. So I figure 4K random reads are not indicative of normal workload, so I ignored those results (Foreshadowing?).

As I expected, the AI was wrong. The array of seven SC1 drives performed significantly worse than the array of four ST1 drives, but the performance seemed "good enough". The SC1 volumes can be upgraded seamlessly to ST1, so there was zero risk in trying this for a few weeks or months. I began practical benchmarking; here is the real-world performance I observed.
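The "zero risk" part comes from EBS Elastic Volumes: the volume type can be changed in place while it stays attached and mounted. A rough sketch (the volume ID is a placeholder):

```bash
# Convert an SC1 volume to ST1 in place; the filesystem stays online throughout.
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type st1

# Watch the modification progress:
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0
```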


Real World Performance:

ZFS compression may account for the discrepancy between the AWS monitors (pictured below) and the PBS monitors.
[Image: AWS monitoring graph]



Select Measurements:

| Metric | Average latency (average of 7 disks) | Maximum throughput (avg × 6 disks) |
| --- | --- | --- |
| Read | 5 ms/op | 63.51 MiB/s |
| Write | 1.7 ms/op | 326 MiB/s |
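If anyone wants to pull comparable numbers from their own pool, per-disk latency and throughput are visible with zpool iostat; the pool name here is assumed:

```bash
# -v breaks stats out per disk, -l adds latency columns; sample every 10 seconds.
zpool iostat -v -l backups 10
```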


Preliminary Conclusions:

  • Backups take the same amount of time: On average, we move less than 5 GB of deduplicated backups in small chunks, averaging <200 MB per chunk. For this testing period, I am moving a 16 TB datastore between two EC2 instances and am happy with those speeds.
  • Tasks take longer: Verification and populating the datastore content seem to take MUCH longer. I have not tested the impact this may have on PVE when populating backups, but I expect timeouts may become an issue when viewing remote PBS storage.
  • Cloud restore times will need to be tested: The read performance of an SC1 array for a massive disaster recovery appears unacceptable. However, in an emergency, the EBS volumes can be upgraded to ST1 or better with zero downtime, allowing for a rapid recovery when money is ultimately not a factor.
  • Snapshot times are also faster; however, I have not tested restoring a raidz pool from snapshots. This will need to be tested for viability (one approach is sketched below).
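Assuming these are EBS snapshots, one way to capture every volume in the pool at the same instant is a multi-volume snapshot set; a sketch, with the instance ID as a placeholder:

```bash
# Take a crash-consistent snapshot of every EBS volume attached to the instance,
# skipping the boot volume. The instance ID below is a placeholder.
aws ec2 create-snapshots \
    --instance-specification InstanceId=i-0123456789abcdef0,ExcludeBootVolume=true \
    --description "PBS raidz pool snapshot set"
```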
I would love feedback and ideas from the community.