Optimal HDD ZFS configuration for PBS

Hi,

I have read the comments that PBS is not designed with much consideration for HDDs, but ....

My PBS server is running 22x EXOS SATA HDDs in ZFS RAID10, with a special NVMe vdev for metadata. My VM storage is NVMe over iSCSI connected at 100G, and the connection to PBS is 10G.

I can run a single backup job at around 600 MB/s, and if I run 3 jobs concurrently I can achieve up to 1.2 GB/s (i.e. saturating my 10G link). I'm totally happy with that.

However a single full VM restore only runs at around 150 MB/s. I feel I should be able to do better than this.

If I run three full VM restores concurrently, each restore runs at around 150 MB/s give or take (a total of around 450 MB/s).

It seems that the restore speed is being limited by something other than the storage performance of the PBS server.

The ZFS array was created using the PBS GUI with no custom tweaking.

Any thoughts about how to optimise the ZFS array for PBS use would be appreciated. Actually, any comments at all on how to improve restore speed would be appreciated :-)

Thanks,
Steve
 
I have read the comments that PBS is not designed with much consideration for HDDs, but ....
Backup may be fast because the drives can write the data blocks (4 MB blocks before compression) directly one after another without seeking.

If you restore, the drives need to seek for the right blocks. That's when IOPS rather than throughput becomes relevant.
As you have a special metadata vdev, GC will be fast, but verification still needs to read all those small chunks from the HDDs.
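If you want to see which of the two is the limit, watching per-vdev ops versus bandwidth (and per-disk utilisation) while a restore runs should show it; the pool name below is just an example:

zpool iostat -v tank 5     # per-vdev read ops vs read bandwidth, 5-second intervals
iostat -dx 5               # per-disk utilisation, queue size and average request size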
 
Backup may be fast because the drives can write the data blocks (4 MB blocks before compression) directly one after another without seeking.
It is much better than that: only modified data chunks have to be transported from PVE to PBS and be written to disk, so this is usually a very small amount of data. (Of course, only if there is already an older backup "from yesterday" or similar.)
 
However a single full VM restore only runs at around 150 MB/s. I feel I should be able to do better than this.

If I run three full VM restores concurrently, each restore runs at around 150 MB/s give or take (a total of around 450 MB/s).
In one of my PBS installations I use ZFS RAID10 with 8 HDDs + special device and I see a similar speed during restore, without any significant IO load on the drives themselves. I think there's some bottleneck somewhere else, or rather, somewhere else too. In my case, parallel restores reach around 230 MB/s, which starts to push the drives to around 90% load.

Backup may be fast because the drives can write the data blocks (4 MB blocks before compression) directly one after another without seeking.

If you restore, the drives need to seek for the right blocks. That's when IOPS rather than throughput becomes relevant.
As you have a special metadata vdev, GC will be fast, but verification still needs to read all those small chunks from the HDDs.
If this were the case, IOPS would reach the maximum for that hardware, which for 22 HDDs in RAID10 should be at least ~1000 4k IOPS, more than enough to provide well over 150 MB/s given the much bigger blocks that the chunks themselves are. Fragmentation will lower performance, of course, especially if the ZFS pool is more than ~85% full. Also, if the HDDs were the limiting factor here, adding more parallel restores would not increase the total restore speed to ~450 MB/s.
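As a rough back-of-the-envelope check (assuming ballpark figures of ~150 random read IOPS per HDD and an average on-disk chunk size of roughly 2 MB after compression):

22 drives in mirror pairs, reads served from either side: ~22 x 150 ≈ 3300 read IOPS available.
150 MB/s at ~2 MB per chunk needs only ~75 chunk reads per second.

So even with a fair amount of seeking the pool should have plenty of headroom, which again points away from the disks.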

only modified data chunks have to be transported from PVE to PBS and be written to disk
In that case no data chunks are transferred and you see little traffic on the network. We lack detail here, but the OP mentions saturating the network link, which makes me think the backup did transfer a lot of new chunks.
 
If this were the case, IOPS would reach the maximum for that hardware, which for 22 HDDs in RAID10 should be at least ~1000 4k IOPS, more than enough to provide well over 150 MB/s given the much bigger blocks that the chunks themselves are. Fragmentation will lower performance, of course, especially if the ZFS pool is more than ~85% full. Also, if the HDDs were the limiting factor here, adding more parallel restores would not increase the total restore speed to ~450 MB/s.
Ah, dang. I misread "the sum of total restore speed is 450 MB/s" as the backup speed. :oops:
Yes, that would imply the bottleneck is somewhere else.
What do you think the limiting factor might be? A single CPU core at 100%? PBS itself?
 
Haven't been able to take a deep look at it yet... Unfortunately I don't have any all-NVMe PBS with capacity similar to my HDD + special device ones to really compare and draw conclusions.
 
Hi everyone, thanks for your contributions.

My speed measurements for backup were for an initial full backup to a newly created array, so the entire VM was being transferred.

Similarly, my test restores closely followed the initial full backup with only a few incrementals in between, so I would expect the data to be read from disk very close to sequentially.
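For what it's worth, ZFS reports free-space fragmentation and fill level per pool, and on a freshly created, mostly sequentially written pool both should still be low (pool name is just an example):

zpool list
zpool get fragmentation,capacity tank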

VictorSTS is reporting similar performance on his HDD RAID10. It seems the disks are not working anywhere near as hard as they could.

The CPU is an E5-2620 v4 (8 cores / 16 threads). I ran top while doing a single restore and got the following results.

top - 10:13:02 up 1 day, 17:14, 1 user, load average: 0.88, 1.84, 1.92
Tasks: 431 total, 1 running, 430 sleeping, 0 stopped, 0 zombie
%Cpu0 : 1.0 us, 2.3 sy, 0.0 ni, 80.3 id, 16.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 1.3 us, 1.0 sy, 0.0 ni, 97.3 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.3 us, 1.3 sy, 0.0 ni, 91.4 id, 6.6 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu3 : 0.3 us, 1.0 sy, 0.0 ni, 91.4 id, 7.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.7 sy, 0.0 ni, 94.0 id, 5.4 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.3 us, 0.3 sy, 0.0 ni, 91.7 id, 7.6 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.7 us, 1.0 sy, 0.0 ni, 89.7 id, 8.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu10 : 0.7 us, 0.7 sy, 0.0 ni, 93.7 id, 5.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 2.3 us, 1.0 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.3 us, 0.7 sy, 0.0 ni, 94.7 id, 4.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.3 us, 1.0 sy, 0.0 ni, 92.7 id, 6.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.7 us, 1.0 sy, 0.0 ni, 91.3 id, 7.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 128704.0 total, 47608.8 free, 79192.2 used, 3131.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 49511.8 avail Mem

TBH I am not sure how to interpret the above, but switching to 't' mode produces something that I can understand. I don't think the restore is core-bound.

top - 10:17:56 up 1 day, 17:19, 1 user, load average: 0.86, 1.32, 1.68
Tasks: 436 total, 1 running, 435 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.3/1.0 1[| ]
%Cpu1 : 0.7/0.7 1[|| ]
%Cpu2 : 0.7/1.0 2[|| ]
%Cpu3 : 0.3/0.7 1[| ]
%Cpu4 : 0.0/0.0 0[ ]
%Cpu5 : 2.3/1.3 4[||| ]
%Cpu6 : 0.0/0.3 0[ ]
%Cpu7 : 0.0/0.3 0[ ]
%Cpu8 : 0.0/0.3 0[ ]
%Cpu9 : 0.0/0.3 0[ ]
%Cpu10 : 1.0/1.7 3[||| ]
%Cpu11 : 0.3/1.3 2[| ]
%Cpu12 : 0.0/1.0 1[| ]
%Cpu13 : 0.0/0.3 0[ ]
%Cpu14 : 0.0/0.3 0[ ]
%Cpu15 : 0.3/1.0 1[| ]
MiB Mem : 128704.0 total, 47516.7 free, 79284.2 used, 3131.2 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 49419.8 avail Mem
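To be sure a single busy thread isn't hidden by those per-CPU averages, a per-thread view of the PBS daemon should show it (assuming the chunk reads on the PBS side are served by proxmox-backup-proxy):

top -H -p $(pidof proxmox-backup-proxy)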


I wonder if it's a queue depth issue, i.e. Proxmox is executing reads one at a time rather than issuing multiple reads into a queue to be dispatched as fast as the hardware allows.
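If so, it should show up in the ZFS queue counters while a restore runs (pool name is just an example); a consistently low average queue depth in iostat -dx output would point the same way at the block-device level:

zpool iostat -q tank 5     # pending/active requests per ZFS queue class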

Or perhaps there is something else inherent in the way PBS transfers data back to PVE that limits the rate.

I have access to another Proxmox kit which uses NVMe flash in RAID-Z2 for PBS and NVMe flash with Ceph for VM storage. It's been running for a couple of years so the ZFS pool would be quite fragmented, but being NVMe that hopefully shouldn't matter. It's an older kit running PBS 2.4 and PVE 7.4. Surprisingly, I'm actually achieving a lower restore rate (around 60 MB/s) on this system.
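One way to take the disks out of the picture entirely would be the built-in benchmark, which reports TLS throughput to the datastore plus local SHA-256, compression/decompression and AES-GCM speeds; the repository string below is just an example:

proxmox-backup-client benchmark --repository root@pam@192.168.1.10:store1

If the TLS or decompression figures come out in the same 150 MB/s ballpark, that would point at the transfer/CPU path rather than the pool.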
 
Reading around a bit further I found this to be a highly discussed topic, with a recent and informative contribution from fabian here: https://forum.proxmox.com/threads/w...-pbs-and-what-is-slow-fast.120112/post-686822

This suggests the limitation is in the mechanism used for transferring data from PBS to PVE. We might be stuck with this limitation until the Proxmox gurus can come up with a more efficient transport mechanism.
 
