Opinions | ZFS On Proxmox

and MB/s are being chased then striped mirrors are the way to go for spinning rust.


Hi,

Let's see the Calomel figures (copied from your link):

12x 4TB, 6 striped mirrors, 22.6 TB, w=643MB/s , rw=83MB/s , r=962MB/s
12x 4TB, 2 striped 6x raidz2, 30.1 TB, w=638MB/s , rw=105MB/s , r=990MB/s

6x 4TB, 3 striped mirrors, 11.3 TB, w=389MB/s , rw=60MB/s , r=655MB/s
6x 4TB, raidz2 (raid6), 15.0 TB, w=429MB/s , rw=71MB/s , r=488MB/s
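(For context, the usable-capacity gap follows directly from the layouts: 6 mirrors keep half of the 12x 4 TB = 48 TB raw, roughly 24 TB, while two 6-disk raidz2 vdevs keep 4 data disks each, i.e. 8x 4 TB = 32 TB, minus some metadata and padding overhead in both cases.)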



So striped mirrors are not the best, at least in these cases (MB/s) ;)

Why don't you repeat the Calomel test in your case (3 striped mirrors vs. 2 striped raidz vs. 2 striped mirrors)?

bonnie++ -u root -r 1024 -s 16384 -d /storage -f -b -n 1 -c 4
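For anyone repeating this, a rough breakdown of that invocation (assuming a standard bonnie++ build; set -r to the host's real RAM in MB and -d to the mountpoint of the pool under test):

# -u root  : run the benchmark as root
# -r 1024  : tell bonnie++ the machine has 1024 MB of RAM
# -s 16384 : total test file size in MB (keep it well above RAM so caching cannot mask the disks)
# -d /storage : directory on the pool being tested
# -f       : fast mode, skip the slow per-character I/O tests
# -b       : no write buffering, fsync() after every write (closer to sync-heavy VM workloads)
# -n 1     : file-creation test with 1 x 1024 files
# -c 4     : concurrency level of 4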
 

@guletz thanks for pointing that out and you are correct with the numbers.

Not sure if you can boot from a RaidZ2 pool?
Overall you are correct; I'm really just looking at our use-case scenario with only 8 drive bays available:

  • 2 bays - mirrored pool for L2ARC cache + ZIL
  • 1 or 2 bays for the boot OS (depending on whether we want to mirror the OS or not)
  • This only leaves 4-5 bays.

I would prefer to have 6 bays for RAIDZ2 in this scenario so we can get 4 TB usable and 2 redundant drives for parity, plus more storage for VMs.

The HPs do have an SD card, but from what I've read in the forums Proxmox writes a lot of data and may quickly wear out the SD.

Maybe we should look at finding either an internal PCI card for 2 SATA or SAS 2.0 drives, or a media upgrade kit for another 2 drives (hard to find at the moment).

Thoughts?

""Cheers
G
 
Hi,

Thanks @fabian for your response, really appreciated!

Basically, with the 8k example I was explaining only the striped mirror case (2 striped mirrors versus 3 striped mirrors), not striped mirrors versus striped raidz. I can only tell you what I see with my own eyes, using

zpool iostat -v 2:

- if you have a striped pool and the component vdevs are not equal in size, then most of the time the write IO goes to the bigger vdev
- if your vdevs have the same size, most of the time (I would say > 80% in the worst cases) the write IO is spread equally across all vdevs
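A quick way to check this on your own pool is to compare per-vdev allocation next to the live I/O numbers (pool name is just an example):

zpool list -v tank       # SIZE / ALLOC / FREE per vdev
zpool iostat -v tank 2   # ongoing read/write ops and bandwidth per vdev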


Good luck/Bafta !

Hi @guletz

Were both of those vdevs created at the same time, or was one created and the other added to the pool later?

""Cheers
G
 
if your vdevs have the same size, most of the time (I would say > 80% in the worst cases) the write IO is spread equally across all vdevs

May I ask what you mean by this?
Are you saying your vdevs are different sizes?

""Cheers
G
 
The HPs do have an SD card, but from what I've read in the forums Proxmox writes a lot of data and may quickly wear out the SD.

I'm using SD cards for Proxmox in my servers with the mods from the Odroid/OMV/Armbian forums where they use log2ram and folder2ram to create zram devices to take the strain off writing to SD cards regularly for those small devices. They create ram log devices and sync to disc infrequently.
So far, it seems to be working...
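For anyone wanting to try the same thing, a minimal sketch assuming the commonly used azlux log2ram package (parameter names can differ between versions, so check /etc/log2ram.conf after installing; folder2ram works similarly but is configured per directory):

# /etc/log2ram.conf (illustrative values only)
SIZE=128M              # size of the RAM-backed /var/log
USE_RSYNC=true         # sync changes back to the SD card with rsync instead of cp
PATH_DISK="/var/log"   # directory kept in RAM and flushed to disk periodically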
 
Best use of ZFS when you have 8 drive bays

2 boot, 6 in the data volume. You can also do all 8 as a single pool, but unless you absolutely must have 8 drives' worth for your data pool it's safer and more flexible to separate your system from data. As others mentioned, striped mirror configuration.

"but I want a ZiL/l2ARC!"

Search this forum. There is a ton of discussion on when and why you would want this, and the answer is usually "no."

"What about performance?"

Use all SSDs, and add more RAM.
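For reference, a minimal sketch of the 6-disk striped mirror data pool described above; the pool name and device names are placeholders (use /dev/disk/by-id paths on a real system):

zpool create -o ashift=12 tank \
  mirror sdc sdd \
  mirror sde sdf \
  mirror sdg sdh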
 

@YellowShed my question would be: what happens when the host freezes and isn't able to dump from memory to disk? Is that something that could happen?

""Cheers
G
 

Hi @alexskysilk

Thanks for your input. Sure, we can go and spend more money on additional SSDs and RAM, but by the same token the goal is also to reuse what we already have in stock that's still in new condition and can still deliver what clients need at a cost-competitive price.

RAM should be fine as all hosts will have 256 GB. Once you go over this threshold on the HP DL360 the memory speed starts to drop (must be something about the design); I think it drops from 2400 to 2100 MHz, which is not terrible but still roughly a 20% drop.

We are looking to strike a balance between performance and cost so we have a well-priced offering. Being competitive in the market means using what's already in stock; put simply, anything else would increase end-user costs, and we would like to avoid that increase for our budget tier offering.

With that said, we will also have a pure SSD tier as well, which will fit in with your recommendations and will sit in a higher price bracket.

appreciate your input :)

""Cheers
G
 

Hi @guletz

I've just reviewed some interesting information that's been brought to my attention from the FreeNAS forums about issues with RAIDZ1/2/3 etc. used for virtualisation: as the pool fills up towards 50% capacity it basically loses half its performance, and ZFS starts to struggle to allocate blocks.

Something the Calomel tests don't go into is repeating the benchmarks as the storage fills up and performance starts to degrade.

It's an interesting read and may be of interest for others.

Yes, this is Proxmox, but by the same token it's also ZFS and will behave in the same way. Maybe things have changed since the articles were written, but they do provide good insight into performance degradation and why mirrors are best for virtualization.

Some differences between RAIDZ and mirrors, and why we use mirrors for block storage

https://www.ixsystems.com/community...d-why-we-use-mirrors-for-block-storage.44068/

The path to success for block storage

https://www.ixsystems.com/community/threads/the-path-to-success-for-block-storage.81165/
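If you want to keep an eye on this on an existing pool, capacity and fragmentation are exposed directly by zpool (pool name is a placeholder):

zpool list -o name,size,allocated,free,capacity,fragmentation,health tank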

""Cheers
G
 
the goal is also to reuse what we already have in stock that's still in new condition and can still deliver what clients need at a cost-competitive price.
The performance difference between spinning disks and SSDs cannot be overstated. If you want to create a low-performance product for a lower price, that's up to you, but the key words in the above phrase (to me) are "clients need", which is likely at odds with "what we already have in stock." In my view, the initial cost difference between 6 HDDs and 6 SSDs for SATA is a few hundred bucks. Over a projected 3-year lifespan this is peanuts; I'm sure you can take that old stuff and build it into a NAS for slow storage purposes.
 

@alexskysilk thanks for your input, always interesting to see it from someone else’s perspective.
 
@YellowShed my question would be: what happens when the host freezes and isn't able to dump from memory to disk? Is that something that could happen?

""Cheers
G
I'm only using the zram for the logs, not the whole disk. And the logs are sync'ed to the disk regularly. If the host freezes and crashes, then yeah, I could lose some log info I guess. By keeping the logs in zram, the write frequency to the disk is massively reduced.
Not sure if that's what you meant exactly...
 

In our VMware environment, when a host crashes, logs are lost even when streaming to disk. These logs are imperative for finding the root cause of the issue, especially when there may be little lead-up to the issue itself, so in VMware at least it's important to capture everything in its entirety to deduce the root cause.

I have no idea how applicable this is to Proxmox; it's just insight from our own experience.
 
In PVE, the most writes are IMHO in /var/lib/pve-cluster, far more than conventional logs. The writes are also very small, so with the write amplification of cheap SSDs you can have problems (which increase even further with ZFS).

I'm only using the zram for the logs, not the whole disk.

zram is for having compressed swap in memory, not for storing files. You mean tmpfs, which is a RAM/memory-based filesystem.
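As an illustration of the tmpfs approach (a sketch only; the mountpoint and size are examples, and anything held in tmpfs is lost on a crash or reboot):

# /etc/fstab
tmpfs  /var/log  tmpfs  defaults,noatime,nosuid,mode=0755,size=256M  0  0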
 
OK all, thanks for your input so far :)

I've decided to go with the following options and would love your feedback:

Single Server Options


New SuperMicro e20012623
10 bay 2.5" (4 bays can be occupied by NvME drives) or all by SAS/ SATA Drives
AMD Epic 7502 32 Core
256 GB Ram
HBA Broadcom 9300 SAS 8-port 12GB/s
4 x 4 TB Intel S4510 D3 Series SATA 6Gb/s
RaidZ + hot spare

Wondering if having 2x NVMe in a mirror vdev as a special device would be beneficial in this setup?

Potential Expansion Options:

Add another 3-drive RAIDZ vdev to grow the pool and provide more IOPS across a second vdev.
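For what it's worth, a rough sketch of how the initial layout and the later expansion could look; pool and device names are placeholders:

# initial pool: 3-disk RAIDZ1 plus a hot spare
zpool create -o ashift=12 tank raidz sda sdb sdc spare sdd

# optional: mirrored NVMe special vdev for metadata/small blocks
zpool add tank special mirror nvme0n1 nvme1n1

# later: stripe in a second 3-disk RAIDZ1 vdev for more space and IOPS
zpool add tank raidz sde sdf sdg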

Simple to manage HA Server Options


Look into TrueNAS to use ZFS over iSCSI.

With both the single-server and HA NAS options, I like the idea of being able to automate, in a simple way, per-VM replication to another DC for clients that wish to have a DR option.
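One simple way to do per-VM replication between sites with plain ZFS is incremental send/receive; the dataset, snapshot, and host names below are just examples:

# first full copy
zfs snapshot rpool/data/vm-101-disk-0@rep1
zfs send rpool/data/vm-101-disk-0@rep1 | ssh dr-host zfs receive -F tank/vm-101-disk-0

# later: send only the changes since the last snapshot
zfs snapshot rpool/data/vm-101-disk-0@rep2
zfs send -i @rep1 rpool/data/vm-101-disk-0@rep2 | ssh dr-host zfs receive tank/vm-101-disk-0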

thoughts and suggestions?

""Cheers
G
 
I'm only using the zram for the logs, not the whole disk. And the logs are sync'ed to the disk regularly. If the host freezes and crashes, then yeah, I could lose some log info I guess. By keeping the logs in zram, the write frequency to the disk is massively reduced.
Not sure if that's what you meant exactly...
Do you mind sharing how you do that?
 
Yes, the special allocation class and SLOG on mirrored NVMe will speed up your pool if and only if the NVMe is significantly faster (speaking latency) than your SSDs.

Thanks @LnxBil

Just to clarify, isn't the SLOG only for block devices?
Isn't ZFS local file based?

https://pve.proxmox.com/wiki/Storage

Please correct me if I'm missing something, as I may have crossed wires somewhere in my research.

https://www.ixsystems.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/

""Cheers
G
 
Hi,

Just to clarify, isn't the SLOG only for block devices?

No. The SLOG is used for any sync write operation on a ZFS pool.

Isn't ZFS local file based?

ZFS is a volume manager that can expose its own filesystem (comparable to ext4 or NTFS) and also block devices/zvols (like a disk, e.g. /dev/sdX). So on the same pool you can have many datasets (usable like any other FS, for creating files and folders) and also many zvols (like /dev/sdX).
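To make that concrete, a small sketch (pool, dataset, and device names are examples only):

# a dataset: a filesystem you can create files and folders on
zfs create tank/backup

# a zvol: a 32 GB block device, exposed under /dev/zvol/tank/...
zfs create -V 32G tank/vm-101-disk-0

# a mirrored SLOG for sync writes, only worthwhile if the NVMe latency is much lower than the pool's
zpool add tank log mirror nvme0n1 nvme1n1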

Good luck / Bafta!
 
