[SOLVED] Best Practice ZFS/SW Raid HDD

Hmm, I'm not sure; we don't use any zvols at all to date. With a zvol you use a "virtual device", and I think it's again like a single file and probably won't help ... but this depends on the ZFS internals for block management. As we don't use zvols I have no experience there, sorry. Nevertheless, if you are using ZFS you will probably do better running in a dataset instead of a zvol; have a read of these:
https://jrs-s.net/2018/03/13/zvol-vs-qcow2-with-kvm/
https://jrs-s.net/2016/06/16/psa-snapshots-are-better-than-zvols/
https://jrs-s.net/2017/03/15/zfs-clones-probably-not-what-you-really-want/
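If you want to check what you are actually using, zvols and file-backed datasets can be listed separately. Just a sketch, no specific pool or dataset names assumed:

Code:
# list all zvols (block devices, what PVE uses for "zfspool" storage)
zfs list -t volume -o name,volsize,volblocksize
# list regular datasets (where raw/qcow2 files would live on a directory storage)
zfs list -t filesystem -o name,used,recordsize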
I've found something here in the forum:
https://forum.proxmox.com/threads/d...-a-metadata-special-device.106292/post-457502

So if I understand correctly: the metadata of guest VMs will also be covered by the special device?!
 
I would say so; maybe it will help, though more likely not for zvols, but in any case a special device won't slow your HDD pool down.
That would have to be tested, which unfortunately isn't possible in our cluster (which is just for testing anything) because we are still missing the additional hardware for that case.
 
But I do not have any raw or qcow2 files; I'm using ZFS storage (zvols) for the VMs.
Both files and zvols are the "top level" of data management in ZFS. Below that the magic starts. This includes mechanisms like "Copy-on-Write", which is active on each and every write command. I am not sure which block size is relevant here, but if I write a single byte, at least one block of 4 KB needs to be written. Plus the metadata with information about this. Plus data inside the ZIL (before the actual data is written). Plus metadata for that ZIL entry. Plus...
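If you want to check which block sizes apply in your case, the relevant properties can be queried directly (the dataset and zvol names below are only examples):

Code:
# recordsize applies to files inside a dataset (default 128K)
zfs get recordsize HDD/data
# volblocksize applies to zvols, e.g. a PVE VM disk (typically 8K or 16K)
zfs get volblocksize HDD/data/vm-100-disk-0
# ashift defines the minimum physical block size of the pool
zpool get ashift HDD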

A lot of this happens on that fast Special Device and so its burden is taken away from the slow data drives.

I have one PVE node with rotating rust - with a fast Special Device. I also enabled the "write small blocks to here" feature. And while this is really just a single, anecdotal data point, I can confirm that in my specific situation approximately half of the IOPS land on the mechanical drives while the other half is handled by the Special Device. This means the overall perceived performance is roughly twice as high as before.
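For reference, that feature is the special_small_blocks property. A minimal sketch of checking and enabling it (the pool name HDD is only an example; pick a threshold below your recordsize/volblocksize or every data block will land on the special vdev):

Code:
# show the current threshold (0 = only metadata goes to the special vdev)
zfs get special_small_blocks HDD
# store all blocks of 64K or smaller on the special vdev for new writes
zfs set special_small_blocks=64K HDD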

Again: I was happy to see the effects I hoped for, but I did not run structured performance tests.
 
If the ZFS special device helps with zvol usage, that's nice, but I cannot confirm it myself yet, and according to my current, untested understanding the option to store small blocks on the special device doesn't have any effect on zvols ... but I could be wrong and would be happy if it could be proven positive with numbers.
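One way to get such numbers, at least roughly, would be to compare the allocation on the special vdev before and after writing into a zvol (a sketch; the pool name HDD is just a placeholder):

Code:
# note the ALLOC column of the special mirror
zpool list -v HDD
# ... generate some writes inside a VM whose disk is a zvol on this pool ...
# then check whether the special vdev allocation grew
zpool list -v HDD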
 
Sorry, I cannot follow you. Do you know whether the metadata of the VM will also be stored on the special device? I only use the special device for metadata because we don't have enough SSD capacity, so no small files will be stored there.
I was wondering how the special device would know about the filesystem metadata of the VM. I cannot find any proven statements about this.
 
VM filesystem metadata will not be stored on the ZFS special device because it lives in its own ecosystem and could be NTFS for a Windows VM or ext4 for a Linux VM.
 
So I see no benefit in speeding up the metadata of VMs when I use a special device in PVE.
 

The problem is that you cannot "just test it". After adding a Special Device you can never remove it (at least not from a pool that contains raidz vdevs).

For me, my statement above means that I will never run an HDD pool without a Special Device.
 
Okay, in my situation I have the OS on separate disks, so I don't know which small files would be written to the HDD pool if there are just images for VMs...
 
"For me my statement from above means that I will never run a HDD pool without a Special Device."
Yes, when raidz"x" I think always using special could only be better as without for any usecase and in mirror pools depend on disk and file numbers.
 
"For me my statement from above means that I will never run a HDD pool without a Special Device."
Yes, when raidz"x" I think always using special could only be better as without for any usecase and in mirror pools depend on disk and file numbers.
If I'm right, then we have one file per VM disk. So there is nothing that could speed anything up.
 
Just to be sure: can I check whether the metadata is really stored on the special devices? I've just added the special devices as a mirror and nothing else.
 
Code:
root@pbs01:~# zdb -bbbs  HDD

Traversing all blocks to verify nothing leaked ...

loading concrete vdev 2, metaslab 148 of 149 .....
2.98G completed ( 239MB/s) estimated time remaining: 6hr 27min 56sec

239 MB/s does not look like SSDs...
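While such a full zdb traversal runs for hours, a quicker real-time check is to watch where the I/O actually lands (assuming the pool is named HDD as above):

Code:
# print per-vdev read/write statistics every 5 seconds
zpool iostat -v HDD 5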
 
One "problem" is not mentioned yet: the Special Device will get involved only when new data arrives. The old data (including meta data) already stored on the "old" devices will not get moved here.

So to see the SD "working" you need to write new data. This may also be already existing data just shoveled around once. (Or twice to move it back to the original position.)
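A crude sketch of such a rewrite for file-based data (the directory names are made up; for zvols the data would have to be rewritten from inside the guest or copied via zfs send/receive instead):

Code:
# copy the data so the new blocks (and their metadata) are written with the special vdev present
cp -a /HDD/data /HDD/data.new
# then swap the directories and remove the old copy
mv /HDD/data /HDD/data.old && mv /HDD/data.new /HDD/data
rm -rf /HDD/data.old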
 
One "problem" is not mentioned yet: the Special Device will get involved only when new data arrives. The old data (including meta data) already stored on the "old" devices will not get moved here.

So to see the SD "working" you need to write new data. This may also be already existing data just shoveled around once. (Or twice to move it back to the original position.)
I created the pool directly with the special device, so all data should have been written with the SD in place.
 
See zpool list -v <zfs-pool>.
You should see "special" and then, I hope, mirror-N or better, your devices and the allocated space.

Also check your pool with zfs get quota <zfs-pool> and set the quota to approximately 85 % of the maximum HDD storage size.
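A sketch of what that could look like (pool name and size are placeholders):

Code:
# check the current quota on the pool's root dataset
zfs get quota HDD
# limit it to roughly 85 % of the usable capacity, e.g. 34T on a ~40T pool
zfs set quota=34T HDD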
Sure, I see a mirror-2.
The pool is not that full.
 
Please show us zpool list -v
and
lsblk
Code:
NAME                                                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
HDD                                                       40.1T  5.49T  34.7T        -         -     0%    13%  1.00x    ONLINE  -
  mirror-0                                                  20T  2.71T  17.3T        -         -     0%  13.5%      -    ONLINE
    sda                                                   20.0T      -      -        -         -      -      -      -    ONLINE
    sdb                                                   20.0T      -      -        -         -      -      -      -    ONLINE
  mirror-1                                                  20T  2.76T  17.2T        -         -     0%  13.8%      -    ONLINE
    sdc                                                   20.0T      -      -        -         -      -      -      -    ONLINE
    sdd                                                   20.0T      -      -        -         -      -      -      -    ONLINE
special                                                       -      -      -        -         -      -      -      -         -
  mirror-2                                                 149G  25.5G   123G        -         -    29%  17.1%      -    ONLINE
    nvme-SAMSUNG_MZVL21T0HCLR-00B00_S676NF0X312234-part4   150G      -      -        -         -      -      -      -    ONLINE
    nvme-SAMSUNG_MZVL21T0HCLR-00B00_S676NF0X312242-part4   150G      -      -        -         -      -      -      -    ONLINE
rpool                                                      148G  1.79G   146G        -         -     0%     1%  1.00x    ONLINE  -
  mirror-0                                                 148G  1.79G   146G        -         -     0%  1.20%      -    ONLINE
    nvme-eui.002538b341b2a96c-part3                        149G      -      -        -         -      -      -      -    ONLINE
    nvme-eui.002538b341b2a988-part3                        149G      -      -        -         -      -      -      -    ONLINE
 
