Mounting large ZFS disk but can't use it 100%

Egert143
Hello

Quick question: I added a ZFS disk with a size of 43.66TiB, but when mounting it to a virtual machine I can only use 41993G of it. It seems like 2714G goes missing somewhere?

Egert
 
What exactly do you mean? You got a pool with 43.66TiB usable capacity and can't create a zvol that is bigger than 41993G?

First, you are always losing some space to padding, metadata and other overhead. Second, you should never use 100% of the available space. After a pool reaches 80% capacity it gets slow, and after 90% ZFS switches into panic mode. So you always want to leave some space unused.
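If you want to see how full a pool actually is, a quick check could look like this (just a generic example, not specific to your setup):

Code:
# show pool size, allocated/free space, fill level and fragmentation
zpool list -o name,size,allocated,free,capacity,fragmentation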

The output of zpool status and zfs list is always useful to get a better understanding of what exactly you are doing.
 
That's correct. I made a raid 6 array with 8x 8TB drives using a Dell raid controller, then I wanted to use that space for a VM. Can I do something differently to use most of that space, as I could on a non-virtualized machine? What is the suggested disk size to use, and what % should be left unused?


zpool status:

Code:
pool: DataA
 state: ONLINE
config:


        NAME                                      STATE     READ WRITE CKSUM
        DataA                                     ONLINE       0     0     0
          scsi-362cea7f0923c9100285ccdc64d721c33  ONLINE       0     0     0


errors: No known data errors


  pool: DataB
 state: ONLINE
config:


        NAME                                      STATE     READ WRITE CKSUM
        DataB                                     ONLINE       0     0     0
          scsi-362cea7f0923c9100285ccdde4edb490c  ONLINE       0     0     0


errors: No known data errors


  pool: DataC
 state: ONLINE
config:


        NAME                                      STATE     READ WRITE CKSUM
        DataC                                     ONLINE       0     0     0
          scsi-362cea7f0923c9100285cce0150f64a35  ONLINE       0     0     0


errors: No known data errors

zfs list:

Code:
NAME                  USED  AVAIL     REFER  MOUNTPOINT
DataA                 432K  42.3T       96K  /DataA
DataB                42.3T   443M       96K  /DataB
DataB/vm-101-disk-0  42.3T  42.3T       56K  -
DataC                 408K  42.3T       96K  /DataC

Hard Disk (scsi1) DataB:vm-101-disk-0,backup=o,discard=on,replicate=0,size=41993G
 
So I should present the disks directly to Proxmox and skip HW raid? Doing so I will also lose the iDRAC disk monitoring functionality. Is there any suggestion if one would like to still use HW raid?
 
You could use "LVM thin" instead of ZFS. If you really want bit rot prevention and the other ZFS features, skipping HW raid would be the way to go.
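A rough sketch of how that could look on the HW raid volume (the device /dev/sdb and the names vmdata/vmstore are just examples, adjust them to your setup):

Code:
# turn the HW raid volume into an LVM thin pool
pvcreate /dev/sdb
vgcreate vmdata /dev/sdb
lvcreate -l 95%FREE -T vmdata/vmstore

# register it as storage in Proxmox
pvesm add lvmthin vmstore --vgname vmdata --thinpool vmstore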
 
ZFS likes raw "dumb" disks, because any layer of raid or similar may hide or lie about information that ZFS needs.
Some modern HW raid implementations have a self-healing mechanism (like the checksums in ZFS), so they can recover from a situation where, in raid1, the disks return 2 different data blocks; without a checksum it's impossible to tell which one is the right one.
Also, with HW raid, ZFS will see the drives as a single disk; if something happens (a bad data read) ZFS will tell you, but it will be unable to recover, as there is only a single copy of that data (from ZFS's point of view).
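For reference, building such a pool straight from the physical disks on the CLI could look roughly like this (the pool name and disk IDs are placeholders, use your own /dev/disk/by-id/ paths; the GUI under the node's Disks -> ZFS should do the same):

Code:
# raidz2 pool directly on the raw disks, no HW raid in between
zpool create -o ashift=12 DataB raidz2 \
    /dev/disk/by-id/scsi-DISK1 /dev/disk/by-id/scsi-DISK2 \
    /dev/disk/by-id/scsi-DISK3 /dev/disk/by-id/scsi-DISK4 \
    /dev/disk/by-id/scsi-DISK5 /dev/disk/by-id/scsi-DISK6 \
    /dev/disk/by-id/scsi-DISK7 /dev/disk/by-id/scsi-DISK8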
 
Thanks for the explanations.

I tried to create a proper ZFS setup and passed all disks through so Proxmox can see them separately. I made a raidz (raid5) group of 8x 8TB drives with a size of 58.22TiB, so far so good. But when assigning it to the VM I can again only assign 28339G, so it's even less than I had with the HW raid 6 volume before. Is that normal? Usually raid 5 with 8x 8TB should give ~56TB.
 
You need to increase the volblocksize if using raidz, or you will waste a lot of space due to bad padding. An 8-disk raidz2 with the default volblocksize of 8k would lose 2/3 of the raw capacity to parity+padding. If you increase the volblocksize to 16k you should only lose 1/3 of the raw storage.
To change the volblocksize in the GUI: "Datacenter -> Storage -> select your ZFS pool -> Edit -> Blocksize = 16K"
The volblocksize can only be set at creation, so you need to recreate your zvol (destroy the old virtual disk and create a new one).
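The CLI equivalent should be something like this (storage name DataB taken from earlier in the thread; I believe blocksize is the matching option for a zfspool storage, but double-check pvesm set on your system):

Code:
# set the default volblocksize for new zvols on that storage
pvesm set DataB --blocksize 16k

# check what an existing zvol was created with
zfs get volblocksize DataB/vm-101-disk-0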

And again, remember not to use the full capacity that ZFS allows you to use. 10-20% of the storage should always be kept free if you don't want to run into performance/fragmentation problems. ZFS is a copy-on-write filesystem, so it always needs a bit of free space to organize things. So it's not a bad idea to set a quota on the pool so that only 80 or 90% of it can be used.
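For example, with the DataB pool from above (~42.3T usable), roughly 80% would be somewhere around 34T:

Code:
# cap the pool's root dataset at ~80% of its usable capacity
zfs set quota=34T DataB

# verify
zfs get quota,used,available DataB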
 
zfs set copies=2 ... may help for hwraid
But as far as I understand you would lose 50% of your storage. And if a disk fails you could have bad luck and both copies could be stored on the same physical drive, because ZFS can't know where the HW raid is storing the data.
 
And what if the disk is replaced quickly and the rebuild is done by the HW raid before the monthly scrub? I think everything will be all right.
 
Only if you are already using some kind of parity in the HW raid.
With HW raid0 + a single-disk ZFS with 2 copies you would lose 50% of the raw capacity, and if a drive fails everything is gone.
With HW raid1 + a single-disk ZFS with 2 copies you lose 75% of the raw capacity but could replace the drive before the scrub without data loss. And you would lose performance because you are effectively running a raid1 on top of a raid1, so writes are doubled.

So in my opinion that isn't really a good option when you could just skip the HW raid and use a ZFS mirror instead to get more usable space and better performance.
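For completeness, a striped-mirror (raid10-style) pool over the raw disks would be created roughly like this (pool name and disk IDs are placeholders):

Code:
# 8 disks as 4 mirrored pairs, striped together
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/scsi-DISK1 /dev/disk/by-id/scsi-DISK2 \
    mirror /dev/disk/by-id/scsi-DISK3 /dev/disk/by-id/scsi-DISK4 \
    mirror /dev/disk/by-id/scsi-DISK5 /dev/disk/by-id/scsi-DISK6 \
    mirror /dev/disk/by-id/scsi-DISK7 /dev/disk/by-id/scsi-DISK8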
 
I experimented with this some time ago and got the opposite results:

1. LSI 9361-24i card with 16 HDDs in HW raid 10 mode: pveperf shows more than 2000 fsyncs/sec (NOTE: the HW raid was in write-through mode without a BBU)
2. LSI 9361-24i card with 16 HDDs in JBOD mode: pveperf shows only ~200 fsyncs/sec (ZFS raid10 created via the Proxmox GUI)
Disk cache was enabled in both cases.
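For anyone who wants to reproduce such a comparison: pveperf just takes a mountpoint path, e.g. the pool mountpoint from earlier in the thread:

Code:
# fsyncs/sec and other basic numbers for the given path
pveperf /DataB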
 
HWRAID
8x 8TB = 64TB raw capacity.
- 16TB (HW raid6) = 48TB
- 50% (ZFS single disk with 2 copies) = 24TB
- 20% (80% ZFS quota) = 19.2TB

So only 19.2TB of the 64TB would be usable. If you used the faster HW raid10 instead of HW raid6, it would even drop down to 12.8TB of usable capacity. So by skipping HW raid you basically double your usable capacity.
 
I think I'll stick to HW raid for the moment and see how it goes. With software raid (ZFS) it's using a lot of RAM and CPU to simulate what the HW raid can already do anyway. Currently I made raid 5 with 8x 8TB disks in HW and added it to the VM via LVM-thin, and voila, 50.8TB of usable space in Windows that can also be used to 100%. Plus I can keep all the RAM for actual VM usage.
 
It's your server, but personally, for disks bigger than 3TB (maybe even smaller) I would use no less than raid6 (raidz2), because at large disk sizes there is a high risk that a second drive fails during the rebuild/resilver, and in that case you will lose data.
 
That's true, I guess it depends on what is stored and whether a backup is made. But the general idea was to use HW raid if the server already has it anyway and keep the other resources (RAM, CPU) for the virtual machines.
 
