[SOLVED] Cannot store on ZFS RaidZ volume "out of space"

delpiero3

Hi everyone,

I am trying to create a RaidZ volume to which I would migrate the data from my old server (see this thread: https://forum.proxmox.com/threads/p...ot-sector-no-grub-nothing.131120/#post-577169). I decided to split my storage because backups are much easier to manage that way: one RaidZ10 volume for the data I have to keep, which will be backed up to a Proxmox Backup Server, and another volume in RaidZ that I don't mind losing, used as a local share for various things (movies, temp files, ...).
The RaidZ10 was created fine; I can store data up to its capacity, no issue.
If I try to create a VM disk on the RaidZ, it never works: I am always told that I am running out of space.
I also tried to move the storage from my old server to the new RaidZ using the "Move disk" function in the VM section of the Proxmox GUI, planning to split off the important data to the RaidZ10 from there and leave the unimportant data in place, and I get the same out-of-space error:

Code:
create full clone of drive virtio1 (Data-VM:vm-101-disk-0)
TASK ERROR: storage migration failed: zfs error: cannot create 'movie/vm-101-disk-0': out of space

Code:
zfs get all

movie                     type                  filesystem             -
movie                     creation              Fri Aug 11  7:13 2023  -
movie                     used                  1.04M                  -
movie                     available             48.4T                  -
movie                     referenced            171K                   -
movie                     compressratio         1.00x                  -
movie                     mounted               yes                    -
movie                     quota                 none                   default
movie                     reservation           none                   default
movie                     recordsize            128K                   default
movie                     mountpoint            /movie                 default
movie                     sharenfs              off                    default
movie                     checksum              on                     default
movie                     compression           off                    local
movie                     atime                 on                     local
movie                     devices               on                     default
movie                     exec                  on                     default
movie                     setuid                on                     default
movie                     readonly              off                    default
movie                     zoned                 off                    default
movie                     snapdir               hidden                 default
movie                     aclmode               discard                default
movie                     aclinherit            restricted             default
movie                     createtxg             1                      -
movie                     canmount              on                     default
movie                     xattr                 on                     default
movie                     copies                1                      default
movie                     version               5                      -
movie                     utf8only              off                    -
movie                     normalization         none                   -
movie                     casesensitivity       sensitive              -
movie                     vscan                 off                    default
movie                     nbmand                off                    default
movie                     sharesmb              off                    default
movie                     refquota              none                   default
movie                     refreservation        none                   default
movie                     guid                  1539010778459645801    -
movie                     primarycache          all                    default
movie                     secondarycache        all                    default
movie                     usedbysnapshots       0B                     -
movie                     usedbydataset         171K                   -
movie                     usedbychildren        896K                   -
movie                     usedbyrefreservation  0B                     -
movie                     logbias               latency                default
movie                     objsetid              54                     -
movie                     dedup                 off                    default
movie                     mlslabel              none                   default
movie                     sync                  disabled               local
movie                     dnodesize             legacy                 default
movie                     refcompressratio      1.00x                  -
movie                     written               171K                   -
movie                     logicalused           197K                   -
movie                     logicalreferenced     42K                    -
movie                     volmode               default                default
movie                     filesystem_limit      none                   default
movie                     snapshot_limit        none                   default
movie                     filesystem_count      none                   default
movie                     snapshot_count        none                   default
movie                     snapdev               hidden                 default
movie                     acltype               off                    default
movie                     context               none                   default
movie                     fscontext             none                   default
movie                     defcontext            none                   default
movie                     rootcontext           none                   default
movie                     relatime              off                    default
movie                     redundant_metadata    all                    default
movie                     overlay               on                     default
movie                     encryption            off                    default
movie                     keylocation           none                   default
movie                     keyformat             none                   default
movie                     pbkdf2iters           0                      default
movie                     special_small_blocks  0                      default

I don't get it.
Thanks for the help.
 

Attachments

  • Screenshot_20230811_072809.png
  • Screenshot_20230811_072746.png
  • Screenshot_20230811_072708.png
Hi,
how big is the disk vm-101-disk-0? I assume that the movie storage is not sparse and that you are trying to reserve the full disk size, which is larger than the space you have available, hence the error.
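If that is the cause, a quick way to check is to look at the refreservation ZFS sets for a zvol, and to enable thin provisioning on the Proxmox storage so that no reservation is made at all. This is only a sketch: the storage ID "movie" and the dataset name are assumptions based on the error message above, and on a raidz vdev the reserved size can end up noticeably larger than the nominal disk size, especially with a small volblocksize. The same option is also exposed in the GUI as the "Thin provision" checkbox under Datacenter => Storage:
Code:
# how much space would be reserved for a (hypothetical) existing zvol on this pool
zfs get volsize,refreservation,available movie/vm-101-disk-0
# mark the zfspool storage as thin provisioned, so new zvols get no refreservation
pvesm set movie --sparse 1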
 
No, it is smaller: 36TB for vm-101-disk-0, while the movie ZFS pool is 48TB and has nothing on it right now. Even trying to manually create a disk, as you can see in one of the screenshots, leads to the same issue; there I was trying to create a 30TB disk (I know the error message overlaps the size, sorry for that).
What do you mean by "sparse"? The ZFS pool is definitely much larger than the disk, and as I said, even manually creating a disk for a VM smaller than the available size fails. I tried to recreate the ZFS volume (it was empty anyway) and failed with the same issue.
 
Well, then the problem rather seems to be that the source storage is out of space and you cannot create a snapshot to send to the new storage. What kind of storage is Data-VM? Is this ZFS as well?
 
It is LVM. The entire volume is 40TB, and 36TB are assigned to the VM disk vm-101-disk-0.
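For reference, a quick way to see how much space is actually free on that source volume group (assuming plain, thick LVM; the output lists whatever volume groups exist):
Code:
# free space per volume group (VFree column)
vgs
# size and allocation of the individual logical volumes
lvs -a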
Initially, I created one ZFS volume in RAID10 with 18 disks of 5TB, so a volume of 45TB. The transfer was working: I was at 50% after 7 hours, which looks like a fair transfer rate of about 700MB/s. I cancelled it (the "delete source" tickbox was not checked, of course) because I had second thoughts about my overall strategy and wanted to reconsider it. First I was going to transfer everything to that RAID10 volume, but then I thought: why store all the movies and non-critical stuff in a way that wastes a lot of space for nothing? So I deleted the 18-disk RAID10 and went with a 4-disk RAID10 for my critical data (ZFS volume "data" in the screenshot) and 12 disks in RAIDZ for the non-critical data (ZFS volume "movie" in the screenshot).
What puzzles me about your idea is that even manually creating a VM disk of several TB does not work on my movie ZFS volume, so I don't think the snapshot space has anything to do with it. What do you think?
 
I am not sure I follow: in the documentation I shared, they say from 10 to 15 drives, and I have 12x 5TB. Isn't that what is expected?
Yes, a minimum of 10-15 disks. And your disks aren't that big compared with common disks these days at 16-22TB each, so the resilvering time wouldn't be that bad when striping 4x 3-disk raidz1 or 2x 6-disk raidz2. So it's more of an edge case between striped raidz1/2 and draid.
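For illustration, this is roughly what the suggested striped layout would look like when created on the command line. A sketch only: the pool name and the by-id device paths are placeholders, and zpool create wipes the listed disks:
Code:
# 2x 6-disk raidz2 vdevs striped in one pool (4x 3-disk raidz1 works the same way,
# just with four "raidz1 ..." groups)
zpool create movie \
    raidz2 /dev/disk/by-id/disk01 /dev/disk/by-id/disk02 /dev/disk/by-id/disk03 \
           /dev/disk/by-id/disk04 /dev/disk/by-id/disk05 /dev/disk/by-id/disk06 \
    raidz2 /dev/disk/by-id/disk07 /dev/disk/by-id/disk08 /dev/disk/by-id/disk09 \
           /dev/disk/by-id/disk10 /dev/disk/by-id/disk11 /dev/disk/by-id/disk12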
 
Hi everyone, thanks for the tips. I left draid aside for now; it looks more complex than the RaidZ approach anyway, and I must say I am still too much of a ZFS newbie to even consider draid.
I took a look at the block size, went to Datacenter => Storage => changed my RaidZ1 storage's block size (volblocksize) to 32k, and now I can indeed start the transfer of my VM disk.
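For anyone following along, the same change can roughly be checked and applied from the CLI. A sketch, assuming the storage ID is "movie"; note the block size only applies to zvols created after the change:
Code:
# set the default block size for new VM disks on this zfspool storage
pvesm set movie --blocksize 32k
# verify what an existing VM disk actually uses (volblocksize is fixed at creation time)
zfs get volblocksize movie/vm-101-disk-0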
However, it is slow as hell.
I don't know why my RaidZ1 would perform so badly compared to my old hardware RAID6. Even with consumer disks, I was getting a single-disk benchmark of 160MB/s sequential write, and as far as I know, in RaidZ1 I should approach the write performance of 8 to 9 disks in theory, right? I am not even sure I am hitting the performance of a single drive.
If I use a RAID10 instead, it performs as expected: I get a sequential write equal to single-disk bandwidth x number of disks / 2. I already tried the following commands from other posts, which are supposed to improve performance:
Code:
zfs set sync=disabled movie
zfs set atime=off movie
zfs set compression=off movie
I also tried compression=lz4, since it is reported to have close to zero performance impact on non-compressible data.
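To separate the pool's raw sequential write speed from the VM migration path, a benchmark straight on the dataset can help. A sketch, assuming fio is installed (apt install fio) and the pool is mounted at /movie as shown in the zfs output above:
Code:
# sequential 1M writes directly into the pool's filesystem
fio --name=seqwrite --directory=/movie --rw=write --bs=1M --size=20G \
    --ioengine=libaio --iodepth=8 --numjobs=1 --group_reporting
# remove the test file afterwards
rm /movie/seqwrite.0.0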

I am a bit lost. Sorry, I am not at ease with ZFS; I was a lazy hardware RAID user, my apologies.
I could go with RAID10, but that is a waste of disks and space for what I need to do with that volume. To be clear, that volume will hold media files that I want to stream (movies, sitcoms, music, pictures, which are backed up in Nextcloud anyway), so it is not critical. I can go with a RAID5 equivalent without an issue: even if I had a second drive failure during resilvering, losing that volume would not kill me, and I don't even plan to back it up. The 4-disk RAID10 of 10TB is my sensitive volume, and I will back it up to a Proxmox Backup Server once a week.

Edit :

According to this article, a RAIDZ3 of 12 disks (my use case) should have a "theoretical" bandwidth of N-p times single-disk performance, so my 12-3 disk pool should reach around 160*9 = 1440MB/s in sequential write. I will try that; RaidZ3 anyway seems more in line with the 12-disk array I am trying to build than RaidZ2 or RaidZ1. From what I read in the article, they divided the pool into 2 vdevs for the RAIDZ2 and 4 vdevs for the RAIDZ1. I suppose that creating the pool through the GUI doesn't take this into consideration and just creates one pool using all the disks? I will give it a try, we will see. I transferred 26TB in a bit more than 11 hours with the RAID10, so a throughput of about 650MB/s, IMO quite honest for such a setup.
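One way to check whether the GUI really put all 12 disks into a single vdev is to look at the pool layout; a single raidz group listed under the pool means one vdev:
Code:
zpool status movie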
 
Keep in mind that IOPS performance scales with the number of vdevs, not the number of disks. No matter if you use 3, 12 or 100 disks in a raidz1/2/3, the IOPS performance will always be comparable to a single-disk pool (in my experience even a bit slower). So yes, throughput might be great, as that indeed scales with the number of disks, but maybe your ~100 IOPS are the bottleneck, even when doing sequential writes?
And did you verify before buying the disks that they use CMR and not SMR? Write performance with shingled magnetic recording is terrible, and SMR should be avoided with ZFS at all costs.
Is the CPU not too weak, and is there enough RAM (32-64GB of spare RAM just for ZFS's ARC would be nice to have)?
 
Hi there,
sure, about IOPS I agree, but the sequential write performance should scale, right? At least that's what I am mainly looking at for that volume: as explained in my use case, it will store movies, sitcoms, books, music and pictures that I want to share through Plex Media Server. No other VM volumes will be stored on that zpool. So IOPS isn't my priority, but sequential bandwidth (and a good storage ratio) is.
With a zpool in RaidZ3, I am now getting a "decent" transfer rate from the old VM disk: based on the data transferred so far and the time the task has been running, I have a write bandwidth of 375MB/s, which is much better than before (something like 4x more).

About your other questions, you are right, I posted the CPU specs and so on in another thread but didn't repeat them here, my apologies:
- Dual Xeon E5-2630 v3 (8 cores each plus Hyper-Threading)
- 128GB of RAM (64GB used for the ZFS ARC according to:
Code:
root@proxmox:~# awk '/^size/ { print $1 " " $3 / 1048576 }' < /proc/spl/kstat/zfs/arcstats
size 64330.6
- Disks are Seagate Barracuda SATA 5TB 2.5" ST5000LM000, and yes, they are SMR-based :-(

So obviously the disks are definitely not enterprise grade and are SMR-based, which according to what you are saying is terrible for write performance. What I have now looks kind of OK. I know the situation is not ideal, and I should have been more cautious at the start of this project, but when I looked at enterprise-grade storage of that capacity, the price was way too high for my budget (or maybe you have good addresses to recommend, even refurbished?).
 
You don't need enterprise HDDs, but at least some consumer CMR HDDs. Try to write something like 1TB of media at once; performance should drop significantly once the drives' CMR cache is full.
The problem with SMR disks is not only that they might get unusably slow. The latency can become so bad that ZFS thinks the disks have failed: because they can't answer in time, ZFS times out, counts the IO as read/write errors, and with too many of these errors the pool might switch into a degraded state, while the disks are actually still perfectly healthy.

SMR vs CMR for HDDs is like QLC vs SLC NAND for SSDs: just way worse, as even a good HDD is terrible these days when any kind of IOPS/latency is needed.
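One way to run that sustained write test is sketched below; it writes roughly 1TB of zeros into the pool's filesystem (compression is off on this dataset, so the zeros are not compressed away), with the path and size as examples only:
Code:
# sustained ~1TB write, flushing to disk before dd reports the final rate
dd if=/dev/zero of=/movie/smr-test.bin bs=1M count=1000000 conv=fdatasync status=progress
# clean up afterwards
rm /movie/smr-test.bin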
 
In 2,5" you don't find any consumer HDD with that capacity, or at least i couldn't find any by myself.
I agree with you, tendency in computers is to go for cheap and bad ... so sad.
I tried already this and was able to perform sequential write on 2 HDD of the lot over the entire drive keeping a steady 120-150MB/s with sometimes small spike to 170. I haven't seen any drop in writing, but of course it was sequential write, but that's the main purpose of that volume so i guess it should be kind of ok, what do you think ?
 
The write performance should drop significantly when doing more than short bursts of writes, no matter whether they are random or sequential.
Maybe your disk models aren't that bad, but I for example owned an SMR HDD that dropped from ~160MB/s down to the very low KB/s or even B/s range when doing something like extracting a 50GB zip file.

And yes, with 2,5" CMR you would be limited to 2,4TB per disk. If more is needed I would get a chassis with enough 3,5" slots or get some entry-level enterprise TLC SSDs where eben more than 5TB would be possible.
 
At least that's not what I see on my side. Maybe I am lucky in my mistake, but right now, after 10TB of the VM disk transferred, the bandwidth is still steady at 340MB/s, which is honest from my perspective. When the setup was the 12 disks in RAID10, I finished the transfer of 35TB at 740MB/s on average, so I think I am in good shape.
Later on, if the disks start to show weaknesses, I may look for something else, but changing the chassis after all these expenses isn't something I can consider short term.
However, maybe enterprise SSDs will become cheaper in 3.84 or even 7.68TB in the future, who knows? I am still not sure where to buy good refurbished enterprise hardware. I was buying from a UK company (not sure I can say which one, so I won't name it right now), but with Brexit it became more complex and prices went up.
 
I was able to recover most of my stuff; the transfer ended successfully with an average final bandwidth of 280MB/s. Not the fastest for sure, but enough for what I need to do for now. Thank you all for the help. Now I just have to reconfigure my whole environment and my Docker containers, a lot to do, but at least I am not starting empty-handed.
And if by chance you know a good place to find refurbished drives, I am still interested :).
Cheers.
 
