Question regarding LVM/ZFS

vlad76

Member
Dec 18, 2019
I have 4 disks, each 2TB. I've set them up in RAID5 using RAIDZ-1. I should have 6TB available.

When I click on ZFS in the node itself, it states that the rpool size is 7.99TB.
There are two storages:
local (shows as 3.23TB) and local-zfs (5.67TB)
Why those numbers? Why does that add up to 8.9TB?

Before I noticed all this mess, I made a CT with a 5632GB disk (5.5TB) for a file server. I thought I'd just leave 500GB to PVE. Felt like a good round number.
But when I click on "local-zfs", it shows "subvol-100-disk-0" sized at 6.05TB?

None of this adds up to me. Either I'm misunderstanding something basic here, or I misconfigured everything, or both. I've already moved 4TB of stuff to the file server, so if I need to redo everything, I'll be sad.
 

Basically, with 4x 2TB disks in a raidz1 and an ashift of 12 you get this:

8TB of raw storage, where you will lose 2TB to parity, so you only get 6TB. "zpool" always shows the raw storage (so the full 8TB, even if 8TB aren't usable). The "zfs" command always shows the raw storage minus parity, so 6TB.
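
As a quick sanity check, here is the arithmetic behind those two numbers (a minimal sketch; it only models parity, not metadata or other overhead):

```python
disks = 4
disk_tb = 2.0
parity_disks = 1                                     # raidz1 can lose one disk

raw_tb = disks * disk_tb                             # what "zpool" reports: 8.0 TB
after_parity_tb = raw_tb - parity_disks * disk_tb    # what "zfs" reports: 6.0 TB

print(raw_tb, after_parity_tb)                       # 8.0 6.0
```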

But that doesn't mean you can use the full 6TB. First there is overhead, so you will never be able to use all of it. Then a ZFS pool should not be filled more than 80% (and never more than 90%). So with 6TB you shouldn't store more than 4.8TB on it.
So for datasets (which LXCs use) you get roughly 4.8TB of usable storage.
But for zvols (used by VMs) you also get padding overhead. You can't see this padding overhead directly, because it isn't that your storage gets smaller; instead, all data written to a zvol gets bigger. If, for example, you had 100% padding overhead, writing 2.4TB of data to a zvol would consume 4.8TB of that pool.
How big your padding overhead is depends mostly on the volblocksize you use. With a volblocksize of 8K everything will be 50% bigger, so you can only store 3.2TB of zvols on that pool. Increase the volblocksize to 64K and the padding overhead shrinks, so you could store nearly 4.8TB of zvols too (but performance and SSD wear will be horrible for all writes smaller than 64K).
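
If you want to see where that 50% figure comes from, here is a simplified model of raidz1 allocation (a sketch under assumptions: ashift=12 means 4K sectors, no compression, no metadata; real pools will differ slightly):

```python
import math

SECTOR = 4096  # ashift=12 -> 4K sectors

def apparent_size_factor(volblocksize, ndisks=4, nparity=1):
    """How much bigger zvol data appears on this raidz pool.

    Simplified model: each volblock gets parity sectors, plus padding so the
    allocation is a multiple of (nparity + 1) sectors.
    """
    data_sectors = volblocksize // SECTOR
    data_disks = ndisks - nparity
    parity_sectors = nparity * math.ceil(data_sectors / data_disks)
    allocated = data_sectors + parity_sectors
    allocated = math.ceil(allocated / (nparity + 1)) * (nparity + 1)  # padding
    # "zfs" accounting already assumes the ideal parity ratio, so convert the
    # raw allocation back into the pool's usable view before comparing
    accounted = allocated * data_disks / ndisks
    return accounted / data_sectors

for vbs_k in (8, 16, 64):
    factor = apparent_size_factor(vbs_k * 1024)
    print(f"{vbs_k:>2}K volblocksize -> data appears {factor:.2f}x bigger")
# 8K  -> 1.50x (so only ~3.2TB of zvols fit into the 4.8TB)
# 64K -> ~1.03x (nearly the full 4.8TB)
```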

So right now you have a usable space of roughly 4.8TB for LXCs OR 3.2TB for VMs.

And don't forget that PVE uses TiB most of the time, not TB. 4.8TB is just 4.36 TiB.
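
For reference, the unit conversion is just a few lines (and, assuming that 5632GB container disk was actually created as 5632 GiB, it is likely also why it shows up as 6.05TB):

```python
tib = 4.8e12 / 2**40               # 4.8 TB expressed in binary TiB
print(f"{tib:.3f} TiB")            # ~4.366 TiB

subvol = 5632 * 2**30              # 5632 GiB in bytes (assumption)
print(f"{subvol / 1e12:.2f} TB")   # ~6.05 TB, matching what local-zfs shows
```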

And in case you want to use snapshots, they also need space, and snapshots can consume a multiple of the actual data if you have a lot of changes and keep them for too long. So if you want to snapshot, keep a big part of your storage free for them.
 

This is the first time I've played with software raid, so this is all new to me. Thanks for the explanation. I'm going to have to wrap my mind around it.

That being said, at this moment, do I need to fix anything? Did I overprovision storage? And if I overfill it, will it stop me from causing a catastrophic failure, or will the whole setup blow up once storage fills up to a certain point?
 
You overprovisioned it. It should be fine for now, but nothing will prevent the pool from running full, at which point it will stop working and switch to read-only. So you should monitor your pool with zfs list rpool and make sure "USED" won't get above 4.8TB.
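
If you want to automate that check, here is a minimal sketch (assuming the -H and -p flags of zfs list for header-less, exact byte output; the thresholds follow the 80%/90% guidance below):

```python
import subprocess

# Read exact byte values for the pool's root dataset; used + avail is a rough
# estimate of the total usable space.
used, avail = (
    int(x)
    for x in subprocess.run(
        ["zfs", "list", "-Hp", "-o", "used,avail", "rpool"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
)

fill = used / (used + avail)
if fill >= 0.90:
    print(f"CRITICAL: rpool is {fill:.0%} full")
elif fill >= 0.80:
    print(f"WARNING: rpool is {fill:.0%} full")
else:
    print(f"OK: rpool is {fill:.0%} full")
```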

At around 80% your pool slowly starts to get slow and begins to fragment fast, which is bad, because there is no way to defrag it. The only way to remove fragmentation is to destroy the pool and start from scratch. At around 90% ZFS switches into panic mode and performance/fragmentation gets much worse. And if it reaches 100%, your pool will completely stop working.
 
Well, good to know. This all started with my hardware RAID card dying, and I rebuilt this server with whatever I had lying around. So I've already been thinking about expanding storage. I think I'll just take it easy with this file server, make another Proxmox node with more storage, and migrate it over.

Thanks for all your help. This has all been a good learning experience for me.
 
Best to ask here before creating that pool or buying drives. ZFS is a complex topic, there are a lot of factors, and it really depends on your hardware and your workload.

For example, you don't want to use SMR HDDs, and most of the time no consumer SSDs either.
 
