ZFS, file or block level storage

sukerman

Active Member
Aug 29, 2019
I'm using Proxmox 5.4.31 at OVH, installed using their Proxmox ZFS template. The entire disk is a ZFS filesystem, but it shows as a local directory in the storage panel. I guess this is more flexible, since otherwise you can't store ISO images etc., because block-level storage only allows Disk Image & Container content.

Sorry - I'm new so I have many questions.....

1) Should I partition so I have space to add a ZFS block device later and use .raw format files, or qcow2? I'm guessing in this case I should use ZFS thin provisioning, set compression=lz4, and use raw files to avoid double CoW.

2) If I add the ZFS pool as a directory, should I use the raw or qcow2 disk format?

3) Is performance much worse with thin provisioning, or does it settle down once the disk fills?

I have done a lot of Googling and reading, but I'm hoping for someone who's been there and done it to short cut me through the maze.
What's the best way to do this, ideally for performance? Thin provisioning is attractive because I can install machines with TB-sized disks when I'm not sure what the eventual capacity needs will be, giving them room to grow even though they probably won't need it. Advantages / disadvantages? Slow to back up? No snapshot support?

Thanks

Jack.
 
Should I partition so I have space to add a ZFS block device later and use .raw format files, or qcow2? I'm guessing in this case I should use ZFS thin provisioning, set compression=lz4, and use raw files to avoid double CoW.

Hi,

If you want to use ZFS, then you must use raw and not qcow2 (which is itself a CoW format). And lz4 is fine in most cases.
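If you want to check or change that on the pool, something along these lines should do it (just a sketch; rpool/data is the usual PVE dataset name, so adjust it to whatever your install uses):

# show the current compression setting and the achieved ratio
zfs get compression,compressratio rpool/data
# switch to lz4; child datasets and zvols inherit it, and only blocks
# written after the change are compressed with the new setting
zfs set compression=lz4 rpool/data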


Is performance much worse with thin provisioning, or does it settle down once the disk fills?


Yes, there could be some problems... I haven't used it myself. I allocate a fixed storage size, and when I need more I increase it. That only takes a few minutes.
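For what it's worth, growing a fixed-size disk later is a one-liner on the PVE host (a sketch with made-up IDs: 100 is the VMID, scsi0 the disk):

# grow the disk of VM 100 by 50 GB; the partition and filesystem
# inside the guest still have to be extended afterwards
qm resize 100 scsi0 +50G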

The only caveat I see is that you might thin-allocate, say, 1 TB. That's OK now, but at some point in the future your ZFS pool may be almost full. Your thin VM will then need more space, but that free space won't be available. So what happens to your VM...? There are also many tools that can warn you automatically that your free space is down to 10%.

So in my experience it is safer not to use thin allocation. Decide what will be best for you!

And qcow2/raw is supported only on ZFS datasets/folders. Which block device do you mean in your post subject?
 
1) File device - add the ZFS pool as a directory
2) Block device - add the ZFS pool as storage type ZFS

Option 2) only allows Disk Image + Container, but are there any performance differences?

Thanks!
 
1) Should I partition so I have space to add a ZFS block device later and use .raw format files, or qcow2? I'm guessing in this case I should use ZFS thin provisioning, set compression=lz4, and use raw files to avoid double CoW.

I would only use QCOW2 on "ZFS Directory" (as the PVE storage type) if you want to switch between snapshots. ZFS has a linear snapshot hierarchy in which you can only go back, whereas QCOW2 has a tree-like hierarchy in which you can jump around without interfering with other snapshots.
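To make that concrete, here is a rough sketch with a hypothetical zvol name: rolling back to anything but the latest ZFS snapshot needs -r, which destroys every snapshot taken after it.

zfs snapshot rpool/data/vm-100-disk-0@before-upgrade
zfs snapshot rpool/data/vm-100-disk-0@after-upgrade
# a plain rollback only works for the most recent snapshot; going
# further back needs -r and deletes @after-upgrade in the process
zfs rollback -r rpool/data/vm-100-disk-0@before-upgrade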
 
Thanks all, I've gone with a ZFS filesystem underneath with thin provisioning, added it as a directory, and created VMs with raw disks. This gives me everything I need: storing backups and ISO images, VMs, and snapshots; the raw files only take up the space that's actually used, and performance is good.
 
Why on earth would you want to store RAW on ZFS directory?

ZFS directory storage is for ISO images and backups
ZFS storage is for container and VM disks, with snapshots
 
It's 'raw', but it's not really raw. It shows as 1 TB in Proxmox, but because ZFS is set to thin provision, the server physically takes up only 18 GB on disk.
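If you want to see that for yourself, something like this shows the difference between nominal and allocated size (the path is only an example of a typical directory-storage layout, so adjust it to yours):

# apparent size = the 1 TB the VM sees; plain du = blocks actually allocated
du -h --apparent-size /rpool/images/100/vm-100-disk-0.raw
du -h /rpool/images/100/vm-100-disk-0.raw
# and what the pool itself has consumed so far
zfs list -o name,used,avail,refer rpool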
 
Yes, you could, but then you can't store ISO images etc. I don't know if there are other advantages / disadvantages performance-wise for either option? I'm not arguing, I'm new to all this, just trying to find out how it all works.
 
My 5 cents.
Create a new dataset and add it as a storage option (disk image, etc.) using the PM GUI.
PM will then create zvols for your VMs in that dataset. Usually one is already created (rpool/data, called local-zfs), but I don't know what your provider did in its PM install recipe.
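On the command line it would look roughly like this (a sketch assuming the dataset is rpool/data and you call the storage 'vms'; the same can be done in the GUI under Datacenter -> Storage -> Add -> ZFS):

# check the dataset exists
zfs list rpool/data
# add it as a zvol-backed ZFS storage with thin provisioning enabled
pvesm add zfspool vms --pool rpool/data --sparse 1 --content images,rootdir
# confirm the new storage shows up
pvesm status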
 
At OVH, by default if you choose to install the server with a ZFS root filesystem, it is mounted as a directory, I guess because otherwise you would not be able to store anything but VM images + backups. You can see the 5 VMs have 100 GB disks but they only take up 115 GB on the actual disk.
 

Attachments: two screenshots.
OK, I added rpool/data as a ZFS storage 'vms'. When I create a VM the file format is 'raw' and can't be changed (it's greyed out), but snapshots are now available. LnxBil, is there anything I'm missing here? Are there still missing features with this setup?

I'm using HDD-backed files; I'll run fio on the old and the new setups to see if there's any difference.
 

Take this with a bucket of salt: all sorts of caching / compression may be going on and you may get very different results. I also got very different results when the box was unloaded versus loaded with other VMs. These runs were under 'medium' load. Not very scientific, I know, but with no load I sometimes got impossibly fast results.

Both using raw disks and writeback cache.
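For reference, a command along these lines should reproduce the run whose header is shown below; the exact flags are my reconstruction from that header (4k blocks, 75/25 random read/write mix, libaio, iodepth 64, 4 GB file), so treat it as a sketch and point it at your own test file:

fio --name=test --filename=/root/fio-test.file --size=4G \
    --rw=randrw --rwmixread=75 --bs=4k \
    --ioengine=libaio --iodepth=64 --direct=1 --gtod_reduce=1

Note that --direct=1 only bypasses the page cache inside the guest; the host-side writeback cache mentioned above can still inflate the numbers.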

vm on ZFS mounted as directory :

4k blocks test 75% r/w mix: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64

IO delay ~12 - 25%.

test: (groupid=0, jobs=1): err= 0: pid=3730: Wed Sep 4 15:07:09 2019
read: IOPS=7125, BW=27.8MiB/s (29.2MB/s)(3070MiB/110297msec)
bw ( KiB/s): min=13000, max=237440, per=99.85%, avg=28457.40, stdev=24567.35, samples=220
iops : min= 3250, max=59360, avg=7114.34, stdev=6141.84, samples=220
write: IOPS=2381, BW=9525KiB/s (9754kB/s)(1026MiB/110297msec)
bw ( KiB/s): min= 4144, max=79208, per=99.83%, avg=9509.10, stdev=8162.83, samples=220
iops : min= 1036, max=19802, avg=2377.27, stdev=2040.71, samples=220
cpu : usr=4.71%, sys=11.86%, ctx=606473, majf=0, minf=8

vm on ZFS mounted as zvol :

4k blocks test 75% r/w mix: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64

IO delay ~25-75%.

test: (groupid=0, jobs=1): err= 0: pid=1068: Wed Sep 4 15:03:56 2019
read: IOPS=11.6k, BW=45.3MiB/s (47.5MB/s)(3070MiB/67753msec)
bw ( KiB/s): min= 552, max=264296, per=100.00%, avg=46915.60, stdev=35010.28, samples=134
iops : min= 138, max=66074, avg=11728.90, stdev=8752.57, samples=134
write: IOPS=3876, BW=15.1MiB/s (15.9MB/s)(1026MiB/67753msec)
bw ( KiB/s): min= 144, max=87704, per=100.00%, avg=15680.47, stdev=11679.87, samples=134
iops : min= 36, max=21926, avg=3920.10, stdev=2919.96, samples=134
cpu : usr=6.01%, sys=14.31%, ctx=259374, majf=0, minf=6

Ran it several times, this one was fast:

test: (groupid=0, jobs=1): err= 0: pid=1072: Wed Sep 4 15:08:25 2019
read: IOPS=72.1k, BW=282MiB/s (295MB/s)(3070MiB/10903msec)
bw ( KiB/s): min=116152, max=314328, per=100.00%, avg=291615.67, stdev=45944.17, samples=21
iops : min=29038, max=78582, avg=72903.90, stdev=11486.06, samples=21
write: IOPS=24.1k, BW=94.1MiB/s (98.7MB/s)(1026MiB/10903msec)
bw ( KiB/s): min=39472, max=106136, per=100.00%, avg=97485.57, stdev=15334.59, samples=21
iops : min= 9868, max=26534, avg=24371.38, stdev=3833.66, samples=21
cpu : usr=31.69%, sys=62.09%, ctx=16564, majf=0, minf=8

The results are so wildly different, varying from 20 MB/s to 3000 MB/s, that I don't know what to make of it, given that 3000 MB/s is clearly impossible on HDD. Oh well, thanks all, I'll go with zvols just so I can have snapshots in the GUI.
 
Hi,

@sukerman ,

Over many years I have seen a lot of tests like yours. I have also done them myself. But in the end, I found (my own opinion) that they are not that useful. So I decided to stop running tests and instead use my real load and try to optimize my setups for that.

Why? Because it is very hard, almost impossible, to design a test that emulates your own load. In many cases I optimized my setups using tests like yours, and with my real load it was a disaster.
Recently a guy asked me to optimize his setup for better performance. I did what I thought was right for his load. Then he wanted to run a test with fio (he was very confident that if the test results were bad, then production with his real load would be bad too).
After the test results came out worse (compared with his working setup) he concluded that my setup would be very, very bad.
But in the end the real results were very good. A backup finished in several hours instead of several days!

So be aware of what you test and how you spend your time!

Good luck!
 
I agree with you. I was aware that the results were too crazy to be trusted; also, when the system is overloaded and IO-bound, performance plummets. Maybe you get very different results with SSD or NVMe. I agree: test your own workload in practice and see what happens. Maybe someone else has done much more detailed, reliable testing on this.
 
I see you did what I suggested, and since rpool/data was already there, all you had to do was add it as a storage in the PM GUI.
Hopefully you also ticked the thin provision box. :)
Now you are all good; tuning is a separate issue.
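A quick way to double-check (assuming the storage ended up being called 'vms'):

# the zfspool entry in the storage config should list the sparse option
grep -A4 "zfspool: vms" /etc/pve/storage.cfg
# enable thin provisioning on it if it is missing
pvesm set vms --sparse 1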
 
