Reduce ram requirement/usage due to having ZFS

DurzoBlint123

New Member
Mar 16, 2022
hello all,

New to this forum and new to Proxmox. I also only dabble in Linux and am VERY new to ZFS. I'm used to enterprise-level servers with dedicated RAID controllers, so please forgive me if this has been asked many times; I've scoured many forums and seem to find differing answers.
I am replacing my old (free) Dell T610 server running VMware as my home-lab server because it's just too noisy and is a giant space heater. I wanted to get more for my money in a smaller, more efficient package. So I went out and bought an i7-12700 CPU, an ASUS motherboard, an M.2 drive for the OS, two (soon to be three, but they only had two at the time of purchase) 8 TB 7200 RPM NAS drives, and 32 GB of DDR5 RAM. In my old server I'm using about 41 GB of RAM for all my VMs. I only got 32 GB in this one because DDR5 is so expensive that I figured I'd scale back for now and add more later.
I want to run the 3 drives (maybe even a 4th) in a RAID 5 (or RAIDZ) for either 16 or 24 TB of usable space. Since I currently only have 2 of the 3 drives, I've been testing with a mirrored ZFS config.
When I first built the mirror, RAM usage sat at about 1.5 GB. As soon as I built a Windows 11 VM with 4 GB of RAM on the array, the server's RAM usage spiked to nearly 20 GB.
Ok, so now on to the help.
I've read that ZFS actually requires a lot of RAM; I didn't know that going into this. And that I'd need 4 GB as a base plus 1 GB for every TB of raw disk space. That's a ton of "wasted" RAM for 3 or 4 8 TB drives (IMO). In my rabbit hole of searching, I found a mention that I could turn off compression and that would reduce the RAM requirement. That would be fine for my usage as long as thin provisioning still works. Well, I turned off compression and it didn't seem to help.
I've also read that the current RAM in use is more "soft": ZFS will use what it needs, but not at the expense of the VMs' requirements, and if I build more VMs that need more RAM than what's currently available, ZFS will simply reduce what it's currently using. Is this the case?
I'm debating returning this motherboard and RAM and downgrading to a DDR4 setup so I can start at 64 GB with room to grow to 128 GB later. But is it worth the hassle of doing this? I hate to buy previous-gen tech when I could "future proof" now.
I thought about getting a dedicated RAID controller to use hardware RAID rather than relying on ZFS at all, and just setting up the resulting hardware volumes as LVM-Thin disks. But it looks like "consumer grade" cheap RAID controllers in the $50 range have REALLY mixed reviews. And I'm worried about going with a server pull like a PERC H710 from eBay and having compatibility issues with the rest of the hardware.
What is your advice here?
Keep the equipment I have now, load my VMs, and let the usage adjust on its own?
Return/exchange and downgrade to DDR4 setup? I have about 12 days left in my return window for the MB at Microcenter.
Get a dedicated hardware RAID controller? And if so, do you have any budget-friendly recommendations?
Or return it all and pick up a used PowerEdge 630 or something similar that's likely to be quieter and cooler than this dinosaur I'm running now?
Thanks all if you made it this far. I owe you a beer.
Brian
 
Hi,
I also only dabble in Linux and am VERY new to ZFS.
I'd recommend skimming over our reference docs regarding ZFS:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#chapter_zfs
I've read that ZFS actually requires a lot of RAM; I didn't know that going into this. And that I'd need 4 GB as a base plus 1 GB for every TB of raw disk space. That's a ton of "wasted" RAM for 3 or 4 8 TB drives (IMO).
Note that the rule of thumb is mostly for "raw used space" and for the ideal case; you can get away with less. Some say that since ZFS 0.7+, which got some memory optimizations, one can be totally fine with 256 - 512 MiB of memory per raw TiB. YMMV.
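If you want a hard upper bound on the ARC, it can be capped as described in that chapter; a minimal sketch (the 8 GiB value is purely an example, pick what fits your workload):

    # /etc/modprobe.d/zfs.conf - cap the ARC at 8 GiB (8 * 1024^3 bytes)
    options zfs zfs_arc_max=8589934592

Then run update-initramfs -u and reboot, or apply it immediately by writing the same value to /sys/module/zfs/parameters/zfs_arc_max.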

I've also read that the current RAM in use is more "soft": ZFS will use what it needs, but not at the expense of the VMs' requirements, and if I build more VMs that need more RAM than what's currently available, ZFS will simply reduce what it's currently using. Is this the case?
Yes, but it has its limits. Especially if there's a lot of IO, the IO pressure can impact your system quite a bit in a reduced-memory situation.

I'm debating returning this motherboard and RAM and downgrading to a DDR4 setup so I can start at 64 GB with room to grow to 128 GB later. But is it worth the hassle of doing this? I hate to buy previous-gen tech when I could "future proof" now.
If your planned setup is fine with writing off ~8-10 GiB to ZFS (i.e., not relying heavily on VMs being able to use that memory), I'd personally stay with the DDR5. With memory often being a bottleneck in lots of computations, it's just nice to have the doubled bandwidth and (normally) slightly lower latency that DDR5 brings over DDR4. But yeah, we recently got an Alder Lake system with 128 GiB of memory for a workstation, so I definitely feel your pain regarding current DDR5 prices.
That said, I don't know your setup's specific requirements nor your return policy, so take my advice with a grain of salt.

I thought about getting a dedicated RAID controller to use hardware RAID rather than relying on ZFS at all, and just setting up the resulting hardware volumes as LVM-Thin disks. But it looks like "consumer grade" cheap RAID controllers in the $50 range have REALLY mixed reviews. And I'm worried about going with a server pull like a PERC H710 from eBay and having compatibility issues with the rest of the hardware.
What is your advice here?
I personally dislike HW RAID controllers a lot; they're frequently a proprietary mess. That said, we have quite a few users who run (enterprise) HW RAID just fine, and while ZFS surely is great and flexible, it has its quirks too.
 
I thought about getting a dedicated RAID controller to use hardware RAID rather than relying on ZFS at all, and just setting up the resulting hardware volumes as LVM-Thin disks.
Real hardware RAID is something I abandoned a long time ago.
One reason is that those systems depend on the hardware: unless you are running RAID 1 with 2 disks, if your RAID controller dies you need the exact same model to have a chance of getting your data back. RAID 5/6 is a nightmare.
You also depend more on the implementation in the OS, potentially on RAID management tools, etc.
Because of the lack of drivers for the OS, I had to throw out my beloved Dell PERC 5/i and PERC 6/i six years ago. Judging from the performance and other specs, they would have suited me until today.
That was my reason to switch to mdadm and Linux software RAID. I chose LSI 9211i JBOD controllers for this.

But it looks like "consumer grade" cheap RAID controllers in the $50 range have REALLY mixed reviews.
Don't do it. That is not real HW RAID; it is a software implementation within a chip, with no cache but all the negative side effects. These controllers are usually also less well supported.
And I'm worried about going with a server pull like a PERC H710 from eBay and having compatibility issues with the rest of the hardware.
I have done this for years. PERCs are solid HW devices. There is a good community around them, and a lot of these cards are basically LSI or LSI-based cards. Good cards. If you want to go down that route, choose a PERC or an original LSI.
You may need a pin mod to get it working in your mainboard, but typically they run just fine.
They also offer to do scrubs, but technically they can't ensure the validity of your data since they don't implement checksumming.

And that point (checksumming), together with the availability of OpenZFS / ZFS on Linux, is where I abandoned mdadm as well.

All things have pros and cons. ZFS is heavy on memory usage compared to other options. It is not (yet) as flexible as HW or SW RAID in terms of expansion, but since it is software it will get there; it is in the works. It is also a bit more picky on the hardware end. I had a great experience with my mdadm hardware combo, but ZFS did not like it, so I needed to dump my SAS expander and get dedicated controllers. Anyway.

It is flexible in usage, keeps my data safe, and has overall turned out to be very reliable. All my data is now housed on ZFS. Regular scrubs ensure the data has not rotted, snapshots are a first line of protection, and dataset replication is used for my image-level backups.
Get yourself some decent HBA or just use the onboard controller. Get some additional memory and you are good to go.
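In case it helps, the day-to-day commands for that are simple (a sketch with placeholder pool, dataset and host names):

    zpool scrub tank                        # periodic integrity check of the whole pool
    zfs snapshot tank/data@2022-03-20       # point-in-time snapshot as a first line of defense
    zfs send tank/data@2022-03-20 | ssh backupbox zfs recv backup/data   # replicate the snapshot to another box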

I am not looking back ;)
 
Thank you so much for this very thorough reply. I really appreciate it. Sorry for the delayed response - I've been tinkering practically non-stop.

Well since I'm not ready to buy MORE DDR5 at these prices... and now that I HAVE DDR5, my OCD won't allow me to downgrade to DDR4... by my own stubbornness, I am stuck with what I have for a while.

So I tried my hand at LVM instead. It looked like Proxmox can't do LVM RAID 1 (and eventually RAID 5) natively in the web GUI, but I was able to set up a mirrored LVM volume via the command line, and at least the result showed up in the GUI. OK, good.
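Something along these lines (a sketch with placeholder device and volume group names, not necessarily the exact commands I ran):

    pvcreate /dev/sdb /dev/sdc
    vgcreate bigdata /dev/sdb /dev/sdc
    lvcreate --type raid1 -m 1 -L 7T -n mirror bigdata   # mirrored logical volume across both drives
    lvconvert --type thin-pool bigdata/mirror            # turn it into a thin pool for LVM-Thin

After that, the thin pool can be added as LVM-Thin storage under Datacenter -> Storage.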

My main goal for the largest mass of storage is a giant CIFS/Samba share for my entire home. Most of it will be used by my Plex server to access its content, and the rest for general storage.
What I have on my old ESXi server is an OpenMediaVault VM serving up the Samba share, so I was trying the same here.
I stood up OMV, gave it a virtual disk on the large LVM array, and then set up a share. OK, maybe this will work...
Then I did some network-based read/write tests from one of my hardwired desktops...
The write speeds were TERRIBLE compared to my existing ESXi host/VM combo.
I was getting write speeds in the 40 Mbps range with read speeds in the 800s, while my ESXi/VM combo was getting around 600 Mbps write.
Also, I built a couple of Windows VMs and a Linux VM and put them on that LVM storage as well, and they were also TERRIBLY slow.
I searched around multiple forums and found mentions of turning off caching on the virtual disk. Did that. I also changed all manner of other options, including the virtual controller (VirtIO, SCSI, SATA)... nothing seemed to make it much better.
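For reference, the per-disk cache mode can also be set from the host shell, e.g. (made-up VMID and volume name; valid modes include none, writethrough and writeback):

    qm set 101 --scsi0 local-lvm:vm-101-disk-0,cache=none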
At one point, I went into the console of the Proxmox host and installed Samba, then manually created a share on what I THOUGHT was the LVM array (I also have a single SSD in this box as well).
I ran the same network speed test from one of my hardwired workstations against that Proxmox SMB share, and the write speed was great at around 400 to 500 Mbps, I think.
So I figured my problem HAD to be at the VM level.
After a LOT of troubleshooting, I just got fed up and deleted my LVM because I wanted to test ZFS performance, at least to see if it was worth downgrading to DDR4 so I could afford more RAM... wait a minute, the share is still there?!?
I had destroyed the LVM-Thin, destroyed the physical volumes, and even performed a wipe on the physical disks... but I was still able to access and write to the share that I had supposedly put on the LVM. What the hell!! Sometimes I hate Linux!
So after all that mess, I gave up.
I disabled the efficiency cores of my processor in the BIOS and installed an Intel NIC (the onboard one was Realtek) so that the hardware would be compatible with ESXi. I installed ESXi, and so far I'm much happier.
Unfortunately, ESXi has no RAID capabilities at all. So I installed the base OS of OMV on my SSD datastore (similar to what I did in Proxmox), then created each physical 8 TB drive as an individual datastore.
Then I created a single 7.2 TB VMDK on each datastore and attached both to the OMV VM.
Then in OMV I created a RAID 1 out of the two VMDKs and went through the rest of the process to create the test share.
The network write performance from my hardwired desktop was now 140 Mbps. Still not the 400 to 600 I'm getting on my old ESXi box, but it's better than 40! And there are a couple of factors here. Since the RAID is happening within OMV this time, it is doing a sync/initialization that has been running for the last several hours and is only at 30%, so there are a lot of writes happening to it. I'm betting that once the initialization is done, the performance will go up. I don't remember seeing any kind of initialization phase in Proxmox, so I don't know if it was doing that or not, and maybe that's why it was so slow there?
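(If OMV's software RAID is mdadm under the hood, which I believe it is, the resync progress can be checked from its shell with:

    cat /proc/mdstat

which shows the rebuild percentage and an estimated time to finish.)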
The other factor is that I had already expected my writes to be slower than on my original ESXi host, because that one is running seven 3 TB drives in a RAID 5 while this is currently only two 8 TB drives in a RAID 1. But I didn't expect it to be as bad as 40.
Plus, I also have a Synology NAS with only two mechanical drives in a RAID 1, and I was using that as a control. I was getting well over 100 in write speed there as well; I can't remember exactly because I deleted my screenshots, but I think it was in the 300s.
So again, I'm sure that once the initialization cools off, I'll be happy as a clam.
My problem is that none of these solutions are perfect for what I want.
I really liked SOME of what Proxmox had to offer. I REALLY liked that Proxmox can create a virtual TPM module, so you can install a Win 11 VM without having to hack the installer. That was a nice surprise.
And while very limited, the mobile interface you get when accessing Proxmox from your phone was pretty cool too.
But I really wish that Proxmox had built-in Samba sharing capabilities in the GUI. It would also be nice if it offered the various LVM RAID levels in the GUI as an alternative to ZFS if you don't have the RAM.
I also had trouble with guest VM video resolutions. It took a lot of tinkering, various driver installs, setting the default resolution in the virtual BIOS, and so on, but I was finally able to set a better resolution in Windows... though it wasn't perfect, and it wasn't dynamic. On ESXi, I can make the guest resolution match whatever the viewing client window size is. It could be 1111x765 just by dragging the border, or if I open it in a new Chrome tab, it automatically fills the available space. Much more pleasant to work with.
But Proxmox has a much better dashboard to look at in regards to its resource usage. ESXi is pretty dated there.
And I thought about just running OMV as the host OS, since what I wanted was software RAID and Samba serving... but the only option for running guests on top there is Docker. Supposedly it used to support VirtualBox in version 5, but they dropped that support in version 6.
Lastly, I tried a demo of Unraid, but I still didn't like the VM capabilities. And I had problems getting the drives set up the way I wanted them... like I couldn't seem to get my SSD to be used as a single storage location for VMs to be built on. I'm sure with more tinkering I could have gotten it, but I just didn't have the patience anymore.
Oh, and I even thought of just going with Windows 11 Pro on the host to serve up the SMB share, with Hyper-V or VirtualBox for the VMs. But I was really looking for more of a true hypervisor.
Oh well. Thanks again for your assistance. I'm sure I'll revisit Proxmox in the future.
 
Did you read my reply? Why not just stay with ZFS and reduce its memory impact? It allows you to expose any ZFS dataset as an NFS and/or CIFS share with a simple one-line command...
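For example (placeholder pool/dataset name; sharesmb needs Samba installed on the host and sharenfs needs an NFS server):

    zfs create tank/media
    zfs set sharesmb=on tank/media
    zfs set sharenfs=on tank/media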
 
So I guess I misunderstood, since you also said there could be performance issues with setting a RAM limit. Since you wrote this, I decided to go back and try RAIDZ again... plus I also got my extra drives. I originally was going to go with 3 drives, but I decided to go with four 8 TB drives, giving a total of 32 TB raw. Of course this is a problem for my current RAM setup, as I only have 32 GB of RAM... but I wanted to try out ZFS anyway.
New problem though... I created these 4 drives as a RAIDZ1 (RAID 5 equivalent). It's odd, because on one screen it says the ZFS capacity is 32 TB. Shouldn't it say 24 TB if I'm losing one drive to parity? Soldier on...
I add a virtual disk to my OMV VM, and when I select the storage to put it on, my "BigDataZ" storage says 22 TB available... OK, that seems more like what I'm expecting.
I try giving it a 20 TB virtual disk and I get an error saying there's not enough space. What?!? This is a fresh array with nothing on it. I go down in 1 TB increments until it lets me create it... 14 TB. So how does a RAIDZ (supposedly a RAID 5 equivalent) with 32 TB raw and 24 TB usable (22, apparently) only let me actually use 14 TB? Yet again, ZFS is pissing me off.
 
No idea what you actually did, so please post the zpool status and zfs list outputs to get a better understanding of that.
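For example (the extra columns are optional, but helpful here):

    zpool status
    zfs list -o name,used,avail,refer,volsize,volblocksize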

Also note the classic TB (base 10) vs. TiB (base 2) mismatch; the former is ~9% smaller.
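Rough arithmetic for your four 8 TB drives: 8 TB = 8 * 10^12 bytes is ~7.28 TiB, so the pool has about 29.1 TiB raw; subtract roughly one drive's worth for parity and you are left with about 21.8 TiB, which is presumably the ~22T your "BigDataZ" storage shows as available.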
 
Also read this: https://www.delphix.com/blog/delphi...or-how-i-learned-stop-worrying-and-love-raidz

I guess you didn't increase your volblocksize, so you are losing 50% of your raw capacity: 25% of the raw capacity is lost to parity and 25% to padding overhead. Also keep in mind that a ZFS pool should not be filled more than 80%, because otherwise it will get slow and fragment faster. So right now only about 40% of your raw storage (12.8 TB, which is 11.64 TiB) is usable for virtual disks.

When increasing the volblocksize from the default 8K to 16K, you could use 15.37 TiB for virtual disks, and if you increase the volblocksize to 64K, your virtual disks could use 17 TiB.
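On the Proxmox side, the block size is a property of the ZFS storage entry (a sketch assuming your storage is named BigDataZ; it only applies to newly created virtual disks, existing zvols keep their volblocksize and would need to be recreated or moved):

    pvesm set BigDataZ --blocksize 16k

The same setting is also available as "Block Size" when editing the storage under Datacenter -> Storage in the GUI.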
 