Recommended RAM for ZFS

carles89

Hello,

We have a 5-node cluster with local storage. All servers are the same:
  • Intel Xeon E-2288G
  • 128GB RAM
  • 4 x 960GB SSD in RAID10 with a hardware controller.
We try not to provision more than 80% of the RAM (~100GB) so we have room for balancing VMs if necessary.

The point is that we miss ZFS replication when using LVM-Thin, so we're thinking about replacing the SSDs and the hardware controller with 2 x 1.92TB NVMe SSDs and switching to ZFS, so the new configuration would be:

5 node cluster, each node with:
  • Intel Xeon E-2288G
  • 128GB RAM
  • 2 x 1.92 TB NVMe SSD in ZFS Mirror
  • Replication of VMs between nodes via ZFS-Replication
And the question is: considering the speed of NVMe, how much RAM do we need to reserve for ZFS to make sure it works properly and does not cause unexpected reboots? We would like to limit it in some way, just to be sure how much RAM ZFS will use and how much RAM we have left for VMs.

Thank you
 
Proxmox uses a default of 50% of the RAM for the ARC, which in your case is 64GB.
Somehow the ARC sometimes grows even above that; it happened to me on an earlier version, I don't know why.

However, you can limit the ARC size with:
/etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=
options zfs zfs_arc_max=

I would set it to max 64GB and min 48GB, or max 80GB and min 64GB... Whatever you need for the rest of the VMs, you can decide that yourself. The value is always in bytes...
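
As a concrete sketch, the "max 64GB / min 48GB" suggestion would look like this (values in bytes; on Proxmox, if the root filesystem is on ZFS, the change typically also needs an update-initramfs -u and a reboot to be picked up):

Code:
# /etc/modprobe.d/zfs.conf
# 48GB minimum, 64GB maximum (values in bytes)
options zfs zfs_arc_min=51539607552
options zfs zfs_arc_max=68719476736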

However, while your NVMe storage is pretty fast, RAM is still 20-80x faster (roughly 4GB/s vs 220-800GB/s, depending on dual/quad/octa channel).
So the ARC still speeds things up, and in some cases, while your VM reads from the ARC, it can write to the SSDs at the same time (in theory).

However, about stability... The ARC has no impact on stability; you can go with a 4GB ARC cache or a 500GB ARC cache... It doesn't matter stability-wise.

Cheers
 
Hi Ramalama,

I know that ZFS uses 50% by default; that's why I want to limit it. I don't want to lose 64GB of RAM when I'm currently using 100GB of 128GB.

So, in theory, no matter how low I set the ARC limit (e.g. max 15GB), since I'm using NVMe the performance would be at least the same as now, right?

Because right now I'm using SATA SSDs with a HW RAID controller, so the only cache here would be the one on the controller, which isn't even being used because the RAID volume is configured with a WriteThrough policy.

Thank you
 
Yeah, you can limit it to whatever you want.
I see no downsides; I personally wouldn't go below 16GB, since you have so much.
Use 2GB as min and 16GB as max...
ZFS should then calculate a target ARC size of probably around 8-12GB; I don't know whether it will use more if it needs to and then raise its target to the 16GB max. I don't know how intelligent ZFS is at calculating the target ARC size.

However, there is probably a small performance impact because of the smaller ZFS RAM, but no stability impact.

I mean, you can even control the ARC size on the fly, live... even with a script, e.g. when VM 1 gets turned on: echo "whatever" > /sys/module/zfs/parameters/zfs_arc_max
If you set it lower than the current size, it frees up the space in RAM instantly. (But I don't know how intelligent ZFS is about which data it drops first.)
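
A minimal sketch of that on-the-fly adjustment (the 8GB cap is just an example value; everything is in bytes):

Code:
# cap the ARC at 8GB on the fly (8 * 1024^3 bytes)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# check the current ARC size afterwards ("size" field of arcstats)
awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats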

However, setting it on the fly too often isn't recommended anyway, but you can use it to find your best size and experiment.

Stability-wise, none of it should have any impact.

Cheers
 
Forgot to mention, but you probably know it yourself: HW RAID and ZFS aren't best friends. xD
So if you exchange it for native ZFS, that's a good move.
I once had a rough time myself with HW RAID in FreeNAS... The GUI froze after some days, the system was unstable, and in the end the data got corrupted too. Even in JBOD mode...

Cheers
 
As far as I know you can't make the ARC as small as you want. The ARC isn't only a read cache, it also caches ZFS internals (dnodes and so on), and if you run out of space ZFS won't be able to operate... or at least only in slow motion, because ZFS can't cache the results of its own algorithms.
It should run with 4 to 8GB of RAM, but I also wouldn't go below 16GB.
You can run arc_summary to see if your ARC is big enough. I, for example, ran out of dnode space and needed to increase it.
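
For example (a sketch; the exact labels in the output differ between OpenZFS versions):

Code:
# print the ARC report and pick out hit ratios and dnode cache usage
arc_summary | grep -iE 'hit ratio|dnode'
# if the dnode cache keeps hitting its target, it can be raised via the
# zfs_arc_dnode_limit_percent module parameter (a percentage of the ARC metadata limit)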
 
Thanks for the explanation, I didn't know that.
 
The OpenZFS documentation doesn't say how much RAM you need. It only mentions that ZFS will run with 2GB of RAM, but that you might want to use at least 8GB or it will be slow.

Proxmox documentation is mentioning this:
ZFS works best with a lot of memory. If you intend to use ZFS make sure to have enough RAM available for it. A good calculation is 4GB plus 1GB RAM for each TB RAW disk space.

TrueNAS documentions mentions this:
A minimum of 8 GB of RAM is required for basic TrueNAS operations with up to eight drives.
...
An additional 1GB per additional drive after eight will benefit most use cases.
...
Deduplication depends on an in-RAM deduplication table with a suggestion of 5 GB per TB of storage.
Attaching an L2ARC drive to a pool will actually use some RAM, too. ZFS needs metadata in ARC to know what data is in L2ARC. As a conservative estimate, plan to add about 1 GB of RAM for every 50 GB of L2ARC in your pool.

So how much RAM your ARC really needs depends on many factors. The more RAM you allow the ARC to use, the faster your ZFS storage will be.
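
As an illustration, applying the Proxmox rule of thumb above to the pool proposed in this thread (2 x 1.92TB NVMe in a mirror, i.e. about 3.84TB raw; the numbers are only a rough sketch):

Code:
# 4GB base + 1GB per TB of raw disk space
echo "4 + 3.84" | bc      # = 7.84, so roughly 8GB ARC as a lower bound

That would put the lower bound comfortably below the 16GB maximum suggested earlier in the thread.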
 
Hi,

Can we expect at least the same performance in those two scenarios?

HWRaid with WriteThrough + SSD:
  • Intel Xeon E-2288G
  • 128GB RAM
  • 4 x 960GB SSD in RAID10 with a hardware controller.
NO HWRaid + NVMe + ARC limited to 16GB:
  • Intel Xeon E-2288G
  • 128GB RAM
  • 2 x 1.92 TB NVMe SSD in ZFS Mirror
  • (Replication of VMs between nodes via ZFS-Replication)
From your explanations, I assume that in theory the performance of ZFS should be at least the same as with HW RAID, since with HW RAID we don't have any cache enabled and we're using SATA SSDs, while with ZFS we'll switch to NVMe (faster than SATA SSDs).

The only reason for making this change is to gain replication capability, but at the same time we don't want to lose a ton of RAM.

Thank you all
 
Thank you for the information, I saw a paper about CEPH but not this one about ZFS.

As far as I can see, the performance is reasonably good, taking into account that in the paper the ARC is limited to 4GB:

Code:
# limit maximum ARC size
options zfs zfs_arc_max=4294967296

I'll try to put the NVMe disks in at least one server and do some testing.
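
If it helps, a starting point for such a test could look something like this (only a sketch: /tank/vm-test is a hypothetical dataset path, and the job parameters are arbitrary):

Code:
# random 4k write test against a file on the ZFS pool
fio --name=arc-test --filename=/tank/vm-test/fio.dat \
    --rw=randwrite --bs=4k --size=4G \
    --numjobs=4 --iodepth=32 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting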
 
I have a similar question: how much RAM do I have to dedicate to ZFS for these pools storing SQL Server databases?
- 10x 900GB HDD, striped mirror (RAID10) + 2 spares. 3.9TB usable, 10.8TB raw.
- 10x 900GB HDD, striped mirror (RAID10) + 2 spares. 3.9TB usable, 10.8TB raw.

- Compression = none;
- Secondarycache = none;
- PrimaryCache = metadata;
- RecordSize=64k;
- Logbias = throughput;
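For reference, those dataset settings would correspond roughly to the following commands (tank/sqldata is a hypothetical dataset name; note that compression is turned off with compression=off):

Code:
zfs set compression=off tank/sqldata
zfs set secondarycache=none tank/sqldata
zfs set primarycache=metadata tank/sqldata
zfs set recordsize=64K tank/sqldata
zfs set logbias=throughput tank/sqldata
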
SQL Server handles its own cache in RAM, so no primary cache is needed. Does this case still fit the "1GB for every TB" rule? Would it be 8GB of RAM (usable storage) or 22GB (raw storage)?

Thanks
 
