Suggestions for optimizing total DRAM cost?

jb_wisemo
Oct 30, 2025
Hello forum,

I am considering building a new production (not lab) Proxmox cluster with HA for both running VMs and the underlying storage. Given the current DRAM shortage, I have to consider how to optimize the design in terms of total DRAM purchase for the entire cluster.

I am currently considering two options:

A. Two Proxmox nodes plus a third, non-Proxmox machine acting as a tie-breaker to help the cluster choose which node will be responsible for production execution if the network links between the two nodes fail. In terms of DRAM, this means that each Proxmox node needs enough physical memory to run all the VMs while the other node is down or isolated, so the DRAM per node is (sum of all VMs + clustering overheads such as ZFS overhead). Total cost is thus 2 x (sum of all VMs) + 2 x (cluster overheads).

B. Three Proxmox nodes with no tie-breaker outside the cluster. In terms of DRAM, this means that if one node fails, the two remaining machines can split its VMs between them, so the DRAM per node is (sum of half the VMs + clustering overheads such as ZFS overhead). Total cost is thus 1.5 x (sum of all VMs) + 3 x (cluster overheads).

Option B thus theoretically saves 25% of the VM HA memory cost, but adds 50% to the cluster overhead memory cost, and also adds the cost of a third physical machine.
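To make the comparison concrete, here is a back-of-envelope sketch in Python (the VM total and per-node overhead figures are made-up placeholders, not measurements):

```python
# Back-of-envelope DRAM totals for the two layouts (all figures in GB).

def option_a(vm_total, overhead):
    # 2 nodes, each sized to run ALL VMs alone, plus per-node overhead
    return 2 * vm_total + 2 * overhead

def option_b(vm_total, overhead):
    # 3 nodes; after one failure the two survivors split the failed
    # node's third, so only 1.5 x the VM memory sum is bought in total
    return 1.5 * vm_total + 3 * overhead

vm_total = 256  # hypothetical: sum of all VM memory
overhead = 16   # hypothetical: per-node OS + ZFS/cluster overhead

print(option_a(vm_total, overhead))  # 544
print(option_b(vm_total, overhead))  # 432.0
```

With these placeholder numbers B is cheaper, and since A - B = 0.5 x (sum of all VMs) - (cluster overheads), B wins on DRAM alone whenever the per-node overhead stays below half the VM memory sum.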

The official Proxmox "system requirements" were clearly written when DRAM was cheap, and suggest, without stating reasons, adding 1 GB RAM per TB of disk to the clustering overheads. The question is how far this can safely be squeezed for cost, perhaps to 0.5 GB/TB or 0.25 GB/TB, corresponding to 2 bytes or 1 byte per 4 KiB disk block. The fundamental issue is how much of the stated overhead must be in memory, versus how much is just cached data that can be reloaded from disk or regenerated on the fly, versus how much is somehow forced to stay in physical node RAM at all times.
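For reference, here is the arithmetic behind those bytes-per-block figures, as a quick sanity check assuming 4 KiB blocks and binary units (not a sizing recommendation):

```python
# RAM-per-disk ratios expressed as bytes of RAM per 4 KiB disk block.
TIB = 2**40
GIB = 2**30
BLOCK = 4 * 1024

blocks_per_tib = TIB // BLOCK  # 2**28 = 268435456 blocks per TiB
for gib_per_tib in (1.0, 0.5, 0.25):
    bytes_per_block = gib_per_tib * GIB / blocks_per_tib
    print(f"{gib_per_tib} GiB/TiB -> {bytes_per_block:g} bytes per block")
# 1.0 GiB/TiB -> 4 bytes, 0.5 -> 2 bytes, 0.25 -> 1 byte
```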

Another question affecting the purchase calculation is whether Proxmox HA requires complete copies of all running VM memory on the node that would take over if the running node crashes, or whether Proxmox uses a mechanism that just keeps the memory snapshots on redundant disks until the moment of failover. Obviously, if Proxmox reboots VMs after their active physical node fails, then no DRAM is needed on the node that will potentially run a VM after failover. My calculations for scenario B above assume near-zero physical DRAM reservation for potential failover of VMs running on other nodes. Thus, if the cluster's VMs use y GB in total and each of the 3 nodes runs y/3 GB of them, each node needs y/2 GB of memory for VMs, of which y/6 GB just idles waiting for the arrival of HA-restarted VMs from other nodes. Keeping live VM memory clones instead would need 2/3 x y GB per node, of which y/3 GB is idle VM memory clones (y/6 GB from each of the other two nodes).
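To double-check my own arithmetic, a small sketch (y is a hypothetical total of VM memory across the cluster, in GB):

```python
# Per-node VM memory in a 3-node cluster with total VM memory y GB.
y = 96.0  # hypothetical cluster-wide VM memory total

running = y / 3  # each node's share in normal operation

# Restart-based failover: the 2 survivors split the failed node's share.
restart_need = running + (y / 3) / 2   # y/2 per node
restart_idle = restart_need - running  # y/6 idle headroom per node

# Live standby clones: each node also holds y/6 of clone memory from
# each of the other two nodes.
clone_need = running + 2 * (y / 6)     # 2/3 x y per node
clone_idle = clone_need - running      # y/3 idle clone memory per node

print(restart_need, restart_idle)  # 48.0 16.0  (= y/2 and y/6)
print(clone_need, clone_idle)      # 64.0 32.0  (= 2y/3 and y/3)
```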
 
PVE needs about 1.5 GB for the OS and its services. Some of that memory can likely be swapped out without causing issues if needed; KSM and ZRAM can help too. I'm not sure I fully understand your question, but PVE does not reserve memory for a VM that doesn't exist. The requirements are not a necessity, just recommendations to achieve the "best" results. For example, ZFS' ARC is limited to 10% of RAM by default nowadays.
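If you want an even lower cap, the knob is the zfs_arc_max module parameter, set in bytes. A tiny helper to compute the value (the 128 GiB node size and 5% fraction are just placeholders):

```python
# Compute a zfs_arc_max value (in bytes) for a fraction of physical RAM.
# On Proxmox the value goes into /etc/modprobe.d/zfs.conf as:
#   options zfs zfs_arc_max=<bytes>
# followed by "update-initramfs -u" and a reboot.

def arc_max_bytes(ram_gib: float, fraction: float) -> int:
    return int(ram_gib * 2**30 * fraction)

print(arc_max_bytes(128, 0.05))  # 6871947673 for a hypothetical 128 GiB node
```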
 
I think the answer is less in the hardware and more in how you run your apps. Run as much stuff in Docker containers as possible to reduce your memory needs.
 
PVE needs about 1.5 GB for the OS and its services. Some of that memory can likely be swapped out without causing issues if needed; KSM and ZRAM can help too. I'm not sure I fully understand your question, but PVE does not reserve memory for a VM that doesn't exist. The requirements are not a necessity, just recommendations to achieve the "best" results. For example, ZFS' ARC is limited to 10% of RAM by default nowadays.
The point is the total amount of RAM needed for a new cluster, and whether that total will be smaller with 2 nodes or with 3 nodes.

I was not talking about reserving memory for VMs that don't exist, but about using or reserving memory for HA failover of VMs that do exist on other nodes and may suddenly need to fail over to a different node in the cluster than the one they were running on before the failure. The top-level presentation of Proxmox to potential users isn't clear about how HA works for VMs, and thus about how many resources are needed to make them actually available across single-node failures.

For the "overhead" discussion, it was not about the relatively small amount of RAM used to load the OS, but the RAM used to operate the HA mechanisms for storage etc. in a fully operational HA cluster where many of the virtual machines are constantly making small changes to their state, such as processing new requests according to their purpose (for example, a forum server will process logins and posts) and/or updating log files with such trivialities as remembering when the VM was last in a fully running state (classic example: The MARK lines in syslog every 20 minutes).
 
I think the answer is less in the hardware and more in how you run your apps. Run as much stuff in Docker containers as possible to reduce your memory needs.
How does running an app in Docker inside a VM need less RAM than running the same app inside a VM without Docker?

We are not talking about LXCs here, since they are not suitable for running Docker, are less isolated than VMs (which might make them a hard no due to compliance or security requirements), and are not an option for non-Linux workloads.