Hardware requirements advice for more than 100 VMs

thenemesis584

New Member
Jan 28, 2019
Hello guys,

I need some advice.

Currently I have 8 computers, just regular PCs with similar hardware specs:

i5 or i7 CPU (2GHz or 3GHz)
32GB DDR4
120GB SSD

Each of these computers is running 16 VMs with this configuration:

1.5GB RAM
CPU: 2 sockets, 2 cores
20GB drive
CentOS 6

And every CentOS VM is running only one Java application.

So, I would like to buy a real server that can run about 125 VMs or containers. I prefer HP ProLiant, but I don't know what CPU and RAM to use in this case.
I'm still new to Linux, networking and virtualization, and I need one machine to replace these 8 machines.

My opinion is that I need a CPU with 32 or 64 cores, or maybe better two CPUs with 32 cores each, but I'm not sure about the RAM.
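For reference, here is a quick back-of-the-envelope total for 125 VMs with the per-VM configuration above (just arithmetic in a throwaway class; the class name is made up):

```java
// Rough totals for 125 VMs at 1.5GB RAM, 2 sockets x 2 cores (4 vCPUs)
// and 20GB of disk each. Illustration only, not a recommendation.
public class SizingSketch {
    public static void main(String[] args) {
        int vms = 125;
        double ramPerVmGb = 1.5;
        int vcpusPerVm = 2 * 2;   // 2 sockets x 2 cores per VM
        int diskPerVmGb = 20;

        System.out.printf("RAM   : %.1f GB%n", vms * ramPerVmGb);  // 187.5 GB
        System.out.printf("vCPUs : %d%n", vms * vcpusPerVm);       // 500 vCPUs
        System.out.printf("Disk  : %d GB%n", vms * diskPerVmGb);   // 2500 GB
    }
}
```

So roughly 190GB of RAM and 500 virtual cores allocated in total; the CPU side can only work if the load is bursty enough to over-commit.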
 
Sounds like you have overprovisioned your storage considerably, as 20G x 16 is 320G. That could get you into a bad situation pretty quickly.

You are definitely going to need something with dual sockets and lots of cores. Maybe start with 12x 16G DIMMs for 192G of RAM. Then you can expand if it's a standard two-socket system (24 DIMM slots).

You're also going to need an SSD array of some sort if you're used to the VMs running on SSDs already.
 
Sorry, every machine has 500GB SSDs, not 120GB.

Storage is not the problem; CPUs and RAM are a bit difficult for me.

I don't know how a CPU will handle 100 VMs or containers, and I don't even know what to use in this case. What is better, a VM or a container?
 

Sounds like you're OK on the disks then.

It really depends on your workload and what you're trying to do. I prefer the isolation of KVM and running my own OS; containers use the kernel of the host, so you're a bit limited there.

We currently run HP DL380 Gen9/Gen10s with 384G of RAM and have roughly 60 VMs on each front end. Our workload and requirements are different, but I do think you could push a host like this to 100 VMs pretty easily, depending on the workload. Don't forget to put some thought into the storage.
 
Ok, so here is more info.

I have created one VM and installed CentOS 6. The network is private, so the IP is something like 10.xxx.xxx.128.
I have installed Java JRE 8 and I'm running it.
When I finish configuring the VM, I clone it, so every VM is the same; the only differences are the IP, the MAC and the Java app running in it.

Now, the Java application is a client. It parses data from the devices that connect to it, and every connection is a thread.
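Roughly, that pattern is one thread per accepted connection, something like this (a simplified sketch; the class name, port and handler are made up for illustration, not taken from the actual app):

```java
// Minimal thread-per-connection TCP listener, as described above.
// Parsing and database code are omitted; names are hypothetical.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

public class GpsListenerSketch {
    public static void main(String[] args) throws Exception {
        int port = 5023; // hypothetical listening port
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket device = server.accept();          // one GPS device connects
                new Thread(() -> handle(device)).start(); // one thread per connection
            }
        }
    }

    private static void handle(Socket device) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(device.getInputStream()))) {
            String record;
            while ((record = in.readLine()) != null) {
                // parse the record and send the result to the remote database
            }
        } catch (Exception e) {
            // connection dropped; the thread simply ends
        } finally {
            try { device.close(); } catch (Exception ignored) {}
        }
    }
}
```

So the thread count you see in htop tracks the number of devices connected at that moment.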

This is htop from one VM

(screenshot attached)

and this is htop from its host machine:

(screenshot attached)
 
As adamb said, it really depends on the load, which depends on the application and its load pattern.

You should first see what types of load are produced; CPU, network and (storage) IO need to be considered.
Memory is fixed and should not be over-committed.
Regarding CPU, what do the Java apps do? Constant computation (the htop screenshot suggests this is not the case) or more of an occasional request/computation? If it's occasional you can over-commit CPU cores (up to a certain limit); otherwise I'd make sure there are as many physical CPU cores as the sum of the virtual machines' vCPU cores.

Network/IO again depends on what the app does. Does it produce network traffic? Does it need to read and/or write from/to storage? And if so, how much average bandwidth does it require?
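To put that CPU rule into numbers (a sketch with example figures; the dual 32-core box is only an example):

```java
// Required physical cores depend on whether the guests run at constant load
// or only occasionally. VM count and vCPU count are taken from the posts
// above; the physical core count is an invented example.
public class CpuCommitSketch {
    public static void main(String[] args) {
        int vms = 125;
        int vcpusPerVm = 4;                 // 2 sockets x 2 cores per VM
        int totalVcpus = vms * vcpusPerVm;  // 500

        // Constant computation: physical cores should cover the full vCPU sum.
        System.out.println("Constant load   -> need ~" + totalVcpus + " physical cores");

        // Occasional load: over-committing is acceptable up to some limit.
        int physicalCores = 2 * 32;         // e.g. a dual 32-core system
        double ratio = (double) totalVcpus / physicalCores;
        System.out.printf("Occasional load -> over-commit ratio %.1f:1 on %d cores%n",
                ratio, physicalCores);      // ~7.8:1
    }
}
```

Whether something like 8:1 is acceptable depends entirely on how idle the VMs really are, which is exactly the constant-vs-occasional question.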
 

Ok, I understand that, but as I said I'm new to all of this and it is very hard for me to determine what hardware I should use.

As for Java, its purpose is to parse the data received from GPS devices and send the results to a remote database. So it is occasional request/computation, because I don't know which device will connect or how long that connection will last.

Here are iotop and iftop screenshots:

(screenshot attached)

(screenshot attached)

Of course, these values keep changing, but I hope you get the idea.
 

Are you OK with having all your eggs in one basket? With your current setup, if you lose 1 host, you're still operational. If you lose this new host, you're 100% down.

I definitely think a standard 2-socket system with 192G of RAM would be a good start. Use RAM DIMMs big enough to hit 192G without populating all the slots, so you have more room for growth.

Or you could come up with some type of shared storage and run multiple front ends for redundancy and scaling.
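For example, two ways of reaching the suggested 192G on a typical 24-slot dual-socket board (example layouts only; the real server's per-channel memory population rules still apply):

```java
// Two example DIMM layouts that reach 192GB while leaving slots free.
// Slot count and DIMM sizes are examples, not tied to a specific model.
public class DimmSketch {
    public static void main(String[] args) {
        int slots = 24;                           // typical 2-socket board
        int[][] options = { {12, 16}, {6, 32} };  // {number of DIMMs, size in GB}
        for (int[] o : options) {
            System.out.printf("%d x %dGB = %dGB, %d slots free%n",
                    o[0], o[1], o[0] * o[1], slots - o[0]);
        }
        // 12 x 16GB = 192GB, 12 slots free
        //  6 x 32GB = 192GB, 18 slots free
    }
}
```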
 
No, of course it will not be one machine. I will buy two machines; the other will be a backup machine.

The size of the storage is not that important, it just needs to be SSDs for speed. The size is minimal, just enough to keep the logs that are not older than 6 months.
 

If you're buying another server, then in my honest opinion you should consider some type of shared storage. You don't need to go full-out HA. Then you can make use of both servers and have redundancy.
 
Well, my plan is to buy two servers, but it depends on the finances; for a start it will be one.

When I buy another one, I will set it up to work as a backup machine. That is redundancy, not a backup for data or anything else; we just misunderstood each other.

Now I just need to determine what kind of hardware to buy and whether I should go for VMs or LXC.

Since I will have the same OS everywhere, maybe it is better to go with LXC.
 

If you really clone all your containers from one, it would be great if you could use linked clones; then the storage requirement really is minimal.

Why do you need so many VMs for the computation? If every machine has work to do, you are hopelessly over-committed if each of your 100 VMs has 2 CPUs. Even with two 64-core Ryzens, you have "only" 128 cores.

As has been pointed out before, KVM has the advantage of using KSM, which LXC can't, so you can work with less memory if all VMs are almost identical.
 

As I already mentioned, Java creates a thread for each established connection; take a look at the htop screenshot in post #5.
Because of that it uses CPU resources, so I've limited the number of devices in Java, but it has never happened that all devices were connected at the same time.

But a scenario that sometimes happens is that a VM stops working, just freezes, and let's say that VM is down for 5-6 hours. When this happens, the data is on hold and accumulating. When the VM is started again, all of that data (a huge amount) needs to be parsed, plus the PostgreSQL data, and that overloads the CPU. In this situation the CPU is at 100% and the machine is very slow; it needs some time to work through all the data that accumulated in the meantime.

In the near future the number of devices will be much bigger, which means the computation time will be bigger, so I need to prepare for that.

As I already said, my problem is not RAM, I will manage that.
I need to determine what type of CPU to use and decide whether I should go with LXC or VMs.
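Just to put that catch-up scenario into numbers (all rates below are invented; this is only meant to illustrate the effect):

```java
// After a VM has been down for a while, the backlog has to be processed
// on top of the live traffic. Every rate here is an example value.
public class BacklogSketch {
    public static void main(String[] args) {
        double downtimeHours = 6;     // how long the VM was frozen
        double arrivalPerSec = 200;   // incoming GPS records per second (example)
        double processPerSec = 500;   // records one VM can parse per second (example)

        double backlog = downtimeHours * 3600 * arrivalPerSec;             // waiting records
        double catchUpHours = backlog / (processPerSec - arrivalPerSec) / 3600;

        System.out.printf("Backlog: %.0f records, catch-up at full load: ~%.1f h%n",
                backlog, catchUpHours);  // ~4.3 million records, ~4 h
    }
}
```

The closer the steady arrival rate gets to what one VM can actually parse per second, the longer that catch-up takes, which is why the CPU question matters more here than the RAM.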
 
