What kind of design should I make for Proxmox?

adarguner

New Member
Jan 26, 2024
Dear Friends,

I am planning a project. I want to use Proxmox as the virtualization technology and design a hyper-converged infrastructure (HCI) with Ceph across 3 servers. I will make this design for a medium-sized company. For example:

This is a factory.
40 virtual machines are running.
There are 500 clients.
ERP software (Oracle or SAP) is running.
There is no big budget.

Should I choose an Intel or an AMD server CPU, and which one do you recommend? Which one is more efficient and less problematic with Proxmox?

Are SAS (10K) disks sufficient for HCI, or should I choose NVMe disks?

If one of the 3 servers breaks down and the remaining capacity is sufficient (the 2 surviving servers can take over the load of the 3rd), will Ceph cause problems?

Are 25G ports required between these 3 servers, or are 10G ports sufficient?
(Perhaps around 5 Gbps of traffic will occur.)

Thank you for your help.

Regards
 
Please be aware of one point: sadly, Proxmox is NOT a SAP-supported hypervisor. Here are the supported ones. I hope someone can put enough pressure on SAP to change this. If you meant the two vendors only as ERP examples, this may of course differ.
 
Does SAP impose a limitation on Proxmox? According to the link you provided, Hyper-V, Citrix Xen, and others are not listed either. So does SAP not support them either?
 
This is correct, they do not support them. SLES with KVM is supported in defined versions. SAP itself is pushing its clients towards SAP-as-a-Service, so there is no love from them for a new KVM distro such as Proxmox. The internet turns up one reference to SAP on Proxmox; it might be SAP ECC or the old "Business One", I do not know. You can use ANYTHING for DEV or TEST, but SAP will deny any tickets and support for unsupported hypervisors/distros, whether on PROD, TEST, DEV, or INT.
 
Thank you for the link you provided. However, the recommended Ceph OSD capacities cannot exceed 30% for 3 nodes. Thus, the other two server nodes carry the disk OSDs of the 3rd node.
I have no idea what is meant by this.

You need a MINIMUM of 3 OSD nodes. The USABLE CAPACITY will be 1/3 of the deployed disk capacity. The reason 3 nodes can be insufficient for optimal operation is that there is no target for rebalance/self-heal in case of a node failure, so you really want AT LEAST 4 nodes. There is also the matter of performance: 3 nodes means that all IO hits the same nodes, limiting the subsystem's capacity to respond to requests. More nodes = more performance.
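
To make the 1/3 figure concrete, here is a rough worked example with hypothetical numbers (one 12 TB OSD per node, replication 3; the 0.85 factor is Ceph's default nearfull warning threshold, not a hard limit):

3 nodes x 12 TB = 36 TB raw capacity
36 TB / 3 replicas = 12 TB usable capacity
12 TB x 0.85 = ~10 TB you can realistically fill before Ceph starts warning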

On that note, make sure your network design accounts for the needs of your cluster (corosync), Ceph private traffic, Ceph public traffic, and your VM traffic.
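
As a sketch only (the NIC names and subnets below are made up for illustration), that separation could look something like this in /etc/network/interfaces on a Proxmox node:

# corosync / Proxmox cluster network
auto eno1
iface eno1 inet static
        address 10.10.10.11/24

# Ceph public network
auto eno2
iface eno2 inet static
        address 10.10.20.11/24

# Ceph cluster (private replication) network
auto eno3
iface eno3 inet static
        address 10.10.30.11/24

# VM traffic bridge
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.11/24
        gateway 192.168.1.1
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0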

Please be aware of one point: sadly, Proxmox is NOT a SAP-supported hypervisor
I don't know if this matters meaningfully, since RHEV, SLES, and Huawei's hypervisors are officially supported (all KVM). As long as you're not opening a ticket for a PVE-specific issue, I don't think it will invalidate your maintenance contract; but SAP being SAP, I'd be careful not to even mention it ;)
 
I don't understand this. Why is a 4th node needed? Shouldn't the capacity be no more than 25% when there are 4 nodes? Wouldn't the usable OSD disk capacity be less in this case? What I mean is: if it is 30% per node for 3 nodes, doesn't that mean a 25% disk and OSD capacity limit for 4 nodes? After all, shouldn't disk space be allocated on the other nodes for each node?

Or should I understand it like this: is the 4th node only used for disk space?
 
I don't understand this.
start reading :) https://docs.ceph.com/en/latest/start/beginners-guide/

Why is a 4th node needed?
when you read the above it will start making sense.

Shouldn't the capacity be no more than 25% when there are 4 nodes? Wouldn't the usable OSD disk capacity be less in this case? What I mean is: if it is 30% per node for 3 nodes, doesn't that mean a 25% disk and OSD capacity limit for 4 nodes? After all, shouldn't disk space be allocated on the other nodes for each node?
No. Storage utilization follows your deployed CRUSH rules, which dictate HOW data is written to disks, no matter how many disks or nodes you deploy. What's cool is that this is defined PER POOL, which means you can have pools with different CRUSH rules using the same disks at the same time. The most common rule for RBD use (which is what you will be deploying) is replication 3, which means each write is sent to three separate OSDs (disks).
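
To see this concretely, these are stock Ceph commands for inspecting and changing replication per pool (the pool name vm-pool and the rule name replicated_rule are placeholders / Ceph defaults, not something from your setup):

ceph osd crush rule ls                                   # list the CRUSH rules that exist
ceph osd pool get vm-pool size                           # show the replica count of a pool
ceph osd pool get vm-pool crush_rule                     # show which rule the pool follows
ceph osd pool set vm-pool crush_rule replicated_rule     # point the pool at a different rule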
 

I apologize, but I asked an AI to verify. It gave me a sample table like the one below. Don't you think it's the right choice and calculation for 3 nodes?

Number of servers: 3
Number of disks: 3 (1 x 12 TB in each server)
Total physical capacity: 36 TB
Usable capacity: 12 TB (3x replication)
Replication: 3x

Commands:
ceph-volume lvm create --data /dev/sdb   # create an OSD on /dev/sdb (run per disk, per node)
ceph osd pool create vm-pool 128         # create the pool with 128 placement groups
ceph osd pool set vm-pool size 3         # 3 replicas
ceph osd pool set vm-pool min_size 2     # keep serving IO with only 2 of 3 replicas available

pveceph pool create vm-pool --size 3     # Proxmox-native alternative to the pool commands above

/etc/pve/storage.cfg:
rbd: ceph-vm-storage
        pool vm-pool
        content images,rootdir
        krbd 0


Durability: no data loss even if 1 server / disk is lost
 
Don't you think it's the right choice and calculation for 3 nodes?
It is (the calculation, that is; the choice is a separate matter, since the number of nodes is an arbitrary number you chose).
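
If you want to double-check that on the live cluster rather than on paper, the standard tools show it directly (nothing is assumed here beyond the pool existing):

ceph df            # raw capacity plus per-pool MAX AVAIL, already adjusted for replication
ceph osd df tree   # utilisation per OSD and per host
pveceph status     # the Proxmox view of overall Ceph health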

Durability: no data loss even if 1 server / disk is lost
Correct, but you're not considering the consequences in their totality. Ceph is designed to maintain its full resilience even in the event of a failure: if an OSD fails, it will AUTOMATICALLY redeploy the contents of that OSD to other surviving OSDs in the pool, as long as the CRUSH rules can still be satisfied.

If a node fails but there are not enough survivors to rebuild, the subsystem will not be able to self-heal and will remain degraded (read: not functioning with full high availability). You may be OK with this; it really depends on your goals.
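
If you want to watch that behaviour happen, the usual commands are (standard Ceph CLI, nothing specific to this thread):

ceph status          # degraded / undersized PGs show up here after a node or OSD failure
ceph health detail   # lists exactly which PGs are affected and why
ceph osd set noout   # optional: before planned maintenance, stop automatic out-marking and rebalancing
ceph osd unset noout # re-enable normal recovery behaviour afterwards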

Non sequitur: be VERY CAREFUL about making decisions based on generative AI. There is no guarantee that what it generates is not an outright hallucination. If it were me, planning to deploy a system with substantial costs attached that provides services my business depends on, I'd make sure I fully understand my decisions on my own.
 
Thanks for the information. I will do some more research and pay attention to your advice. I think it is better to calculate it as "total OSD (disk) capacity / number of nodes". Or is your calculation in the other forum more accurate: max capacity = (number of nodes - 1) * (OSD capacity per node * .08) / nodes?
 
I think it is better to calculate it as "total OSD (disk) capacity / number of nodes". Or is your calculation in the other forum more accurate: max capacity = (number of nodes - 1) * (OSD capacity per node * .08) / nodes?
At the risk of questioning your reading comprehension, I'll restate:
storage utilization follows your deployed CRUSH rules, which dictate HOW data is written to disks, no matter how many disks or nodes you deploy.
The number of nodes is irrelevant to the usable-capacity ratio calculation.
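
Put as a rough formula (a sketch; the 0.85 factor corresponds to Ceph's default nearfull warning threshold, it is not a hard rule):

usable capacity ≈ total raw OSD capacity / replica count
practical fill target ≈ usable capacity x 0.85

With the numbers from earlier in the thread: 36 TB raw / 3 replicas = 12 TB usable, of which roughly 10 TB is a sensible fill target.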