How to distribute a cohort of VMs across more than one host?

wolfspyre

howdy all!

I've dug thru the forums here, but haven't found much that speaks directly to my challenge.

I have a 6 node proxmox cluster.
4x dell r730xd
2x dell r720xd (in the process of moving to 730s... but the migration has been slow)
I have a few pools of HA VMs ...
e.g.
a pair of webservers...
a triad of ci runners...
a pair of VMs running HA mysql
a pair of VMs running ha postgres
a pair of VMs running ha redis

what I'm trying to determine is how to express to Proxmox that I would like to discourage scheduling multiple VMs of the same pool on the same physical host, to achieve wider resource distribution.

I understand I can associate a weight with a physical host to bias a pool towards, or away from, that specific hardware...

but I don't see a mechanism to express how densely or loosely to leverage the hardware available to the pool.

there are certainly scenarios where binpacking VMs into a tighter hardware footprint is preferable

likewise there are scenarios where one would prefer to have the VMs inclined to live on different hosts, but are not precluded from cohabitation if necessary...

how can I tell the Proxmox HA scheduler:

in a perfect world I would like the VM members of pool A to be deployed on divergent hosts.
however, if necessary, they **MAY** cohabitate.
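
For reference, the "weight" mentioned here is the HA group node priority list (higher number = more preferred). A minimal sketch, with hypothetical node names and VMID:

```bash
# Bias a VM towards pve1, with pve2 as a lower-priority fallback.
ha-manager groupadd prefer-pve1 --nodes "pve1:2,pve2:1"
ha-manager add vm:100 --group prefer-pve1
```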
 
I don't really understand your problem, as you already said you can associate a weight with a physical host to bias the VMs.
We have an HA group for each node, with prio set to the number of nodes, e.g. prio 6 for the group's own node. The other hosts (5) get staggered lower prios for second, third, and last choice. So when you generate a VM (or for your existing VMs), just give each one a "homehost" via its HA group, and your perfect world is reached (which is what we do as well).
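
A minimal sketch of that per-node "homehost" scheme on a 6-node cluster (node names and VMIDs are hypothetical):

```bash
# One HA group per home node: the home node gets the top prio (6),
# the other five hosts get staggered fallback prios.
ha-manager groupadd home-pve1 --nodes "pve1:6,pve2:5,pve3:4,pve4:3,pve5:2,pve6:1"
ha-manager groupadd home-pve2 --nodes "pve2:6,pve3:5,pve4:4,pve5:3,pve6:2,pve1:1"
# ... and so on for home-pve3 .. home-pve6, with the prios rotated.

# Give each VM its "homehost" by pinning it to that node's group.
ha-manager add vm:101 --group home-pve1
ha-manager add vm:102 --group home-pve2
```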
 
Heya @waltar
1) THANK YOU for your perspective and thoughts !!!!
seriously.... I really appreciate it.


2) That idea makes for weird groupings when you have services ...
A (these three VMs are contextually related)
B (these two VMs are related) ....
C (these four VMs are related) ....

A and B are related.
I want A to be as wide as possible, as each VM consumes a lot of resources.
I want B VMs to cohabitate with an A VM when possible

C has nothing to do with A or B..

I do not want more than one A VM on the same node when possible.
I do not want more than one C VM on a node when possible.

I thought (PERHAPS MISTAKENLY) that an HA group was a 'purpose' construct, not a locality construct ...
It felt like the wrong tool to use for conveying physical hardware affinity ....
am I just totally misunderstanding the construct?
 
You are right that the HA grouping tool isn't perfect, but normally it's still an OK working point.
In a 6-node PVE cluster up to 2 nodes can fail while quorum stays >50% (4 of 6 votes).
Mapping VMs to HA groups looks easy for the A and B requirements, but impossible for the C VMs.
It would be easy with a 7th node; otherwise you need a permanently running script which checks whether 2 nodes have failed and, if so, re-biases the HA group of one of the C VMs.
But on the other hand, your wished restrictions are a little bit funny, as the cluster should run all VMs as HA services, so in the end it shouldn't matter to a user where they are served from.
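
A rough sketch of the watchdog idea just described; the group name, VMID, and threshold are made up, and it assumes jq is installed:

```bash
#!/bin/bash
# Count the cluster nodes currently reported online.
online=$(pvesh get /cluster/status --output-format json \
  | jq '[.[] | select(.type == "node" and .online == 1)] | length')

# If 2 of the 6 nodes are gone, re-bias one C VM onto a (hypothetical)
# pre-created fallback group so no two C VMs share a surviving node.
if [ "$online" -le 4 ]; then
  ha-manager set vm:303 --group c3-fallback
fi
```

Run it from a cron job or systemd timer rather than as a permanent loop.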
 
I disagree....
There are services which are resource intensive, where cohabitation causes noisy-neighbor problems ...

There are services which have a benefit to being on the same hardware ....

Not all components have the same affinity factors...

Setting an affinity within an HA group to bias slightly to server A vs server B will often result in all of the services of that group residing on server A ....

Having the ability to additionally express a cohabitation penalty for an ha group ....

....or perhaps being able to create an HA group OF other HA groups ... which could allow one to essentially stack the resulting weights... thus allowing for a more elegant declaration of workload distribution ...
 
So you want 1 A VM and 1 B VM to be spread onto different hosts (because A1 and A2 have insufficient resources if on the same host, and A1 and B1 require each other to be as near as possible).

So why don’t you just group them as such: make an AB1 group with A1 and B1, an AB2 group with A2 and B2, etc., and assign each to a particular host (or pair of hosts). As long as you don’t set the group to restricted, when ‘disaster’ happens and your host goes down, your VMs will boot on ‘a’ random node not in the HA group, and you can let the HA scheduler re-migrate them after the preferred host comes back. You can do the same with a C VM: either put them in individual HA groups with a single node (or a single node with a particular backup), or group them with an AB, however you want to organize that.

Given the small number of hosts, I wouldn’t plan for more than 1 host going down at the same time. So you can reserve 1 host (leave it empty) and add it to every HA group with a lower priority. When ‘a’ host goes down, the group on that host migrates there completely and still doesn’t impact anything until 2 hosts go down. Then you potentially have issues where you overload the backup resource; if the backup resource is the second host that goes down, VMs are allocated to other hosts (that’s where you can then calculate a percentage on RAM and use memory ballooning).
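
A minimal sketch of that AB-pair layout, with a spare host added to each group at the lowest priority (all node names and VMIDs are hypothetical):

```bash
# A1+B1 prefer pve1, A2+B2 prefer pve2; spare node pve6 is the shared
# low-priority fallback that normally stays empty.
ha-manager groupadd AB1 --nodes "pve1:2,pve6:1"
ha-manager groupadd AB2 --nodes "pve2:2,pve6:1"

ha-manager add vm:111 --group AB1   # A1
ha-manager add vm:112 --group AB1   # B1
ha-manager add vm:121 --group AB2   # A2
ha-manager add vm:122 --group AB2   # B2
```

With "restricted" left unset the VMs may still start outside the group if every listed node is down, and with "nofailback" left unset they migrate back once the preferred host returns.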
 

First, Thanks a bunch for your insight and expertise.... let me try and explain a bit better

I have a cohort of VMs ... 'generic webservers' (deployed workloads)
I have a 'gitlab vm' and a cohort of vms ... 'gitlab runners'
I have a cohort of VMs for doing CPU inference tasks
I have a cohort of VMs doing logging and monitoring.
I have a cohort of VMs which each stay on a particular host for ephemeral stateless work, balanced by haproxy
I have a few image serving hosts.

I have a matrix rig
I have a mastodon rig
I have a pixelfed rig
I have a peertube rig
(intention is to eventually ha-ify these once I get a stronger grok on how best to manage their needs...)

They each sorta have their own usage patterns....

I wanted to be able to generally associate bubbles of vms which do a thing as a group ...

bubbles of vms that depend on each other, and are coalesced together in the effort of delivering a particular service ...

and a bubble of hardware that services these groups
and somehow express a desired distribution pattern: that this grouping of VMs all have a consistent set of access privileges and importance... but also that not all of them should necessarily be eager to pack themselves densely onto a particular host...

it felt to me like the grouping / ha-ness is a bit lacking in the ability to express desired width of a group ....

if I could group (all A-type VMs) and (all B-type VMs) while ALSO grouping (AB1 as a1+b1) ..... well....
I'll make up a terrible breakdown for elucidation

A1 is a member of RoleGroup_A "webservers" and SVCGroup_A "somesite.com"
A2 is a member of RoleGroup_A "webservers" and SVCGroup_F "othersite.net"
A3 is a member of RoleGroup_A "webservers" and SVCGroup_G "INTERNALTHING"
A4 is a member of RoleGroup_A "webservers" and SVCGroup_Z "snowflake"
B1 is a member of RoleGroup_B "DBHosts" and SVCGroup_A "somesite.com" and affinityGroup_C "expensive_resource_hogs"
B2 is a member of RoleGroup_B "DBHosts" and SVCGroup_F "othersite.net" and affinityGroup_C "expensive_resource_hogs"
B3 is a member of RoleGroup_B "DBHosts" and SVCGroup_G "INTERNALTHING" and affinityGroup_C "expensive_resource_hogs"
C1 is a member of RoleGroup_C "Search" and affinityGroup_C "expensive_resource_hogs"
C2 is a member of RoleGroup_C "Search" and affinityGroup_C "expensive_resource_hogs"
C3 is a member of RoleGroup_C "Search" and affinityGroup_C "expensive_resource_hogs"
D1 is a member of SVCGroup_D "CodeManagement" and affinityGroup_C "expensive_resource_hogs"
E1 is a member of SVCGroup_D "CodeManagement" and RoleGroup_H "Builders" and affinityGroup_E "Runners"
E2 is a member of SVCGroup_D "CodeManagement" and RoleGroup_H "Builders" and affinityGroup_E "Runners"
E3 is a member of SVCGroup_D "CodeManagement" and RoleGroup_H "Builders" and affinityGroup_E "Runners"
E4 is a member of SVCGroup_D "CodeManagement" and RoleGroup_H "Builders" and affinityGroup_E "Runners"

These groups are multidimensional... expressing purpose, function, and density... (albeit somewhat poorly... came up with this on the spot)
I don't know how to express these dimensions within Proxmox's HA group constructs ....
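
For what it's worth, one rough way to approximate just the anti-affinity dimension with plain HA groups is one group per C VM, each preferring a different node (a sketch; all names and IDs are made up):

```bash
# Spread the three "expensive_resource_hogs" C VMs across distinct
# preferred nodes; since the groups are not restricted, the VMs can
# still cohabitate if enough nodes fail.
ha-manager groupadd c-spread-1 --nodes "pve1:3,pve4:2,pve5:1"
ha-manager groupadd c-spread-2 --nodes "pve2:3,pve5:2,pve6:1"
ha-manager groupadd c-spread-3 --nodes "pve3:3,pve6:2,pve4:1"

ha-manager add vm:301 --group c-spread-1   # C1
ha-manager add vm:302 --group c-spread-2   # C2
ha-manager add vm:303 --group c-spread-3   # C3
```

This only captures the "spread" dimension, though; the role and service dimensions still have no native expression.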

Am I just thinking about the groups the wrong way?
 