Scaling for large infrastructures

r3dsn0w

Hi everyone,

I am currently evaluating Proxmox for a large cluster hosting 10,000 VMs and above.
I read in a forum post from 2018 that about 100,000 VMs should be possible (https://forum.proxmox.com/threads/maximum-number-of-vms-per-proxmox-server.47233/), but that the number of nodes should be limited, even though reports of 50+ node clusters seem to exist.

Currently running:
pve-manager/8.1.5
Linux 6.5.13-1-pve

In my testing with a single Proxmox node I seem to have reached a limit of around 11,000 VMs before the pmxcfs file system gives up and exits.
For this I created a VM with a 1 GB disk as a template and created full clones via the API. All VMs were powered off. Each VM config was around 400 bytes in size, so we shouldn't reach any pmxcfs maximum size limits.
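For anyone who wants to reproduce the setup, here is a rough sketch of the clone loop and the config-size arithmetic. The host, node name, template VMID and API token are placeholders, and the 128 MiB pmxcfs limit is an assumption that should be checked against the documentation for the installed release. The script only builds the clone requests and does not send them.

```python
import urllib.parse
import urllib.request

# Placeholders, not values from the original test setup.
HOST = "https://pve.example.invalid:8006"
NODE = "pve"
TEMPLATE_VMID = 9000
API_TOKEN = "root@pam!bench=00000000-0000-0000-0000-000000000000"

# Assumed pmxcfs database size limit; verify for your PVE version.
ASSUMED_PMXCFS_LIMIT_BYTES = 128 * 1024 * 1024


def build_clone_request(new_vmid: int) -> urllib.request.Request:
    """Build (but do not send) a full-clone request for one new VM."""
    url = f"{HOST}/api2/json/nodes/{NODE}/qemu/{TEMPLATE_VMID}/clone"
    body = urllib.parse.urlencode({"newid": new_vmid, "full": 1}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"PVEAPIToken={API_TOKEN}")
    return req


def config_footprint(vm_count: int, bytes_per_config: int = 400) -> int:
    """Total bytes of VM config files, at roughly 400 bytes per VM."""
    return vm_count * bytes_per_config


if __name__ == "__main__":
    req = build_clone_request(10001)
    print(req.get_method(), req.full_url, req.data.decode())

    total = config_footprint(11_000)
    share = total / ASSUMED_PMXCFS_LIMIT_BYTES
    print(f"{total} bytes of VM configs, {share:.1%} of the assumed limit")
```

At 11,000 VMs and 400 bytes each, the configs add up to about 4.4 MB, so the raw config size alone does not look like the limiting factor. To actually send a request, pass it to `urllib.request.urlopen` with a TLS context suitable for your certificate setup; the clone call starts a background task, so a real loop should wait for each task to finish.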
Meanwhile, even at a couple of thousand VMs the web UI becomes very unresponsive. The Proxmox server seems to be able to generate the resources JSON, which the web UI parses, in about 1 second for about 10,000 hosts, but parsing it in the browser takes quite a bit longer, at around 20 seconds. Attached is a Firefox profile image of a page refresh. The Firefox tab is at 100% CPU load during this time.
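To get a feel for how big that resources payload is, a synthetic version can be generated and parsed outside the browser. This is only a sketch: the field names approximate what the resources endpoint returns, the values are made up, and timing Python's JSON parser says nothing definite about the browser, where the time may well be spent building the tree and grid rather than parsing.

```python
import json
import time


def synthetic_resources(vm_count: int) -> list[dict]:
    """Made-up entries shaped roughly like stopped QEMU guests."""
    return [
        {
            "id": f"qemu/{vmid}",
            "type": "qemu",
            "vmid": vmid,
            "node": "pve",
            "name": f"clone-{vmid}",
            "status": "stopped",
            "maxcpu": 1,
            "maxmem": 536870912,
            "maxdisk": 1073741824,
            "cpu": 0,
            "mem": 0,
            "disk": 0,
            "uptime": 0,
            "template": 0,
        }
        for vmid in range(100, 100 + vm_count)
    ]


def measure(vm_count: int) -> tuple[int, float]:
    """Return (payload size in characters, seconds spent in json.loads)."""
    payload = json.dumps({"data": synthetic_resources(vm_count)})
    start = time.perf_counter()
    parsed = json.loads(payload)
    elapsed = time.perf_counter() - start
    assert len(parsed["data"]) == vm_count
    return len(payload), elapsed


if __name__ == "__main__":
    size, seconds = measure(10_000)
    print(f"{size / 1e6:.2f} MB payload, parsed in {seconds * 1000:.1f} ms")
```

If the plain parse turns out to be quick for a payload of this size, that would point at the UI's handling of the parsed data rather than at the JSON itself, but a browser profile is the only reliable way to confirm it.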

Also, here is the extract from journalctl from when the pve-cluster service exited at slightly above 11,000 cloned VMs:

Mar 21 09:23:20 Proxmox-VE pmxcfs[639]: [ipcs] crit: qb_ipcs_response_send: Resource temporarily unavailable
Mar 21 09:23:20 Proxmox-VE pmxcfs[639]: [libqb] error: error receiving from setup sock (/dev/shm/qb-639-3448779-10-oXOLjo/qb): Bad file descriptor (9)
Mar 21 09:23:22 Proxmox-VE pmxcfs[639]: [ipcs] crit: qb_ipcs_response_send: Resource temporarily unavailable
Mar 21 09:23:22 Proxmox-VE pmxcfs[639]: [libqb] error: error receiving from setup sock (/dev/shm/qb-639-3448779-10-jPtylZ/qb): Bad file descriptor (9)
Mar 21 09:23:24 Proxmox-VE pvedaemon[3448779]: ipcc_send_rec[10] failed: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[639]: [ipcs] crit: qb_ipcs_response_send: Resource temporarily unavailable
Mar 21 09:23:29 Proxmox-VE pmxcfs[639]: [libqb] error: error receiving from setup sock (/dev/shm/qb-639-3437793-16-40SH9q/qb): Bad file descriptor (9)
Mar 21 09:23:29 Proxmox-VE pmxcfs[639]: [libqb] error: ref:0 state:3 (/dev/shm/qb-639-3437793-16-40SH9q/qb)
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Main process exited, code=killed, status=6/ABRT
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Failed with result 'signal'.
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Consumed 35min 58.924s CPU time.
Mar 21 09:23:29 Proxmox-VE systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 1.
Mar 21 09:23:29 Proxmox-VE systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Consumed 35min 58.924s CPU time.
Mar 21 09:23:29 Proxmox-VE systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: fuse: failed to access mountpoint /etc/pve: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] crit: fuse_mount error: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] crit: fuse_mount error: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] notice: exit proxmox configuration filesystem (-1)
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] notice: exit proxmox configuration filesystem (-1)
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Mar 21 09:23:29 Proxmox-VE systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
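For longer journals, a small script can help find the point where the service first died. The sketch below assumes the short journal format shown above (timestamp, host, process[pid], message); the sample lines are copied from the log, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches lines like: "Mar 21 09:23:20 host proc[123]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>[^\[\s:]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)

SAMPLE = [
    "Mar 21 09:23:20 Proxmox-VE pmxcfs[639]: [ipcs] crit: "
    "qb_ipcs_response_send: Resource temporarily unavailable",
    "Mar 21 09:23:24 Proxmox-VE pvedaemon[3448779]: ipcc_send_rec[10] "
    "failed: Transport endpoint is not connected",
    "Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: "
    "Main process exited, code=killed, status=6/ABRT",
]


def parse(lines):
    """Yield one dict per line that matches the journal format."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield match.groupdict()


def summarize(lines):
    """Count messages per process and find the first main-process exit."""
    entries = list(parse(lines))
    per_proc = Counter(entry["proc"] for entry in entries)
    first_exit = next(
        (e for e in entries if "Main process exited" in e["msg"]), None
    )
    return per_proc, first_exit


if __name__ == "__main__":
    counts, first_exit = summarize(SAMPLE)
    print(dict(counts))
    if first_exit:
        print(first_exit["ts"], first_exit["msg"])
```

In the extract above, the first exit is the `status=6/ABRT` line, meaning the pmxcfs process was terminated by SIGABRT after the repeated `qb_ipcs_response_send` errors.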
 

Attachments

  • Screenshot 2024-03-22 093716.png
Nice to hear there is something in the development pipeline to suit the needs of larger infrastructures. Is there a timeframe for when we can expect the new Multi Datacenter Management plane to be available? Our current solution would also be to keep each individual cluster to a reasonable size.
 
No concrete ETA yet, unfortunately.
 
