Hi everyone,
I am currently evaluating Proxmox for a large cluster hosting 10,000 VMs and above.
I read in a forum post from 2018 that about 100,000 VMs should be possible (https://forum.proxmox.com/threads/maximum-number-of-vms-per-proxmox-server.47233/), but that the number of nodes should be limited, even though reports of 50+ node clusters seem to exist.
Currently running:
pve-manager/8.1.5
Linux 6.5.13-1-pve
In my testing with a single Proxmox node, I seem to have hit a limit of around 11,000 VMs, at which point pmxcfs gives up and exits.
For this I created a VM with a 1 GB disk as a template and created full clones of it via the API. All VMs were powered off. Each VM config was around 400 bytes, so roughly 4.4 MB in total for 11,000 VMs, which shouldn't come close to any pmxcfs maximum size limit.
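For anyone who wants to reproduce this, here is a rough sketch of how the clone calls can be built. It is not my exact script: the host name, node name, template VMID and the `clone-<id>` naming are placeholders I made up for illustration. It only builds the requests and prints them, nothing is sent.

```python
from urllib.parse import urlencode

# All three values below are hypothetical placeholders, not my real setup.
API_BASE = "https://pve.example.com:8006/api2/json"
NODE = "pve"
TEMPLATE_VMID = 9000


def build_clone_request(newid, node=NODE, template=TEMPLATE_VMID):
    """Return (url, form_body) for one full-clone POST against the template."""
    url = f"{API_BASE}/nodes/{node}/qemu/{template}/clone"
    # full=1 requests a full clone instead of a linked clone.
    body = urlencode({"newid": newid, "full": 1, "name": f"clone-{newid}"})
    return url, body


def build_clone_batch(first_id, count):
    """Build requests for `count` clones with consecutive VMIDs."""
    return [build_clone_request(vmid) for vmid in range(first_id, first_id + count)]


if __name__ == "__main__":
    for url, body in build_clone_batch(10000, 3):
        print("POST", url, body)
```

To actually send these you would POST each body to its URL with an `Authorization: PVEAPIToken=...` header. Each call only starts a clone task, so a real script should wait for the task to finish before firing the next one.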
Meanwhile, even at a couple of thousand VMs the web UI becomes very unresponsive. The Proxmox server seems able to generate the resources JSON that the web UI parses in about 1 second for about 10,000 VMs, but parsing it in the browser takes much longer, around 20 seconds. Attached is an image of a Firefox profile of a page refresh. The Firefox tab sits at 100% CPU load during this time.
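One rough way to check whether the time goes into JSON parsing itself or into what the UI does with the data afterwards is to time the parsing of a similarly sized payload outside the browser. The sketch below does that with a synthetic payload. The entries are only modeled on what the resources endpoint returns, the values are made up, and Python's parser is not the browser's, so treat the result as an indication only.

```python
import json
import time


def make_synthetic_resources(count):
    """Build a JSON string shaped like a resources response (values are made up)."""
    data = [
        {
            "id": f"qemu/{100 + i}",
            "type": "qemu",
            "vmid": 100 + i,
            "node": "pve",  # hypothetical node name
            "name": f"clone-{100 + i}",
            "status": "stopped",
            "maxmem": 536870912,
            "maxdisk": 1073741824,
        }
        for i in range(count)
    ]
    return json.dumps({"data": data})


def time_parse(payload):
    """Parse the payload and return (number of entries, seconds spent parsing)."""
    start = time.perf_counter()
    parsed = json.loads(payload)
    elapsed = time.perf_counter() - start
    return len(parsed["data"]), elapsed


if __name__ == "__main__":
    payload = make_synthetic_resources(10000)
    entries, seconds = time_parse(payload)
    print(f"{entries} entries, {len(payload)} bytes, parsed in {seconds:.3f}s")
```

If plain parsing of a payload this size turns out to be fast, that would point at the UI's handling of the parsed data rather than at the JSON decoding.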
Also, here is the extract from journalctl when the pve-cluster service exited at slightly above 11,000 cloned VMs:
Mar 21 09:23:20 Proxmox-VE pmxcfs[639]: [ipcs] crit: qb_ipcs_response_send: Resource temporarily unavailable
Mar 21 09:23:20 Proxmox-VE pmxcfs[639]: [libqb] error: error receiving from setup sock (/dev/shm/qb-639-3448779-10-oXOLjo/qb): Bad file descriptor (9)
Mar 21 09:23:22 Proxmox-VE pmxcfs[639]: [ipcs] crit: qb_ipcs_response_send: Resource temporarily unavailable
Mar 21 09:23:22 Proxmox-VE pmxcfs[639]: [libqb] error: error receiving from setup sock (/dev/shm/qb-639-3448779-10-jPtylZ/qb): Bad file descriptor (9)
Mar 21 09:23:24 Proxmox-VE pvedaemon[3448779]: ipcc_send_rec[10] failed: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[639]: [ipcs] crit: qb_ipcs_response_send: Resource temporarily unavailable
Mar 21 09:23:29 Proxmox-VE pmxcfs[639]: [libqb] error: error receiving from setup sock (/dev/shm/qb-639-3437793-16-40SH9q/qb): Bad file descriptor (9)
Mar 21 09:23:29 Proxmox-VE pmxcfs[639]: [libqb] error: ref:0 state:3 (/dev/shm/qb-639-3437793-16-40SH9q/qb)
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Main process exited, code=killed, status=6/ABRT
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Failed with result 'signal'.
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Consumed 35min 58.924s CPU time.
Mar 21 09:23:29 Proxmox-VE systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 1.
Mar 21 09:23:29 Proxmox-VE systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Consumed 35min 58.924s CPU time.
Mar 21 09:23:29 Proxmox-VE systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: fuse: failed to access mountpoint /etc/pve: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] crit: fuse_mount error: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] crit: fuse_mount error: Transport endpoint is not connected
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] notice: exit proxmox configuration filesystem (-1)
Mar 21 09:23:29 Proxmox-VE pmxcfs[3463821]: [main] notice: exit proxmox configuration filesystem (-1)
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Mar 21 09:23:29 Proxmox-VE systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Mar 21 09:23:29 Proxmox-VE systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.