Web frontend not usable with many simultaneous users

udo

Distinguished Member
Apr 22, 2009
Ahrensburg; Germany
Hi,
this weekend I gave a PVE workshop at the Chemnitzer Linux-Tage.
The user limit for the workshop was supposed to be 16, but there were 30 people in the session.

Minor issues:
The web GUI often stays stuck on the selected VM - a reload helps in this case...
The performance graphs don't update reliably...

Major issue:
After a while some users can't work with PVE (we installed 16 PVE hosts on one physical PVE node) because their console window (noVNC) disconnects after a few seconds, reconnects, and disconnects again and again.

At first I believed it had something to do with the server (an older HP DL380 G6 with 72 GB RAM), but after the session a teacher from a vocational school asked me about the same issue: with more than 10-20 users, they can't really work with the VM console (they create VMs and networks).
And this is a pity - normally they should work with PVE and appreciate how nice this piece of software is - instead they say: PVE is rubbish, you can't work with this...

The pveversion on the server was from the current enterprise repo.

Is it possible to tune the web GUI? Is it a limit of free VNC ports?

Should I open a bug report?


Udo
 
should not be an issue of free ports (the range allows up to 100 spice/vnc connections each). could you file an issue?
 
just some napkin math:
generally the platform is already 7 years old!

CPU:
you have 16 virtual pve instances
with at least 2 cores i guess?
then 32 threads on a machine with at most a single six-core CPU with HT = 12 threads (~2.7x overcommitment)
which then maybe gets overcommitted again inside the pve instances

RAM:
with each having at least, say, 4 GB ram?
so you need at least 4*16 = 64 GB (which, ok, you have)

Network:
then i guess you have a single gigabit connection from the server
a single windows 10 vm needs on average about 5 Mbit (and 100 at peak) for a novnc connection (at 1920x1080 resolution), so let's play it down and say 4 Mbit * 30 = 120 Mbit on average just for novnc
and 3 Gbit at peak (if everyone starts a windows vm at the same time)

(not counted are the 30 open web interfaces, which each consume bandwidth for api calls)
also, are the users connected via wifi or cable?

Storage:
you did not say, but even if you have 8 hdds in a raid 10 in there, you will have too few iops (~150 per disk * 8 = 1200 / 16 => 75 per pve instance), or do you have ssds in there?
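The estimates above can be checked with a quick back-of-the-envelope script; all inputs are the assumptions from this post (16 instances with 2 cores each, a single six-core/12-thread CPU, 4 GB RAM per instance, 30 users at ~4 Mbit/s per noVNC console, 8 HDDs at ~150 IOPS each), not measurements:

```python
# Napkin math for the workshop setup; every figure here is an
# assumption from the post above, not a measured value.
instances = 16
vcores = instances * 2              # 2 vcores per virtual PVE instance
host_threads = 12                   # single six-core CPU with HT

print(f"CPU overcommit: {vcores / host_threads:.1f}x")          # ~2.7x
print(f"RAM needed:     {instances * 4} GB (host has 72 GB)")   # 64 GB

users, mbit_per_console = 30, 4
print(f"noVNC average:  {users * mbit_per_console} Mbit/s on a 1 Gbit link")

disks, iops_per_disk = 8, 150                                   # HDDs in RAID 10
print(f"IOPS/instance:  {disks * iops_per_disk // instances}")  # 75
```

The 120 Mbit/s average fits comfortably within a gigabit link; the peak bandwidth and the ~75 IOPS per instance are the numbers that hurt.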

so regardless of whether this 'works', it sounds way underpowered for this scenario

i just tested here for example:
(i7 6700k, 32GB ram, crucial mx300 ssd) 3 vms running, 2 containers, opened the webinterface about 15 times and about 25 novnc consoles (linux vms/host mixed), all running htop/nload/etc. worked without problems
(ok i sit directly on the box, so network is not an issue)
 
Hi Dominik,
just some napkin math:
generally the platform is already 7 years old!
yes I know - it's a test system, and in this case for a workshop.
And, like I mentioned in the first post, other people have the same issue with much better hardware (more and newer nodes).
CPU:
you have 16 virtual pve instances
with at least 2 cores i guess?
then 32 threads on a machine with at most a single six-core CPU with HT = 12 threads (~2.7x overcommitment)
which then maybe gets overcommitted again inside the pve instances
the cpu load was not low, but it doesn't look like that was the problem.
RAM:
with each having at least, say, 4 GB ram?
so you need at least 4*16 = 64 GB (which, ok, you have)
Memory usage with KSM was below 60%.
Network:
then i guess you have a single gigabit connection from the server
a single windows 10 vm needs on average about 5 Mbit (and 100 at peak) for a novnc connection (at 1920x1080 resolution), so let's play it down and say 4 Mbit * 30 = 120 Mbit on average just for novnc
and 3 Gbit at peak (if everyone starts a windows vm at the same time)

(not counted are the 30 open web interfaces, which each consume bandwidth for api calls)
also, are the users connected via wifi or cable?
The noVNC console was only used for the virtual PVE installation - no Win10 with a higher resolution.
But the connection went over wireless LAN - so perhaps there is an issue there.

But here is the same point as before - the other people had the same issue in a school lab (I assume no WLAN).
Storage:
you did not say, but even if you have 8 hdds in a raid 10 in there, you will have too few iops (~150 per disk * 8 = 1200 / 16 => 75 per pve instance), or do you have ssds in there?

so regardless if this 'works', this sounds like it is way underpowered for this scenario
Of course the server was underpowered (IO), but that doesn't explain why some users had trouble with the VNC console (so that they couldn't really work) and others didn't. If it had something to do with the network (wifi) bandwidth, shouldn't many people have had reconnecting consoles?
i just tested here for example:
(i7 6700k, 32GB ram, crucial mx300 ssd) 3 vms running, 2 containers, opened the webinterface about 15 times and about 25 novnc consoles (linux vms/host mixed), all running htop/nload/etc. worked without problems
(ok i sit directly on the box, so network is not an issue)
sorry, that isn't the point. The point is: many simultaneous users - not one user with many consoles!
(not so easy to test).

Udo
 
sorry, that isn't the point. The point is: many simultaneous users - not one user with many consoles!
(not so easy to test).

So far we are unable to reproduce. Can you reproduce that somehow? If so, how exactly?
 
ok, fair point about utilization (except that when io crawls, everything will be slow and prone to timeouts)

But the connection went over wireless LAN - so perhaps there is an issue there.
since i do not know what wireless access point you use i can only guess, but in my experience, having 30 people on wifi with all stable speeds/connections requires a very beefy access point (enterprise level gear)

But here is the same point as before - the other people had the same issue in a school lab (I assume no WLAN).
why do you assume no wlan there? my experience is rather the reverse: in a school/university i would expect each person to be on wifi with their own laptop

sorry, that isn't the point. The point is: many simultaneous users - not one user with many consoles!
technically it is not very different whether 1 client has 30 connections open or 30 clients each have 1 connection open, with the exception of the load on the network gear, which i already wrote about

I can try to reproduce that (perhaps next weekend), if I gather all my computers together as clients...
that would be great if you can try to reproduce that

Is there a debug option to see what happens in pveproxy/noVNC?
for pveproxy you can start it in debug mode (needs to run in foreground)
pveproxy stop
pveproxy start -debug

for novnc you would have to edit the browser local storage variable 'logging' to either 'debug' or 'info', then there should be more messages in the javascript console
 
I will update this thread when I have tried to reproduce the issue.
any news on this? i would really like to see if we have a problem with many clients
 
