Hi all!
We are in the planning phase for a Proxmox VE cluster to expand the IT services at our institute. The main goal for now is to provide a basic programming environment with a web-based user interface for up to 100 concurrent users (teachers, students, institute staff). The user-facing software is still under discussion, but we are running tests with JupyterLab and Sandstorm on Ubuntu 20.04 LTS server VMs, which could make a very nice, platform-independent combination for our users.
The hardware setup currently under consideration is the following (specs that are not yet decided are given as ranges):
+ COMPUTING
++ 2 'application nodes' (1 active / 1 backup), 64-128 threads, 512 GB-1 TB RAM, 256 GB SAS-SSD RAID1 for Proxmox, 6 NICs (4x 1 Gb, 2x 10 Gb RJ45), local RAID for images (?)
++ 7 'storage nodes' for Ceph, 16 threads, 16 GB RAM, 256 GB SAS-SSD RAID1 for Proxmox, 4x 2 TB SATA SSDs for the Ceph OSDs, 6 NICs (4x 1 Gb, 2x 10 Gb RJ45) (see the quick RAM check after the list)
+ NETWORKING
++ 1 Gb switch, 20x RJ45 (Proxmox control network: web interface/SSH)
++ 1 Gb switch, 20x RJ45 (Proxmox Corosync ring 1)
++ 1 Gb switch, 20x RJ45 (Proxmox Corosync ring 2)
++ 10 Gb switch, 20x RJ45 (Ceph public network)
++ 10 Gb switch, 20x RJ45 (Ceph cluster network)
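As a note on the storage nodes above, here is the quick RAM check we did against Ceph's per-OSD memory target, a minimal sketch assuming the BlueStore default osd_memory_target of 4 GiB; the OS/daemon headroom figure is only our own guess:

```python
# Rough RAM sanity check for one planned Ceph storage node (16 GB RAM, 4 OSDs).
# osd_memory_target defaults to 4 GiB per OSD with BlueStore; the reservation
# for the OS, MON/MGR and Proxmox itself below is an assumption on our side.

osds_per_node = 4
osd_memory_target_gib = 4        # Ceph BlueStore default
os_and_daemons_gib = 2           # assumed headroom for OS, MON/MGR, Proxmox
node_ram_gib = 16                # planned spec

needed_gib = osds_per_node * osd_memory_target_gib + os_and_daemons_gib
print(f"needed ~{needed_gib} GiB vs. {node_ram_gib} GiB planned")  # ~18 GiB vs. 16 GiB
```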
The model described here (https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster) is highly attractive to us, but for now we do not have the hardware to test the concept. That's why we want to discuss it here in the Proxmox community and see what you think!
That said, we already run another system with 6 Ceph nodes providing 100 TB of usable storage on bare-metal nodes. The setup above will yield much less (roughly 16 TB at a replication factor of 3), but that should be enough for our use case: it is calculated to provide a maximum of 100 GB per user, plus 6 TB for images, ISOs and snapshots (possibly split into separate Ceph pools). A rough capacity calculation is sketched below.
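For reference, this is roughly how we arrive at the ~16 TB figure; the 0.85 factor is Ceph's default nearfull ratio, and treating it as the usable-capacity cutoff is our own assumption:

```python
# Usable-capacity estimate for the planned 7-node Ceph cluster (our own arithmetic).
nodes = 7
osds_per_node = 4
osd_size_tb = 2
replication = 3
nearfull_ratio = 0.85            # Ceph default nearfull warning threshold

raw_tb = nodes * osds_per_node * osd_size_tb          # 56 TB raw
usable_tb = raw_tb / replication * nearfull_ratio     # ~15.9 TB before hitting nearfull

demand_tb = 100 * 0.1 + 6                             # 100 users x 100 GB + 6 TB images/ISOs
print(f"usable ~{usable_tb:.1f} TB, planned demand ~{demand_tb:.0f} TB")
```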
The linked article describes the setup as capable of running the VM images straight from Ceph storage; what is your experience with that? From an admin's point of view this is very tempting for VM migration and storage overview. How does it perform compared to placing the VMs onto, say, a separate local SSD RAID (SATA or SAS; NVMe is out of financial reach)? That local-storage setup is what we have the most experience with on 'unclustered' Proxmox VE hosts, and it works very well.
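In case anyone wants to share numbers: this is the kind of comparison we would run ourselves once hardware is available, a minimal fio wrapper (paths and job parameters are placeholders, not a tuned benchmark) executed once against an RBD-backed virtual disk and once against a disk on local SSD RAID:

```python
# Minimal sketch: run the same 4k random-write fio job against two test files
# (one on a Ceph/RBD-backed disk, one on local SSD RAID) and compare IOPS.
# Paths are placeholders; fio must be installed inside the test VM.
import json
import subprocess

def fio_randwrite_iops(path: str) -> float:
    out = subprocess.run(
        ["fio", "--name=randwrite", f"--filename={path}", "--rw=randwrite",
         "--bs=4k", "--iodepth=32", "--numjobs=1", "--size=4G",
         "--direct=1", "--ioengine=libaio", "--runtime=60", "--time_based",
         "--output-format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["jobs"][0]["write"]["iops"]

for label, path in [("ceph-rbd", "/mnt/ceph-test/fio.bin"),
                    ("local-raid", "/mnt/local-test/fio.bin")]:
    print(label, round(fio_randwrite_iops(path)), "IOPS")
```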
The other big question is VM distribution/count. We are currently looking at two models. The first is two separate big VMs (say 20 threads and 256 GB RAM each, one for JupyterLab and one for Sandstorm), plus a couple of small ones (1 thread, 512 MB RAM) for administrative services like LDAP. This is probably the easiest way to maintain the system, but it brings some security issues, since we will at least partly have to expose a shell to offer a 'real experience'. The other approach would be to provide a small VM (say 1 thread, 8 GB RAM) to each user, which eliminates the one-person-can-freeze-the-system problem, but creates quite some extra work for DNS and authentication management (also, more than 2 application nodes would then be necessary; see the back-of-the-envelope calculation below). What do you think?
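To make the "more than 2 application nodes" remark concrete, here is the back-of-the-envelope calculation behind it; the CPU overcommit ratio is purely our assumption, and we use the lower end of the planned node spec:

```python
# Back-of-the-envelope compute check for the one-small-VM-per-user model.
users = 100
vcpus_per_vm, ram_per_vm_gb = 1, 8
cpu_overcommit = 4               # assumed vCPUs per physical thread; RAM is not overcommitted

node_threads, node_ram_gb = 64, 512     # lower end of the planned application-node spec

threads_needed = users * vcpus_per_vm / cpu_overcommit   # 25 threads: CPU is not the problem
ram_needed_gb = users * ram_per_vm_gb                    # 800 GB of guest RAM
active_nodes = -(-ram_needed_gb // node_ram_gb)          # ceiling division -> 2 nodes just for RAM

print(f"RAM: {ram_needed_gb} GB -> {active_nodes} active nodes, plus at least 1 spare for failover")
```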
I'm looking forward to your input/thoughts/criticism!
Best regards,
Daniel