I'm reaching out here again for insights and recommendations, as I'm currently reassessing our compute grid. We've been relying on Hadoop/Spark/YARN in isolated environments without additional security measures. Unfortunately, they are no longer free for us, and the cost of either adding nodes or upgrading to a newer version has become significant (around $8,000 per server). So I've decided to explore alternatives and set up a new compute grid.
Here are our grid requirements:
- Standalone compute process (CPU/RAM requirement known before execution).
- No resource sharing between executions (when running a task with 100 jobs, each job is standalone).
- Queue management for tasks when all resources are in use.
- Multi-user support.
- Ability to query task status and errors (though the latter is less critical as I write logs to a shared folder).
- Easy addition of more nodes (even if it requires stopping and restarting the grid).
- Support for GPU resources.
- All hosts are Ubuntu 22.04.
- All managers/workers will be managed by Proxmox.
- Tasks can be executed directly on the host (Ubuntu 22.04); we don't want Docker wrapping because of the time it takes to create the containers (see the sketch just below this list).
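To make these requirements concrete, here is roughly what a task with 100 standalone jobs would look like on a batch scheduler such as Slurm. This is just an illustration of the submission model I'm after, not a commitment to Slurm; the job name, resource values, log path, and `run_job` binary below are all placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=my_task                  # placeholder name
#SBATCH --array=1-100                       # 100 standalone jobs in one task
#SBATCH --cpus-per-task=4                   # CPU requirement, known up front
#SBATCH --mem=16G                           # RAM requirement, known up front
#SBATCH --gres=gpu:1                        # optional GPU request
#SBATCH --output=/shared/logs/%x_%A_%a.log  # per-job log on the shared folder

# Each array element runs directly on the host, with no container wrapping.
# "run_job" is a placeholder for our actual compute binary.
./run_job --index "${SLURM_ARRAY_TASK_ID}"
```

Queueing when all resources are busy, multi-user support, and status queries (`squeue`, `sacct`) come with the scheduler, and adding a node means editing `slurm.conf` and restarting the daemons, which fits the "stopping and restarting is acceptable" point above.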
I was considering integrating Rancher and K3s. However, since they're only supported in VMs and not LXC, I have some reservations: my experience is largely with LXC (we have around 100 containers across our Proxmox cluster), though I assume the performance difference between VMs and LXC would be minimal.
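For reference, the workarounds I've seen for running K3s inside a Proxmox LXC container all involve relaxing the container's confinement, along these lines (untested on my side; the container ID 101 is a placeholder):

```bash
# Enable nesting and keyctl on the container:
pct set 101 --features nesting=1,keyctl=1

# Plus raw LXC overrides in /etc/pve/lxc/101.conf, which effectively
# unconfine the container -- exactly the kind of thing I'd rather avoid:
#   lxc.apparmor.profile: unconfined
#   lxc.cgroup2.devices.allow: a
#   lxc.cap.drop:
#   lxc.mount.auto: "proc:rw sys:rw"
```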
If anyone has insights, suggestions, or alternatives, I'd greatly appreciate them. Thank you in advance for your time and help.