Pardon my less-than-intelligent question, but is there a way to install Proxmox on a Ceph cluster?

Pardon my less-than-intelligent question, but is there a way to install Proxmox on a Ceph cluster such that Proxmox boots off of a Ceph cluster? Or is this not possible?
 
The usual boot process uses the BIOS firmware to read the very first blocks of the operating system. This happens before even the "initrd"/"initramfs" is available, i.e. "pre-boot".

Boot devices can be local hardware or network devices using one of several well-established network-boot protocols, but I have never seen Ceph offered here. It would probably require part of a Ceph stack residing in the EFI area.

Just my (very) limited understanding...
 
The background to my question is that right now, for any given Proxmox host, I don't really have a particularly great way of backing up said Proxmox host.

Thus, one idea would be to nest Proxmox inside another Proxmox install (not ideal, because then you have the Matryoshka-doll syndrome), whilst this idea is to install Proxmox on Ceph, since Ceph is self-healing.

So I was thinking that if I had a system, or a cluster of systems, serving up Ceph over the network (whether it's GbE or my 100 Gbps IB interconnect), then after installing ceph-mgr-dashboard I might also be able to configure iSCSI targets that Proxmox would then be able to use.

That's the idea of it.

So I'm just checking to see a) how stupid of an idea this is, and b) the feasibility of implementation/how to implement.
 
iSCSI is deprecated in the Ceph project and should not be used any more.

And there is no need to back up a single Proxmox node (if you have a cluster).

You may want to back up the VM config files, but everything else is really not that important.
If you want to lower the time needed to bring up a new Proxmox host, write some Ansible playbooks for its basic configuration before it can join the cluster.

Usually a RAID1 on some smaller SSDs is enough for the Proxmox operating system. Remember it is based on Debian and can be automated similarly.
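
If it helps, grabbing those few config files doesn't even need Ansible. A rough sketch in Python (the destination path is an assumption; point it at storage that does not live on the node itself):

Code:
#!/usr/bin/env python3
# Rough sketch: archive the VM/CT config files from a PVE node.
# The destination path is an assumption; use storage that does not
# live on this node (NFS share, USB disk, ...).
import os
import subprocess
from datetime import date

paths = [
    "/etc/pve/qemu-server",   # VM configs
    "/etc/pve/lxc",           # container configs
    "/etc/pve/storage.cfg",   # storage definitions
]
dest = f"/mnt/backup/pve-config-{date.today()}.tar.gz"

existing = [p for p in paths if os.path.exists(p)]
subprocess.run(["tar", "czf", dest, *existing], check=True)
print(f"wrote {dest}")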
 
iSCSI is deprecated in the Ceph project and should not be used any more.
Oh...I didn't realise this. Thank you.

You may want to back up the VM config files, but everything else is really not that important.
I would imagine that I would want to back up, for example, /etc/default/grub and the storage (and possibly network) configuration settings, no?

(As grub has the kernel boot parameters necessary for PCIe passthrough)
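
For reference, the passthrough-related part of /etc/default/grub in my notes is basically a single line like this (the exact flags depend on the CPU/board, so treat it as an illustration):

Code:
# /etc/default/grub (Intel example; on AMD the IOMMU is usually enabled by default)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# then run `update-grub` and reboot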
 
I would imagine that I would want to back up, for example, /etc/default/grub and the storage (and possibly network) configuration settings, no?

(As grub has the kernel boot parameters necessary for PCIe passthrough)
Think about this the other way around: if you have some automation helping you to set up a Proxmox host, you do not need to back up these settings.
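
For example, rather than restoring those settings from a backup, a small post-install script can simply reapply them. A rough sketch (the values below are illustrative, not a drop-in solution):

Code:
#!/usr/bin/env python3
# Rough sketch: reapply passthrough-related host settings on a fresh node
# instead of restoring them from a backup. Values are illustrative only.
import subprocess
from pathlib import Path

# kernel cmdline for passthrough (Intel example; assumes the stock default line)
grub = Path("/etc/default/grub")
grub.write_text(grub.read_text().replace(
    'GRUB_CMDLINE_LINUX_DEFAULT="quiet"',
    'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"',
))

# load the vfio modules at boot
Path("/etc/modules-load.d/vfio.conf").write_text("vfio\nvfio_iommu_type1\nvfio_pci\n")

subprocess.run(["update-grub"], check=True)
subprocess.run(["update-initramfs", "-u", "-k", "all"], check=True)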
 
Think about this the other way around: if you have some automation helping you to set up a Proxmox host, you do not need to back up these settings.
This would be true if you have systems like Ansible and/or Terraform deployed (and know how to use them).

As a homelabber, I have yet to learn these systems/platforms.

Right now, all of my deployment notes are in OneNote.

There are a few scripts/threads on backing up hosts to PBS, like this.
Thank you.

Sorry -- I must be missing something as the thread that is referenced doesn't actually contain any scripts.
 
I don't really have a particularly great way of backing up said Proxmox host.
In a cluster you don't need or even want to back up a host. Everything important lives in /etc/pve, which exists on all nodes. If you DID back up a host, you'd open the possibility of restoring a node that has been removed from the cluster and causing untold damage when turning it on.
Or you use a PXE network boot where the initrd contains all necessary things to continue with a Ceph RBD as root device.
This is the way for headless deployment, although I'd probably not use RBD here, as it's simpler and more manageable to use NFS instead. And DO NOT use the storage served by the nodes for this purpose, or you won't be able to actually power on the cluster.
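
To make the NFS-root variant concrete: the pre-boot part is just a PXE menu entry whose kernel command line points the root filesystem at an NFS export. A hypothetical pxelinux entry (server address, export path and file names are assumptions):

Code:
# pxelinux.cfg/default -- hypothetical entry, adjust IPs/paths to your setup
DEFAULT pve-nfsroot
LABEL pve-nfsroot
  KERNEL vmlinuz
  INITRD initrd.img
  APPEND root=/dev/nfs nfsroot=192.168.1.10:/srv/pve-root ip=dhcp rw
# the initrd must include NFS support for root=/dev/nfs to work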
 
Everything important lives in /etc/pve, which exists on all nodes.
Well... that's everything important to PVE that lives in /etc/pve.

But as I said, if you have set up your system for PCIe passthrough, then you will need more than just /etc/pve for a successful and rapid redeployment.
restoring a node that has been removed from the cluster
Adding a node to a Proxmox cluster is easy, especially via the GUI. I have yet to learn how to remove a node from said cluster. (I don't think that's an option in the PVE GUI.)

And DO NOT use the storage served by the nodes for this purpose, or you won't be able to actually power on the cluster.
Yeah, I have been working out the infrastructure architecture in my head: if I want to deploy this, I will need a system to serve up Ceph that Proxmox will then be able to use (or try to use).

But as noted, if I am using Proxmox to serve up said Ceph, then those nodes won't be able to leverage the fault tolerance that Ceph can offer, so I am still thinking through architectural details like this.

(And yes, I did read the comment earlier about how putting the boot disk in RAID1 can be sufficient for that, but I am also thinking: how awesome would it be if Proxmox were backed by a Ceph cluster? That way, if one or more drives die, you can just swap the drive out and Ceph will start to redistribute the data and "heal thyself", especially as hard drive capacities increase.)

Like imagine having a bunch of Proxmox nodes residing on, I think, the 36 TB ePMR CMR HDDs that were announced last year, so you can have a pretty massive LVM-thin volume, but then have that be part of an expandable Ceph cluster.
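
(For my own notes, the replacement flow I have in mind when a drive dies is roughly the upstream Ceph CLI sequence below; the OSD id and device name are placeholders, and Proxmox also exposes OSD management in its GUI.)

Code:
ceph osd tree                              # find the failed OSD, e.g. osd.7
ceph osd out 7                             # let Ceph rebalance data away from it
systemctl stop ceph-osd@7                  # stop the daemon on that node
ceph osd purge 7 --yes-i-really-mean-it    # drop it from the CRUSH map and auth
ceph-volume lvm create --data /dev/sdX     # create a new OSD on the replacement disk
ceph -s                                    # watch it backfill and "heal thyself"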
 
But as I said, if you have set up your system for PCIe passthrough,
I think you need to carefully consider what your end goal is. PCIe passthrough is not a good citizen in a PVE cluster, since VMs with pinned PCIe devices not only cannot move anywhere, but are also liable to hang the host. If you MUST use PCIe passthrough, consider leaving that node outside the cluster. I understand that you also want hyperconverged Ceph; understand what the tradeoffs are and act accordingly. In any event, backing up a host for PCIe passthrough reasons is actually poor practice, since you are not guaranteed the same hardware/slot order in a replacement.

What are you using PCIe passthrough for?

have yet to learn how to remove a node from said cluster. (I don't think that's an option in the PVE GUI.)
https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
 
You don't need PCI passthrough for LXC; you just need to install the proper NVIDIA driver based on the hardware and kernel deployed. You are better off creating an installation script, especially if you intend on having multiple nodes with GPUs.
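
To be concrete, sharing a GPU into containers is basically the host driver plus a few lines in the container config, roughly like this hypothetical snippet for /etc/pve/lxc/<vmid>.conf (device major numbers and node paths vary per host; check `ls -l /dev/nvidia*`):

Code:
# hypothetical additions to /etc/pve/lxc/<vmid>.conf -- majors/paths vary per host
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
# the same NVIDIA driver version is then installed inside the container,
# without its kernel module (the host already provides that)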

FYI, that 4x node solution is VERY old and has 6x3.5" slots per node; kind of an odd beast. Also, there are no risers in the node chassis, which means you can only install LP-sized GPUs. Food for thought.
 
You don't need PCI passthrough for LXC; you just need to install the proper NVIDIA driver based on the hardware and kernel deployed.
I don't?

Huh. Good to know.

My deployment notes were possibly originally written for GPU passthrough to a Windows VM, but I have found that for AI workloads it has been better for me to share GPU resources between LXCs than to use a Windows VM.

You are better off creating an installation script, especially if you intend on having multiple nodes with GPUs.
So, there are actually two things at play here:

1) The Ceph cluster is being evaluated to take over for my main "do-it-all" Proxmox server (from my mass consolidation project of Jan 2023). I have found that recently I/O has become an issue, so instead of having one server literally "do it all", I might need to split the storage back out so that compute can focus on compute and storage can focus on storage. (The original idea behind the 4-to-1 mass consolidation was that the four servers, plus supporting network infrastructure, were consuming 1242 W, whereas now my "do-it-all" single 36-bay 4U server consumes around 700 W or so. So not quite cutting the power bill in half, but it's pretty darn close.)

2) The compute layer is separate (including AI, etc.). The AI systems are now being (or being prepped to be) moved from my 6700K system to my 5950X system (because the 5950X supports a max of 128 GB of RAM, whereas my 6700K tops out at only 64 GB). I do have an RTX A2000 6 GB in said "do-it-all" Proxmox system, but that's mostly for Plex transcoding more so than AI workloads. It can handle some of the lighter workloads, but not much more than that. That work is better off being relegated to my 3090s.

FYI, that 4x node solution is VERY old and has 6x3.5" slots per node; kind of an odd beast.
A lot of the stuff I have is very old due to budget constraints. Wife will kill me if I spend thousands on new(er) hardware.

My "do-it-all" Proxmox server is also rocking an X10DRI-T4+ with a pair of Intel Xeon E5-2697A v4s, but it's been working well for me actually.

Dual EPYC 7763 is still too expensive for me.

But with two 3.5" HDD bays for the OS in the back and six 3.5" HDD bays in the front of each node, I can put 24 6 TB SATA HDDs in there, set them all up as OSDs for Ceph, and away we go.

It'll work for what I need it to do (based on this idea).

Also, there are no risers in the node chassis, which means you can only install LP-sized GPUs. Food for thought.
I know.

The intent is that if that's going to be my new Ceph cluster storage system, the nodes will be talking to each other over 100 Gbps IB, and to my main, current Proxmox "do-it-all" server, also over 100 Gbps IB.

I'm very early in the planning phase of this, to see whether this idea would even be feasible.