Pardon my less-than-intelligent question, but is there a way to install Proxmox on a Ceph cluster?

alpha754293 · 2026-02-21T06:08:53+0100

Pardon my less-than-intelligent question, but is there a way to install Proxmox on a Ceph cluster such that Proxmox boots off of a Ceph cluster? Or is this not possible?

UdoB · 2026-02-21T09:13:11+0100

The usual boot process uses the BIOS firmware to read the very first blocks of the operating system. This is before even the "initrd"/"initramfs" is available = "pre-boot".

While boot devices may be local hardware and network devices with some different flavors of well established network-boot protocols, I have never seen Ceph being offered here. It would probably need a part of a Ceph stack residing in the EFI area.

Just my (very) limited understanding...

gurubert · 2026-02-21T12:57:01+0100

Ceph can deploy NVMEoF gateways. You need to find hardware that is able to boot from that.
Or you use a PXE network boot where the initrd contains all necessary things to continue with a Ceph RBD as root device.

alpha754293 · 2026-02-21T13:18:18+0100

The background to my question is rooted in the fact that right now, for any given Proxmox host, I don't really have a particularly great way of backing up said Proxmox host.

Thus, one idea would be to nest Proxmox inside another Proxmox install (not an ideal situation, because then you have the Matryoshka doll syndrome) and whilst this idea is to install Proxmox on Ceph since Ceph is self-healing.

So I was thinking that if I had a system or a cluster of systems that was serving up Ceph over the network (whether it's GbE or my 100 Gbps IB system interconnect), and after installing ceph-mgr-dashboard that I might also be able to configure iSCSI targets that Proxmox will then be able to use.

That's the idea of it.

So I'm just checking to see a) how stupid of an idea this is, and b) the feasibility of implementation/how to implement.

gurubert · 2026-02-21T14:08:05+0100

iSCSI is deprecated in the Ceph project and should not be used any more.

And there is no need to backup a single Proxmox node (if you have a cluster).

You may want to backup the VM config files but everything else is really not that important.
If you want to lower the time needed to bring up a new Proxmox host write some Ansible playbooks for their basic configuration before they can join the cluster.

Usually a RAID1 on some smaller SSDs is enough for the Proxmox operating system. Remember it is based in Debian and can be automated similarly.

alpha754293 · 2026-02-21T14:11:28+0100

gurubert said:
iSCSI is deprecated in the Ceph project and should not be ised any more.

Oh...I didn't realise this. Thank you.

gurubert said:
You may want to backup the VM config files but everything else is really not that important.

I would imagine that I would want to backup for example /etc/default/grub.conf and like the storage (and possibly network) configuration settings no?

(As grub has the kernel boot parameters necessary for PCIe passthrough)

gurubert · 2026-02-21T14:42:07+0100

alpha754293 said:
I would imagine that I would want to backup for example /etc/default/grub.conf and like the storage (and possibly network) configuration settings no?

(As grub has the kernel boot parameters necessary for PCIe passthrough)

Think about this the other way around: If you have some automation helping you to setup a Proxmox host you do not need to backup these settings.

SteveITS · 2026-02-21T14:49:09+0100

alpha754293 said:
I don't really have a particularly great way of backing up said Proxmox host.

There are a few scripts/threads on backing up hosts to PBS, like this.

alpha754293 · 2026-02-21T15:40:50+0100

gurubert said:
Think about this the other way around: If you have some automation helping you to setup a Proxmox host you do not need to backup these settings.

This would be true if you know and have systems like Ansible and/or Terraform deployed (and know how to use them).

As a homelabber, I have yet to learn these systems/platforms.

Right now, all of my deployment notes are in OneNote.

SteveITS said:
There are a few scripts/threads on backing up hosts to PBS, like this.

Thank you.

Sorry -- I must be missing something as the thread that is referenced doesn't actually contain any scripts.

SteveITS · 2026-02-21T16:06:21+0100

My post there linked to https://forum.proxmox.com/threads/backing-up-pve-host-zfs-boot-disks-to-pbs.165169/post-767215

alpha754293 · 2026-02-21T16:13:03+0100

SteveITS said:
My post there linked to https://forum.proxmox.com/threads/backing-up-pve-host-zfs-boot-disks-to-pbs.165169/post-767215

Thank you.

This looks like that it is only backing up /etc/pve correct?

So if you have configured PCIe passthrough, it won't back up any of those changes as well, correct?

Just want to double check.

Thanks.

gurubert · 2026-02-21T19:36:54+0100

In this case just backup the complete /etc. All configuration should be contained therein.

SteveITS · 2026-02-21T19:37:17+0100

IIRC that was discussed in the Reddit thread. Originally the script used tar but /etc/pve can be included directly and is in that version. It backs up most everything…run one and take a look at the backup.

alpha754293 · 2026-02-21T19:45:28+0100

SteveITS said:
IIRC that was discussed in the Reddit thread. Originally the script used tar but /etc/pve can be included directly and is in that version. It backs up most everything…run one and take a look at the backup.

Thank you.

alexskysilk · 2026-02-21T19:55:13+0100

alpha754293 said:
I don't really have a particularly great way of backing up said Proxmox host.

In a cluster you dont need or even want to backup a host. everything important lives in /etc/pve which exists on all nodes. If you DID back up a host(s), you'd open the possibility of restoring a node that has been removed from the cluster and causing untold damage when turning it on.

gurubert said:
Or you use a PXE network boot where the initrd contains all necessary things to continue with a Ceph RBD as root device.

This is the way for headless deployment, although I'd probably not use RBD here as its simpler and more manageable to use NFS instead. and DO NOT use the storage served by the nodes for this purpose or you'll not be able to actually power on the cluster.

alpha754293 · 2026-02-22T01:50:36+0100

alexskysilk said:
everything important lives in /etc/pve which exists on all nodes.

Well....that's everything important to pve lives in /etc/pve.

But as I said, if you have set up your system for PCIe passthrough, then you will need more than just /etc/pve for a successful and rapid redeployment.

alexskysilk said:
restoring a node that has been removed from the cluster

Adding a node to a Proxmox cluster is easy. especially via the GUI. I have yet to learn how to remove a node, from said cluster. (I don't think that's an option in the PVE GUI.)

alexskysilk said:
DO NOT use the storage served by the nodes for this purpose or you'll not be able to actually power on the cluster.

Yeah, I have been working out the infrastructure architecture in my head that if I want to deploy this, I will need a system to serve up Ceph that Proxmox will then be able to use or try to use.

But as noted, if I am using Proxmox to serve up said Ceph, then those nodes won't be able to leverage the fault tolerance that Ceph can offer, so I am still thinking though architectural details like this.

(And yes, I did read the comment earlier about how putting the boot disk in RAID1 can be sufficient for that, but I am also thinking "how awesome would it be. if Proxmox was supported by a Ceph cluster, that way if one or more drive dies, you can just swap the drive out and Ceph will start to re-distribute the data and "heal thyself", especially as hard drive capacities increase?".)

Like imagine having a bunch of Proxmox nodes where they reside on I think the newly announced 36 TB ePMR CMR HDDs last year, so you can have a pretty massive LVM-thin volume, but then have that be a part of an expandable Ceph cluster.[/I][/I]

alexskysilk · 2026-02-22T03:05:59+0100

alpha754293 said:
But as I said, if you have set up your system for PCIe passthrough,

I think you need to carefully consider what your end goal is. PCIe passthrough is not a good citizen in a PVE cluster, since VMs with PCIe pins not only cannot move anywhere, but also liable to hang the host. if you MUST use PCIe passthrough, consider leaving that node outside the cluster. I understand that you also want hyperconverged ceph- understand what the tradeoffs are and act accordingly. In any event, backup up a host for PCIe passthrough reasons is actually poor practice since you are not guaranteed the same hardware/slot order in a replacement.

What are you using pcie passthrough for?

alpha754293 said:
have yet to learn how to remove a node, from said cluster. (I don't think that's an option in the PVE GUI.)

https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node

alpha754293 · 2026-02-22T03:56:43+0100

alexskysilk said:
I think you need to carefully consider what your end goal is.

Agreed.

alexskysilk said:
What are you using pcie passthrough for?

Locally/self-hosted AI via LXCs.

I am currently looking at getting something like this SYS-F628R3-RTBPT+ to try and see if I can improve storage services by deploying Ceph rather than using ZFS.

alexskysilk · 2026-02-22T04:36:06+0100

you dont need pci passthrough for lxc- just would need to install the proper nvidia driver based on hardware and kernel deployed. You are better off creating an installation script, especially if you intend on having multiple nodes with GPUs.

FYI, that 4x node solution is VERY old and has 6x3.5" slots per node; kind of an odd beast. Also, there are no risers in the node chassis, which means you can only install LP sized GPUs- food for thought.

alpha754293 · 2026-02-22T05:26:28+0100

alexskysilk said:
you dont need pci passthrough for lxc- just would need to install the proper nvidia driver based on hardware and kernel deployed.

I don't?

Huh. Good to know.

My deployment notes were written, possibly originally for GPU passthrough to Windows VM but I have found that for AI workloads, that it has been better for me to share GPU resources between LXCs than using a Windows VM.

alexskysilk said:
You are better off creating an installation script, especially if you intend on having multiple nodes with GPUs.

So, there actually two things at play here:

1) The Ceph cluster is being evaluated to take over for my main "do-it-all" Proxmox server (from my mass consolidation project of Jan 2023). I have found that recently, I/O has become an issue, so I might need to split the tasks back out where instead of having one server literally "do it all", that I might need to split the storage back out so that compute can focus on compute and storage can focus on storage. (The original idea behind the 4-to-1 mass consolidation was because the four server, plus network supporting infrastructure was consuming 1242 W whereas now my "do-it-all" single 36-bay 4U server consumes around 700 W or so. So not quite cutting the power bill by half, but it's pretty darn close.

2) The compute layer is separate (including AI, etc.) The AI systems now are (or is being prepped) to be moved from my 6700K system to my 5950X system (because the 5950X supports a max of 128 GB of RAM whereas my 6700K tops out at only 64 GB of RAM). I do have an RTX A2000 6 GB in said "do-it-all" Proxmox system, but that's mostly for Plex transcoding moreso than AI workloads. It can handle some of the lighter workloads, but not by much. That is better off being relegated to my 3090s.

alexskysilk said:
FYI, that 4x node solution is VERY old and has 6x3.5" slots per node; kind of an odd beast.

A lot of the stuff I have is very old due to budget constraints. Wife will kill me if I spend thousands on new(er) hardware.

My "do-it-all" Proxmox server is also rocking an X10DRI-T4+ with a pair of Intel Xeon E5-2697A v4s, but it's been working well for me actually.

Dual EPYC 7763 is still too expensive for me.

But for two 3.5" HDD bays for OS in the back and six 3.5" HDD bays in the front, then I can put 24 6 TB SATA HDDs there, set them all us as OSDs for Ceph and away we go.

It'll work for what I need it to do (based on this idea).

alexskysilk said:
Also, there are no risers in the node chassis, which means you can only install LP sized GPUs- food for thought.

I know.

The intent is that if that's going to my new Ceph cluster storage system, they'll be talking to each other over 100 Gbps IB and to my main, current Proxmox "do-it-all" server, also over 100 Gbps IB.

I'm very early in the planning phase of this, to see whether this idea would even be feasible.

Pardon my less-than-intelligent question, but is there a way to install Proxmox on a Ceph cluster?

Member

Distinguished Member

Distinguished Member

Member

Distinguished Member

Member

Distinguished Member

Renowned Member

Member

Renowned Member

Member

Distinguished Member

Renowned Member

Member

Distinguished Member

Member

Distinguished Member

Member

Distinguished Member

Member

We value your privacy