[SOLVED] CEPH + Kubernetes

Isaack

Jul 28, 2020
I've installed Proxmox VE on 3 nodes and set up Ceph, which works fine for the VMs running on them.
The Cluster and Public network are on the same subnet with the monitors being on the same subnet as the VMs.

My problem is using my Ceph storage with Kubernetes running as VMs. I cannot get it to provision any pools.

Even after installing the Ceph common tools on the VMs and copying ceph.conf and ceph.client.admin.keyring over, I cannot run any commands like "ceph -s".
I always get this error: [errno 2] RADOS object not found (error connecting to the cluster)

How do I set up a client to communicate with my Ceph cluster?
 
My problem is using my Ceph storage with Kubernetes running as VMs. I cannot get it to provision any pools.
Can your VMs reach the Ceph cluster (e.g. same IP range, firewall)?
 
Yes, they can. I can curl the hosts and they respond with "CEPH" and some numbers.
curl to what endpoint? The VMs need to access all parts of the Ceph cluster.
 
I tried connecting to the monitor address on port 6789, which responded with "CEPH..."
For the VM I'm testing with, I also made it accessible to the Cluster and Public address.
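
For anyone who wants to repeat this monitor check from a VM, here is a minimal sketch. It assumes the mon_host line in ceph.conf is a plain list of IP addresses separated by spaces or commas, and that a netcat supporting -z and -w is installed; the function names are made up for illustration. Port 6789 is the one from this thread (newer clusters may also listen on 3300). Reachable monitors are necessary but not sufficient: the client also has to reach the OSDs.

```shell
# Print one monitor address per line, read from the mon_host entry in ceph.conf.
# Assumption: plain IPs separated by spaces or commas (no v1:/v2: address vectors).
list_mon_hosts() {
    conf="${1:-/etc/ceph/ceph.conf}"
    sed -n 's/^[[:space:]]*mon[_ ]host[[:space:]]*=[[:space:]]*//p' "$conf" \
        | head -n 1 | tr ',\t ' '\n\n\n' | sed '/^$/d'
}

# Try a TCP connection to port 6789 on every monitor and report the result.
# Returns 1 if at least one monitor could not be reached.
check_mon_ports() {
    conf="${1:-/etc/ceph/ceph.conf}"
    rc=0
    for host in $(list_mon_hosts "$conf"); do
        if nc -z -w 3 "$host" 6789; then
            echo "reachable: $host"
        else
            echo "unreachable: $host"
            rc=1
        fi
    done
    return "$rc"
}
```

Run it as: check_mon_ports /etc/ceph/ceph.conf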

Just now I discovered what I was missing: /etc/pve/priv/ceph.client.admin.keyring
I don't recall seeing this file being mentioned, only the ones from /etc/ceph/

So now I can do sudo ceph status from all my VMs.
And I'm back to my original problem, where the Kubernetes provisioner replies with:
Code:
GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-085c8a16-39ff-4322-b5d0-ed840ee058c0 already exists
 
Well, this is Kubernetes-specific. Either the volume ID really exists already or the Kubernetes setup needs some additional configuration.
 
Just to conclude this, as I've now solved it, and for others who reach this thread:

I had wrongly assumed that the Kubernetes nodes only needed access to the monitor network. But like you said, once I allowed all Kubernetes nodes access to the Public/Storage network as well, it provisioned correctly.
So I think a good test is to see whether you can run "ceph status" and "rbd ls [pool-name]".
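
The test described above could be wrapped up roughly like this and run on each Kubernetes node. It is only a sketch: the function name is made up, and the pool name you pass in is whatever pool your provisioner is configured to use. The idea is that "ceph status" mainly exercises the monitors, while "rbd ls" also has to read from the pool itself, so it should fail if the node cannot reach the storage network.

```shell
# Pre-flight check for a Ceph client (for example a Kubernetes node).
# Usage: check_ceph_client <pool-name>
check_ceph_client() {
    pool="$1"
    # Step 1: needs ceph.conf, a valid keyring and reachable monitors.
    if ! ceph --connect-timeout 10 status >/dev/null 2>&1; then
        echo "FAIL: ceph status"
        return 1
    fi
    # Step 2: listing images reads from the pool, so the OSDs must be reachable too.
    if ! rbd ls "$pool" >/dev/null 2>&1; then
        echo "FAIL: rbd ls $pool"
        return 2
    fi
    echo "OK: $pool"
}
```

Example: check_ceph_client kubernetes (where "kubernetes" is a placeholder pool name).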

Thanks for leading me to a solution
 
@Isaack if you don't mind, I'd like to pick your brain about more details of your setup, as I am seriously considering Ceph for a new school setup running some basic library and administration apps within k8s.
  • Proxmox on 3 big nodes.
  • Mixed HDD/SSD disks. One server has all the TBs for storage, while the other two have small SSDs of different sizes and one or two HDDs (I would prefer an LVM SSD cache setup if possible)
So I'm not sure yet whether Ceph is the answer here; I'm still experimenting within VirtualBox.

Once this is up, it's Kubernetes, most likely k3s, on top, as I currently run the apps on EKS on AWS and have it all set up there in Helm charts.

What I am struggling with is the PV/PVC lifecycle in this entire setup... I strongly prefer to control the lifecycle of each PVC for each pod/container. Some apps don't need any redundancy, as it's built into the app's own clustering, while other apps are a single point of failure and need 3x replicas of their volumes.

What you mentioned above is an option I hadn't considered yet: have Proxmox provision Ceph, and then have K8s talk to that Ceph cluster directly for k8s storage, instead of using local file/dir storage.

Knowing what I posted above about the lifecycles, could you provide more details on how your Ceph-by-Proxmox-accessed-by-K8s-VMs setup is working out for you and what options you have?
 
@eduncan911 your setup will work with Proxmox, Ceph and Kubernetes. I have, however, just moved away from Proxmox, as I realized I no longer needed the virtualization for a mix of Windows and Linux applications. Dropping it also greatly simplifies the setup, which was neat but had started to look a bit silly with lots of moving pieces.

For your PV/PVC, I suggest that you create different pools in Proxmox/Ceph with the different redundancy levels that you need. I'm not familiar with LVM caches (only ZFS). I tried different caching setups with Ceph but ended up with pure SSD pools and pure HDD pools. My hardware is old and limited in many ways with regard to bandwidth.
Once you've created the different pools, you can bind them in Kubernetes with PVCs and Ceph CSI bindings. The VMs will connect directly to Ceph and it definitely works.
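
As a rough illustration of "different pools with different redundancy levels", the pools could be created from a Proxmox/Ceph node along these lines. This is a sketch, not my exact setup: the function name, the pool names, the PG count of 32 and the "replicated_<class>" rule naming are placeholders, and it assumes the default CRUSH root "default" with "host" as the failure domain. Each pool name would then be referenced from its own Ceph CSI StorageClass in Kubernetes.

```shell
# Create an RBD pool with a given replica count on one device class (ssd or hdd).
# Usage: create_rbd_pool <pool-name> <replica-count> <device-class>
create_rbd_pool() {
    pool="$1"
    size="$2"
    class="$3"
    rule="replicated_${class}"   # placeholder naming scheme for the CRUSH rule
    # CRUSH rule that only places data on OSDs of the given device class.
    ceph osd crush rule create-replicated "$rule" default host "$class" &&
    # 32 placement groups is a placeholder value; size it for your cluster.
    ceph osd pool create "$pool" 32 &&
    ceph osd pool set "$pool" crush_rule "$rule" &&
    # Number of replicas kept for every object in this pool.
    ceph osd pool set "$pool" size "$size" &&
    # Mark the pool for use by RBD.
    rbd pool init "$pool"
}
```

For example, "create_rbd_pool k8s-ssd-3x 3 ssd" for the single-point-of-failure apps and "create_rbd_pool k8s-hdd-2x 2 hdd" for apps that replicate on their own.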