using an existing ceph 10.2 (jewel) cluster with proxmox 4.2

pixel

For using proxmox 4.2 with an existing jewel cluster: the ceph jewel release notes say you need "rbd default features = 1" in the client section of ceph.conf. But I don't see a ceph.conf anywhere. Is there somewhere else this can be set?
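(For reference, the release-notes setting being quoted is a plain ceph.conf entry; on a client node that file would normally live at /etc/ceph/ceph.conf, and for an external cluster it may simply not exist yet. A minimal sketch, paths assumed:)

Code:
[client]
rbd default features = 1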

in a test cluster of virtual machines, I set up proxmox 4.2 and ceph 10.2.1, which works fine with rbd. My hope is to try this with the hammer client that proxmox ships and to see if krbd can work with it. We don't need lxc, but it's nice to have.
 
the jewel integration will be available soon (also a workaround for the krbd problem)
 
By "soon", can you give a rough estimate to help with planning? We plan on running in test/dev mode for 2-3 weeks before production.

P.S. thanks for such a fast response on a weekend! Did not expect that.
 
the krbd image feature disabling fix is already in the pve-storage package on pve-no-subscription.

note that we only disable the features on image creation, so if you have images which you created under jewel with the default settings (which means, with features that krbd does not yet support) and want to use them with krbd, you need to run the following command once on each image (replace IMAGENAME accordingly):

Code:
rbd feature disable IMAGENAME deep-flatten,fast-diff,object-map,exclusive-lock
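
(To check what an image currently has enabled before and after disabling, rbd info lists the features; IMAGENAME as above:)

Code:
rbd info IMAGENAME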

the pveceph integration for jewel is not yet done, so installing or creating mons/osds with pveceph or the GUI can break stuff on ceph > hammer. if you do those things manually with the tools provided by ceph, they SHOULD work, but your mileage may vary ;)
 
just tried with proxmox 4.2 pve-no-subscription and ceph 10.2.1 on ubuntu xenial. The ceph storage summary is correct in storage space, so that part's working. But when I try to create a vm or container:

Code:
2016-06-13 02:39:38.758477 7f72dde10780 -1 did not load config file, using default settings.
Error initializing cluster client: Error
TASK ERROR: command 'ceph version' failed: exit code 1

"ceph version" returns 10.2.1 on the ceph nodes, but "Error initializing cluster client: Error" on the ceph nodes.

my /etc/pve/storage.cfg:

Code:
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

lvmthin: local-lvm
        vgname pve
        thinpool data
        content rootdir,images

rbd: ceph
        monhost 192.168.113.31;192.168.113.32;192.168.113.33
        pool rbd
        krbd
        content rootdir,images
        username admin

kvm does work when krbd is disabled.
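(For reference, the krbd switch is just the bare "krbd" line in the rbd section above; removing that line from /etc/pve/storage.cfg falls back to librbd. Assuming pvesm exposes the property like other storage options, the same should also work from the CLI, a sketch:)

Code:
# assumption: krbd is settable via pvesm like other storage properties
pvesm set ceph --krbd 0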
 
"ceph version" returns 10.2.1 on the ceph nodes, but "Error initializing cluster client: Error" on the ceph nodes.

I guess you mean it returns 10.2.1 on the ceph nodes and the error message on the PVE / client nodes? I will look into it.
 
I just posted a patch to pve-devel (I mistakenly thought "ceph version" and "ceph --version" were the same) which should fix this issue.
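
(The difference is that one is a local query and the other a cluster command, which is why the missing config made it fail:)

Code:
# prints the locally installed client version; needs no cluster access
ceph --version
# contacts the cluster, so it needs a readable ceph.conf and keyring;
# without them it fails with "Error initializing cluster client" as above
ceph version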
 
updated my proxmox test machine on pve-no-subscription and applied your patch by hand. Both lxc and kvm worked. Can you post here when the fix is applied to the subscription repository? Then I'll try it at work.
 
Just tried removing the vm and container. The vm removed fine. The container could not remove the rbd image, saying the image still has watchers.

Code:
2016-06-14 04:35:12.051159 7f7c8845a780 -1 did not load config file, using default settings.
Removing all snapshots: 100% complete...done.
2016-06-14 04:35:12.078886 7f4af84ff780 -1 did not load config file, using default settings.
image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
TASK ERROR: rbd rm 'vm-101-disk-1' error: rbd: error: image still has watchers
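
(If this comes up again, it might help to see who is still watching the image before retrying the removal; a sketch assuming jewel's rbd status subcommand and the default pool from the storage config above:)

Code:
rbd status vm-101-disk-1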

"rbd rm vm-101-disk-1" worked, but then trying to remove the container from gui gave this

Code:
2016-06-14 04:38:19.140803 7f5680cc7780 -1 did not load config file, using default settings.
rbd: error opening image vm-101-disk-1: (2) No such file or directory
TASK ERROR: rbd snap purge 'vm-101-disk-1' error: rbd: error opening image vm-101-disk-1: (2) No such file or directory

rm /etc/pve/lxc/101.conf (after removing the rbd image by hand) didn't seem to bother proxmox after that. I was not logged into the gui while doing that. When I opened the dialog for making a vm and container, the vmid counter was set back to 100.
 
if this is reproducible for you, can you try removing the container twice in a row and see if the second attempt works?
 
the first time, I did try twice. Then I turned off both the ceph cluster and proxmox, turned them back on to try again, and it worked fine on the first attempt. Made two more containers, then deleted both, and that worked fine too.

note that I did not build from git, just made the one-line change from your patch to the pve-no-subscription version (added the '--' to "ceph version").
 
