Fresh install 8.1.4 + Ceph 18.4 = Broken install w. multiple issues

jorel83

Active Member
Dec 11, 2017
Hi,

Tried yet again to reinstall PMX 8.1.4 and Ceph 18.4 and got the same identical results.

The issues I see are shown in the screenshots below, but it also seems the symlinks are missing and cannot be created.

This seems to be a very buggy release and I can't find a way forward.

file '/etc/ceph/ceph.conf' already exists and is not a symlink to /etc/pve/ceph.conf (500)


Not possible to add a new symlink (tested on the 3rd host); identical issue across all hosts.
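For reference, a small diagnostic sketch (not from the thread): on a healthy PVE node, /etc/ceph/ceph.conf is a symlink into the clustered /etc/pve filesystem, which is exactly what the 500 error above is checking for.

```shell
# On a healthy node this prints something like:
#   /etc/ceph/ceph.conf -> /etc/pve/ceph.conf
ls -l /etc/ceph/ceph.conf

# readlink prints the target only for a symlink; for a regular file it
# prints nothing and exits non-zero - the condition the 500 error reports
readlink /etc/ceph/ceph.conf && echo "symlink ok" || echo "regular file, matches the error above"
```

If it is a regular file, that usually means something wrote a local ceph.conf outside of the PVE tooling.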

rados_connect failed - No such file or directory (500)

file '/etc/ceph/ceph.conf' already exists and is not a symlink to /etc/pve/ceph.conf (500)


Monitor cannot start
 

correct, don't do that. there's also no reason to; pve clustering will handle it for you.
Ok, that was based on advice from similar cases on the forum.

Found one more issue I've never seen before either, seems DNS related for ceph-mon?

root@pmx0:/etc/pve# ceph fs ls
unable to get monitor info from DNS SRV with service name: ceph-mon
2024-02-23T20:00:08.528+0100 7fb5aac716c0 -1 failed for service _ceph-mon._tcp
2024-02-23T20:00:08.528+0100 7fb5aac716c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
[errno 2] RADOS object not found (error connecting to the cluster)

Added it to the hosts file, but no difference on this part; just grasping at straws. No good to have the entire cluster down for this long...
 
not dns; please post your ceph.conf from the machine you are running that on.
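A note on the SRV message (my understanding, not stated in the thread): the ceph client only attempts a DNS SRV lookup for _ceph-mon._tcp as a last resort, when its config contains no monitor addresses at all, so the message usually points at a ceph.conf missing mon_host rather than at DNS. A self-contained sketch of the check, using a temp file that mirrors the config posted below:

```shell
# Sample config mirroring the posted one: no mon_host line and no [mon.*]
# section, so the client has no monitors to contact and falls back to a
# DNS SRV lookup for _ceph-mon._tcp.
conf=$(mktemp)
printf '[global]\npublic_network = 10.254.10.12/24\n' > "$conf"

if ! grep -q -E '^[[:space:]]*mon[_ ]host[[:space:]]*=' "$conf"; then
  echo "no mon_host -> DNS SRV fallback"
fi
rm -f "$conf"
```

Running the same grep against the real /etc/ceph/ceph.conf on the node shows whether any monitors are defined.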
This is the only thing generated during the setup (on 1 of the 4 hosts); both the crush map and the configuration database time out with Error got timeout (500)

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.102.102/24
fsid = 17f038ec-b463-4770-8af4-504b56c1e4b7
mon_allow_pool_delete = true
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.254.10.12/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

Nothing beyond this is generated; the other hosts have identical configs, but instead of a timeout they throw: Error rados_connect failed - No such file or directory (500)
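For comparison, a sketch of what a PVE-generated ceph.conf additionally contains once monitors have actually been created (the host name and monitor IPs here are illustrative, not from the thread):

```ini
[global]
    # ...the existing [global] entries from above stay as-is...
    mon_host = 10.254.10.12 10.254.10.13 10.254.10.14

[mon.pmx0]
    public_addr = 10.254.10.12
```

The config posted above has neither, which is consistent with monitor creation never having completed.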

I have now decided to remove 1 host from the equation and use it as a single node until this is resolved; cannot have more downtime, even for a hobby. Happy this is not a production environment...
 
there are no mds servers defined in your configuration file. ceph has no idea what you're asking with ceph fs ls.
I know, and I cannot add any: the webui config times out and the cli refuses to let me edit it manually

root@pmx0:~# ceph fs ls.
unable to get monitor info from DNS SRV with service name: ceph-mon
2024-02-23T23:40:00.428+0100 7f78a78106c0 -1 failed for service _ceph-mon._tcp
2024-02-23T23:40:00.428+0100 7f78a78106c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
[errno 2] RADOS object not found (error connecting to the cluster)

Added ceph-mon to /etc/hosts, but it makes no difference.

Keep in mind this is a pure fresh install; only the network for the bonds was configured before this.

I was running 7.2.X before the reinstall and disk swap that led up to this; even then the webui ceph config worked, and before that I did it manually in the CLI.
 
ok, here is what I would recommend.

I assume you have not used your file system meaningfully yet.

run the following commands on all nodes:
pveceph purge
pveceph install

at this point, you can either use the gui or the cli to continue.
on the first node, pveceph init (see options here: https://pve.proxmox.com/pve-docs/pveceph.1.html)
then pveceph createmon on three nodes
then create the osds; since the disks will probably have an existing signature, you can use ceph-volume lvm zap to clear them.

if you have a sane configuration at this point, congratulations. if not... reinstalling everything is probably the quickest way forward.

--edit: in case it's not clear, DO NOT HAND MODIFY ANYTHING
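The recovery steps above, sketched as a shell sequence (the device path /dev/sdX is a placeholder, and the networks are taken from the ceph.conf posted earlier; adjust both to your setup):

```shell
# On ALL nodes: wipe the old Ceph state and reinstall the packages
pveceph purge
pveceph install

# On the FIRST node only: write a fresh /etc/pve/ceph.conf
# (networks here are the ones from the posted config)
pveceph init --network 10.254.10.12/24 --cluster-network 172.16.102.102/24

# On three nodes: create a monitor each
pveceph createmon

# Disks that held OSDs before carry an old signature; zap them first
# (/dev/sdX is a placeholder - substitute your actual OSD device)
ceph-volume lvm zap /dev/sdX --destroy
pveceph createosd /dev/sdX
```

All of this goes through the pveceph tooling, so nothing is hand-edited.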
 
At this stage you're probably right; the only thing left to try is the CLI, not the UI. I have already reinstalled the servers 3 times now, and doing it remotely takes quite some time.

I hope to be able to try it tomorrow.

Thanks for your help so far.

Cheers
 
from scratch- order of operations:
1. build your networks. make sure you have your service (internet) network, corosync (at least one) and ceph (private and public; can be one for both) defined on all nodes before you do anything.
2. make sure you have hosts files on all nodes. hosts files should contain each node's short name (without any domain) pointing to that node's primary corosync ip
3. create your cluster. see https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_create_cluster
4. make sure all nodes are present in your gui, with green status
5. pveceph install on all nodes
6. make sure your ceph ips can ping all nodes from all nodes.
7. continue as in https://forum.proxmox.com/threads/f...-install-w-multiple-issues.142160/post-637635

at no point should you be editing anything by hand; if you have to, you messed up something above.

Once you get the whole cluster up, there are things you could be doing to tune it - but not until the cluster is up to begin with.
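Steps 2-4 above can be sketched like this (the host names and 10.x addresses are illustrative, not from the thread):

```shell
# Step 2: /etc/hosts on every node - each node's short name pointing at
# its primary corosync IP (illustrative addresses)
cat >> /etc/hosts <<'EOF'
10.10.10.1 pmx0
10.10.10.2 pmx1
10.10.10.3 pmx2
EOF

# Step 3: create the cluster on the first node...
pvecm create mycluster

# ...then, on each other node, join it via the first node's name
pvecm add pmx0

# Step 4: verify all nodes are present and quorate before touching ceph
pvecm status
```

Only after pvecm status shows all nodes green does the pveceph part start.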
 
