Hyperconverged Ceph cluster on Proxmox with external Ceph cluster already connected

nejcsuhadolc

Member
Apr 17, 2019
Hi,

I'm trying to set up a local (hyperconverged) Ceph cluster on a system that is already using an external Ceph cluster.
When I clicked Install Ceph on one of the nodes, this node started to experience problems with the external Ceph. I cannot read files, but VMs still run on the external Ceph.

One of the errors I get is this one when trying to move a VM:


Code:
2020-09-20 03:23:14 starting migration of VM 118 to node 'prox-02' (10.20.0.102)
2020-09-20 03:23:18 found local disk 'local-zfs:vm-118-disk-0' (via storage)
2020-09-20 03:23:19 ERROR: Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
2020-09-20 03:23:19 aborting phase 1 - cleanup resources
2020-09-20 03:23:19 ERROR: migration aborted (duration 00:00:05): Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
TASK ERROR: migration aborted

Also, I cannot configure the local Ceph; it seems to pick up the config of the external Ceph.

Error in syslog:
Code:
Sep 20 02:54:17 prox-06 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 20 02:57:07 prox-06 systemd[1]: Started Ceph cluster monitor daemon.
Sep 20 02:57:33 prox-06 systemd[1]: Stopping Ceph cluster monitor daemon...
Sep 20 02:57:33 prox-06 ceph-mon[8124]: 2020-09-20 02:57:33.444 7f34c33a7700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
Sep 20 02:57:33 prox-06 ceph-mon[8124]: 2020-09-20 02:57:33.444 7f34c33a7700 -1 mon.prox-06@0(leader) e1 *** Got Signal Terminated ***

Do you have any idea how to solve this?
 
ERROR: Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
Does the listing work from the CLI, rbd -p <pool> ls?

Also, I cannot configure the local Ceph; it seems to pick up the config of the external Ceph.
The local Ceph services need the /etc/ceph/ceph.conf, usually a link to /etc/pve/ceph.conf.
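A quick way to verify, assuming the standard PVE paths (just a sketch, not specific to this setup):
Code:
# Show the symlink and its target:
ls -l /etc/ceph/ceph.conf
# If it is missing, recreate it:
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf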
 
Hi Alwin


rbd returns an error:
Code:
rbd -p xenCeph ls
-bash: /usr/bin/rbd: Input/output error

/etc/ceph/ceph.conf is already a link to /etc/pve/ceph.conf.

The content of that is:
Code:
root@prox-06:~# cat /etc/pve/ceph.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     fsid = 7b7357f8-a718-4f81-8f66-39aeb6cbf4dc
     # keyring = /etc/pve/priv/$cluster.$name.keyring
     keyring = /var/lib/ceph/mgr/ceph-$id/keyring
     mon_allow_pool_delete = true
     mon_host = 10.50.0.106
     osd_journal_size = 5120
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.prox-06]
     public_addr = 10.50.0.106
 
rbd -p xenCeph ls
That command alone will query the cluster found in /etc/ceph/ceph.conf. Did you run it on the Ceph cluster directly?
What does your /etc/pve/storage.cfg look like?
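For reference, an external RBD entry in storage.cfg usually looks roughly like this (all values below are placeholders, not taken from this setup):
Code:
rbd: <storage-id>
     content images
     monhost <MON-IP>
     pool <pool>
     username <username>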
 
Hi,

I have no direct access to the remote Ceph, so I can only run it on my servers. If I do, I get:

Code:
root@prox-05:~# rbd -p xenCeph ls
unable to get monitor info from DNS SRV with service name: ceph-mon
rbd: couldn't connect to the cluster!
rbd: listing images failed: (2) No such file or directory
2020-09-21 13:02:11.455 7f1c268ce0c0 -1 failed for service _ceph-mon._tcp
2020-09-21 13:02:11.455 7f1c268ce0c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact

If I run it on a working node (one I haven't run pveceph on), I get:

Code:
root@prox-02:~# rbd -p xenCeph ls
unable to get monitor info from DNS SRV with service name: ceph-mon
no monitors specified to connect to.
rbd: couldn't connect to the cluster!
rbd: list: (2) No such file or directory
2020-09-21 13:04:54.230309 7f54f701c0c0 -1 failed for service _ceph-mon._tcp
 
In both outputs, the node can't connect to the Ceph cluster.

root@prox-06:~# cat /etc/pve/ceph.conf
For the local Ceph cluster, you will need to specify the cluster_network and public_network in the ceph.conf. Otherwise the Ceph daemons try to bind to the IP resolved from the hostname.
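A minimal sketch of the corresponding [global] entries, assuming the local cluster should live on the 10.50.0.0/24 subnet suggested by the MON address (adjust to your actual networks):
Code:
[global]
     public_network = 10.50.0.0/24
     cluster_network = 10.50.0.0/24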
mon_host = 10.50.0.106
Does this MON exist and can every node connect to it?
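A simple reachability check from each node (6789 is the default MON v1 port; adjust if yours differs):
Code:
ping -c 3 10.50.0.106
nc -zv 10.50.0.106 6789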

I have no direct access to the remote Ceph, so I can only run it on my servers.
To connect to an external Ceph cluster from the CLI, you will need to specify the MON address, username, and keyring.
Code:
ceph -p <pool> -m <MON-IP> -n client.<username> --keyring /<path>/<to>/<file>.keyring --auth_supported cephx ls
 
To connect to an external Ceph cluster from the CLI, you will need to specify the MON address, username, and keyring.
Code:
ceph -p <pool> -m <MON-IP> -n client.<username> --keyring /<path>/<to>/<file>.keyring --auth_supported cephx ls

The code above seems to have some semantic problems, but this one lists the content of the RBD storage:

Code:
rbd list [pool]  -m 1.1.1.1 -k /etc/pve/priv/ceph/Ceph.keyring --user [username]
 
I found out this error also:

Code:
root@ve-02:~# service pvestatd status
● pvestatd.service - PVE Status Daemon
   Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-09-28 23:45:37 CEST; 29min ago
  Process: 182719 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
 Main PID: 182728 (pvestatd)
    Tasks: 1 (limit: 9830)
   Memory: 103.4M
   CGroup: /system.slice/pvestatd.service
           └─182728 pvestatd

Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $avail in int at /usr/share/perl5
Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in int at /usr/share/perl5/
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $free in addition (+) at /usr/sha
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $avail in int at /usr/share/perl5
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $used in int at /usr/share/perl5/
Sep 29 00:15:24 ve-02 pvestatd[182728]: status update time (32.127 seconds)
Sep 29 00:15:29 ve-02 pvestatd[182728]: got timeout
Sep 29 00:15:30 ve-02 pvestatd[182728]: status update time (5.479 seconds)
 
The code above seems to have some semantic problems, but this one lists the content of the RBD storage:
Which are?

rbd list [pool] -m 1.1.1.1 -k /etc/pve/priv/ceph/Ceph.keyring --user [username]
What was the output?

Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
It's a side effect. The values for the usage calculation are uninitialized, but they shouldn't be.
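To see which storage the stats daemon is stumbling over, the standard storage overview can help (an unreachable storage shows up as inactive):
Code:
pvesm status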
 
