Hyperconverged Ceph cluster on Proxmox with external Ceph cluster already connected

nejcsuhadolc

Member
Apr 17, 2019
Hi,

I'm trying to set up a local (hyperconverged) Ceph cluster on a system that is already using an external Ceph cluster.
When I clicked Install Ceph on one of the nodes, this node started to experience problems with the external Ceph. I cannot read files, but VMs still run on the external Ceph.

One of the errors I get is this one when trying to move a VM:


Code:
2020-09-20 03:23:14 starting migration of VM 118 to node 'prox-02' (10.20.0.102)
2020-09-20 03:23:18 found local disk 'local-zfs:vm-118-disk-0' (via storage)
2020-09-20 03:23:19 ERROR: Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
2020-09-20 03:23:19 aborting phase 1 - cleanup resources
2020-09-20 03:23:19 ERROR: migration aborted (duration 00:00:05): Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
TASK ERROR: migration aborted

Also, I cannot configure the local Ceph; it seems to pick up the config of the external Ceph.

Error in syslog:
Code:
Sep 20 02:54:17 prox-06 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 20 02:57:07 prox-06 systemd[1]: Started Ceph cluster monitor daemon.
Sep 20 02:57:33 prox-06 systemd[1]: Stopping Ceph cluster monitor daemon...
Sep 20 02:57:33 prox-06 ceph-mon[8124]: 2020-09-20 02:57:33.444 7f34c33a7700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
Sep 20 02:57:33 prox-06 ceph-mon[8124]: 2020-09-20 02:57:33.444 7f34c33a7700 -1 mon.prox-06@0(leader) e1 *** Got Signal Terminated ***

Do you have any idea how to solve this?
 
ERROR: Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
Does the listing work from the CLI, rbd -p <pool> ls?

Also, I cannot configure the local Ceph; it seems to pick up the config of the external Ceph.
The local Ceph services need the /etc/ceph/ceph.conf, usually a link to /etc/pve/ceph.conf.
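A quick way to verify, assuming the standard PVE paths (just a sketch, not specific to this setup):
Code:
# Show the symlink and its target:
ls -l /etc/ceph/ceph.conf
# If it is missing, recreate it:
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf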
 
Hi Alwin


rbd returns an error:
Code:
rbd -p xenCeph ls
-bash: /usr/bin/rbd: Input/output error

/etc/ceph/ceph.conf is already a link to /etc/pve/ceph.conf.

The content of that is:
Code:
root@prox-06:~# cat /etc/pve/ceph.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     fsid = 7b7357f8-a718-4f81-8f66-39aeb6cbf4dc
     # keyring = /etc/pve/priv/$cluster.$name.keyring
     keyring = /var/lib/ceph/mgr/ceph-$id/keyring
     mon_allow_pool_delete = true
     mon_host = 10.50.0.106
     osd_journal_size = 5120
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.prox-06]
     public_addr = 10.50.0.106
 
rbd -p xenCeph ls
That command alone will query the cluster found in /etc/ceph/ceph.conf. Did you run it on the Ceph cluster directly?
What does your /etc/pve/storage.cfg look like?
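For reference, an external RBD entry in storage.cfg usually looks roughly like this (all values below are placeholders, not taken from this setup):
Code:
rbd: <storage-id>
     content images
     monhost <MON-IP>
     pool <pool>
     username <username>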
 
Hi,

I have no direct access to the remote Ceph, so I can only run it on my servers. If I do, I get:

Code:
root@prox-05:~# rbd -p xenCeph ls
unable to get monitor info from DNS SRV with service name: ceph-mon
rbd: couldn't connect to the cluster!
rbd: listing images failed: (2) No such file or directory
2020-09-21 13:02:11.455 7f1c268ce0c0 -1 failed for service _ceph-mon._tcp
2020-09-21 13:02:11.455 7f1c268ce0c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact

If I run it on a working node (one I haven't run pveceph on), I get:

Code:
root@prox-02:~# rbd -p xenCeph ls
unable to get monitor info from DNS SRV with service name: ceph-mon
no monitors specified to connect to.
rbd: couldn't connect to the cluster!
rbd: list: (2) No such file or directory
2020-09-21 13:04:54.230309 7f54f701c0c0 -1 failed for service _ceph-mon._tcp
 
In both outputs, the node can't connect to the Ceph cluster.

root@prox-06:~# cat /etc/pve/ceph.conf
For the local Ceph cluster, you will need to specify the cluster_network and public_network in the ceph.conf. Otherwise the Ceph daemons try to bind to the IP resolved from the hostname.
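A minimal sketch of the corresponding [global] entries, assuming the local cluster should live on the 10.50.0.0/24 subnet suggested by the MON address (adjust to your actual networks):
Code:
[global]
     public_network = 10.50.0.0/24
     cluster_network = 10.50.0.0/24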
mon_host = 10.50.0.106
Does this MON exist and can every node connect to it?
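A simple reachability check from each node (6789 is the default MON v1 port; adjust if yours differs):
Code:
ping -c 3 10.50.0.106
nc -zv 10.50.0.106 6789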

I have no direct access to the remote Ceph, so I can only run it on my servers.
To connect to an external Ceph cluster from the CLI, you will need to specify the MON address, username, and keyring.
Code:
ceph -p <pool> -m <MON-IP> -n client.<username> --keyring /<path>/<to>/<file>.keyring --auth_supported cephx ls
 
To connect to an external Ceph cluster from the CLI, you will need to specify the MON address, username, and keyring.
Code:
ceph -p <pool> -m <MON-IP> -n client.<username> --keyring /<path>/<to>/<file>.keyring --auth_supported cephx ls

The code above seems to have some semantic problems, but this one lists the content of the RBD storage:

Code:
rbd list [pool]  -m 1.1.1.1 -k /etc/pve/priv/ceph/Ceph.keyring --user [username]
 
I found out this error also:

Code:
root@ve-02:~# service pvestatd status
● pvestatd.service - PVE Status Daemon
   Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-09-28 23:45:37 CEST; 29min ago
  Process: 182719 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
 Main PID: 182728 (pvestatd)
    Tasks: 1 (limit: 9830)
   Memory: 103.4M
   CGroup: /system.slice/pvestatd.service
           └─182728 pvestatd

Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $avail in int at /usr/share/perl5
Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in int at /usr/share/perl5/
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $free in addition (+) at /usr/sha
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $avail in int at /usr/share/perl5
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $used in int at /usr/share/perl5/
Sep 29 00:15:24 ve-02 pvestatd[182728]: status update time (32.127 seconds)
Sep 29 00:15:29 ve-02 pvestatd[182728]: got timeout
Sep 29 00:15:30 ve-02 pvestatd[182728]: status update time (5.479 seconds)
 
The code above seems to have some semantic problems, but this one lists the content of the RBD storage:
Which are?

rbd list [pool] -m 1.1.1.1 -k /etc/pve/priv/ceph/Ceph.keyring --user [username]
What was the output?

Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
It's a side effect. The values for the usage calculation are uninitialized, but they shouldn't be.
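To see which storage the stats daemon is stumbling over, the standard storage overview can help (an unreachable storage shows up as inactive):
Code:
pvesm status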
 
