Ceph Monitor creation fails

New install of Proxmox 4.4
Three nodes

Following https://pve.proxmox.com/wiki/Ceph_Server

I get to this section and can proceed no further:

Creating more Ceph Monitors
You should run 3 monitors, one on each node. Create them via GUI or via CLI. So please login to the next node and run:

node2# pveceph createmon
And execute the same steps on the third node:

node3# pveceph createmon


On node1 the command succeeded:
mon.0 pteracluster Yes 10.10.10.1:6789/0

On node2 and node3 the command returns a timeout.
On node2 and node3, pveceph status also returns "got timeout".

Rebooted both nodes - no change.

The ceph service is active on both nodes.
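
For reference, a quick way to rule out basic connectivity problems is to check from node2/node3 that the first monitor is reachable on its port (address and port taken from the mon.0 line above; nc is part of the netcat package):

Code:
# run on node2 and node3: is the monitor on node1 reachable on the default mon port?
nc -vz 10.10.10.1 6789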

On the node with the monitor created, pveceph status returns:

root@pteracluster:~# pveceph status
{
   "mdsmap" : {
      "max" : 0,
      "up" : 0,
      "epoch" : 1,
      "in" : 0,
      "by_rank" : []
   },
   "quorum_names" : [
      "0"
   ],
   "pgmap" : {
      "bytes_avail" : 1191772364800,
      "data_bytes" : 0,
      "bytes_total" : 1191807008768,
      "bytes_used" : 34643968,
      "version" : 9,
      "num_pgs" : 64,
      "pgs_by_state" : [
         {
            "state_name" : "active+undersized+degraded",
            "count" : 64
         }
      ]
   },
   "osdmap" : {
      "osdmap" : {
         "epoch" : 5,
         "num_in_osds" : 1,
         "num_up_osds" : 1,
         "nearfull" : false,
         "num_osds" : 1,
         "num_remapped_pgs" : 0,
         "full" : false
      }
   },
   "quorum" : [
      0
   ],
   "fsid" : "59e794ea-2786-4f8d-ad3d-98f927b6e250",
   "monmap" : {
      "modified" : "2016-12-27 12:56:02.582855",
      "epoch" : 1,
      "mons" : [
         {
            "addr" : "10.10.10.1:6789/0",
            "name" : "0",
            "rank" : 0
         }
      ],
      "fsid" : "59e794ea-2786-4f8d-ad3d-98f927b6e250",
      "created" : "2016-12-27 12:56:02.582855"
   },
   "election_epoch" : 2,
   "health" : {
      "timechecks" : {
         "round" : 0,
         "epoch" : 2,
         "round_status" : "finished"
      },
      "health" : {
         "health_services" : [
            {
               "mons" : [
                  {
                     "last_updated" : "2016-12-27 13:32:02.907505",
                     "store_stats" : {
                        "bytes_log" : 7395046,
                        "last_updated" : "0.000000",
                        "bytes_total" : 7396296,
                        "bytes_misc" : 1250,
                        "bytes_sst" : 0
                     },
                     "name" : "0",
                     "kb_total" : 71601512,
                     "kb_used" : 1585952,
                     "avail_percent" : 92,
                     "health" : "HEALTH_OK",
                     "kb_avail" : 66355376
                  }
               ]
            }
         ]
      },
      "detail" : [],
      "overall_status" : "HEALTH_WARN",
      "summary" : [
         {
            "summary" : "64 pgs degraded",
            "severity" : "HEALTH_WARN"
         },
         {
            "severity" : "HEALTH_WARN",
            "summary" : "64 pgs stuck degraded"
         },
         {
            "summary" : "64 pgs stuck unclean",
            "severity" : "HEALTH_WARN"
         },
         {
            "severity" : "HEALTH_WARN",
            "summary" : "64 pgs stuck undersized"
         },
         {
            "severity" : "HEALTH_WARN",
            "summary" : "64 pgs undersized"
         }
      ]
   }
}
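
For comparison, the same monitor and quorum information can also be pulled straight from the Ceph CLI; at this point only the single monitor shows up in the quorum:

Code:
# raw Ceph view of cluster health and monitor quorum
ceph -s
ceph quorum_status --format json-pretty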
 
Hi
can you give us the output of

Code:
pveversion -v
and also
Code:
pvecm status
 
node1
root@pteracluster:~# pveversion -v
proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
ceph: 0.94.9-1~bpo80+1

root@pteracluster:~# pvecm status
Quorum information
------------------
Date: Wed Dec 28 08:40:36 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1/128
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 69.28.32.120 (local)
0x00000003 1 69.28.32.121
0x00000002 1 69.28.32.122
root@pteracluster:~#

Node2
root@pteranode2:~# pveversion -v
proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
ceph: 0.94.9-1~bpo80+1
root@pteranode2:~# pvecm status
Quorum information
------------------
Date: Wed Dec 28 08:41:59 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1/128
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 69.28.32.120
0x00000003 1 69.28.32.121 (local)
0x00000002 1 69.28.32.122

Node3
root@pteranode3:~# pveversion -v
proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
ceph: 0.94.9-1~bpo80+1
root@pteranode3:~# pvecm status
Quorum information
------------------
Date: Wed Dec 28 08:43:35 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000002
Ring ID: 1/128
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 69.28.32.120
0x00000003 1 69.28.32.121
0x00000002 1 69.28.32.122 (local)
 
Can you post your /etc/pve/ceph.conf?
Can you ping all nodes on the 10.10.10.0/24 network?
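
For reference, once pveceph createmon succeeds on the other nodes, /etc/pve/ceph.conf should gain one [mon.X] section per monitor, along these lines (the hostnames and addresses below are only an example of how your nodes would likely map):

Code:
[mon.1]
host = pteranode2
mon addr = 10.10.10.2:6789

[mon.2]
host = pteranode3
mon addr = 10.10.10.3:6789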
 
Node 1
root@pteracluster:~# cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 59e794ea-2786-4f8d-ad3d-98f927b6e250
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.0]
host = pteracluster
mon addr = 10.10.10.1:6789

root@pteracluster:~# ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=64 time=0.306 ms
64 bytes from 10.10.10.2: icmp_seq=2 ttl=64 time=0.240 ms
^C
--- 10.10.10.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.240/0.273/0.306/0.033 ms
root@pteracluster:~# ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=0.307 ms
64 bytes from 10.10.10.3: icmp_seq=2 ttl=64 time=0.202 ms
^C
--- 10.10.10.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.202/0.254/0.307/0.054 ms

Node 2

root@pteranode2:~# cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 59e794ea-2786-4f8d-ad3d-98f927b6e250
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.0]
host = pteracluster
mon addr = 10.10.10.1:6789

root@pteranode2:~# ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.178 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.220 ms
^C
--- 10.10.10.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.178/0.199/0.220/0.021 ms
root@pteranode2:~# ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=0.306 ms
64 bytes from 10.10.10.3: icmp_seq=2 ttl=64 time=0.243 ms
^C
--- 10.10.10.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.243/0.274/0.306/0.035 ms

Node 3

root@pteranode3:~# cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 59e794ea-2786-4f8d-ad3d-98f927b6e250
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.0]
host = pteracluster
mon addr = 10.10.10.1:6789

root@pteranode3:~# ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.185 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.214 ms
^C
--- 10.10.10.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.185/0.199/0.214/0.020 ms
root@pteranode3:~# ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=64 time=0.210 ms
64 bytes from 10.10.10.2: icmp_seq=2 ttl=64 time=0.160 ms
^C
--- 10.10.10.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.160/0.185/0.210/0.025 ms
 
Now I do not know what changed; all I did was log in to the nodes with ssh and gather the information above.
Yesterday creating the other two monitors timed out, but today they were created successfully.
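
To double-check, this should now list all three monitors and show them in quorum (run on any node):

Code:
ceph mon stat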
 
Yes, maybe there was a short network problem.
 
New problem: I have ntp installed on all three systems, yet I am still getting a clock skew warning (see attached screenshot, osdCapture.PNG).
 
On the clock skew: I set all three nodes to talk to my time server, and the three nodes peer with each other.
Waiting to see if this clears it up.
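
For anyone following along, the setup I mean looks roughly like this in /etc/ntp.conf on each node (the time server address is a placeholder, and the peer lines list the other two nodes as seen from node1):

Code:
# /etc/ntp.conf (sketch) - 10.10.10.254 stands in for the local time server
server 10.10.10.254 iburst
# peer the other two cluster nodes so their clocks stay in agreement
peer 10.10.10.2
peer 10.10.10.3
# after editing: restart ntp and verify with 'ntpq -p'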
 
Tried to create a VM stored on the Ceph storage and got an error:

TASK ERROR: create failed - rbd error: rbd: error opening pool rbd: (2) No such file or directory

I did copy the keyring file per the instructions.

root@pteracluster:~# cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content vztmpl,iso,backup

lvmthin: local-lvm
    vgname pve
    thinpool data
    content images,rootdir

nfs: nas1
    path /mnt/pve/nas1
    server 69.28.32.54
    export /mnt/PteraNas1/VMBackups
    content images,iso
    maxfiles 1
    options vers=3

rbd: RBD_Drive
    monhost 10.10.10.1;10.10.10.2;10.10.10.3
    username admin
    pool rbd
    content images
    krbd 0
root@pteracluster:~# ls /etc/pve/priv/ceph/
RBD_Drive.keyring
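
Since the storage definition above points at a pool called "rbd", the error suggests that pool may simply not exist yet. A quick way to check, and to create a pool if needed (if I remember right the subcommand on this PVE version is pveceph createpool; the pool name must match the 'pool' line in storage.cfg):

Code:
# list the pools that actually exist on the cluster
ceph osd lspools
# create a pool for VM images if none is there yet
pveceph createpool rbd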
 
OK, saw my error in the keyring file name - changed it to
pteracluster.RBD_Drive.keyring

But now I get ...
TASK ERROR: create failed - rbd error: rbd: couldn't connect to the cluster!
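
If I read the wiki correctly, the keyring under /etc/pve/priv/ceph/ has to be named after the storage ID itself, without the cluster name prefix. A sketch of putting it in place, assuming the admin keyring created by pveceph sits at the default location:

Code:
# make the admin keyring available to the storage 'RBD_Drive'
cp /etc/pve/priv/ceph.client.admin.keyring /etc/pve/priv/ceph/RBD_Drive.keyring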
 
Finally got it working. I had to remove the RBD storage and create one with the proper Ceph pool name and proper keyring name.

Created a VM and now I will test drive it.
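
For anyone else who hits this, two quick sanity checks after fixing the storage definition (replace <poolname> with whatever pool the storage now points at):

Code:
# the RBD storage should show up as active
pvesm status
# and the pool should be listable
rbd -p <poolname> ls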
 
