Ceph Monitor creation fails

New install of Proxmox 4.4, three nodes.

Following https://pve.proxmox.com/wiki/Ceph_Server

I get to this section and can proceed no further:

Creating more Ceph Monitors
You should run 3 monitors, one on each node. Create them via GUI or via CLI. So please login to the next node and run:

node2# pveceph createmon
And execute the same steps on the third node:

node3# pveceph createmon


On node1 the command succeeded:
mon.0 pteracluster Yes 10.10.10.1:6789/0

On node2 and node3 the command times out, and on those nodes pveceph status also returns "got timeout".

Rebooted both nodes - no change.

The ceph service is active on both nodes.
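
One thing worth checking here (a rough sketch, assuming the Ceph network is 10.10.10.0/24 as configured): pveceph createmon on node2/node3 has to reach the existing monitor on 10.10.10.1:6789, so it is worth confirming mon.0 is listening and reachable from the other nodes.

Code:
# on node1: is the monitor listening on the cluster network?
ss -tlnp | grep 6789

# on node2/node3: can the ceph CLI reach it? (short timeout so it does not hang)
ceph -m 10.10.10.1:6789 --connect-timeout 10 -s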

On the node with the monitor created, pveceph status returns:

root@pteracluster:~# pveceph status
{
"mdsmap" : {
"max" : 0,
"up" : 0,
"epoch" : 1,
"in" : 0,
"by_rank" : []
},
"quorum_names" : [
"0"
],
"pgmap" : {
"bytes_avail" : 1191772364800,
"data_bytes" : 0,
"bytes_total" : 1191807008768,
"bytes_used" : 34643968,
"version" : 9,
"num_pgs" : 64,
"pgs_by_state" : [
{
"state_name" : "active+undersized+degraded",
"count" : 64
}
]
},
"osdmap" : {
"osdmap" : {
"epoch" : 5,
"num_in_osds" : 1,
"num_up_osds" : 1,
"nearfull" : false,
"num_osds" : 1,
"num_remapped_pgs" : 0,
"full" : false
}
},
"quorum" : [
0
],
"fsid" : "59e794ea-2786-4f8d-ad3d-98f927b6e250",
"monmap" : {
"modified" : "2016-12-27 12:56:02.582855",
"epoch" : 1,
"mons" : [
{
"addr" : "10.10.10.1:6789/0",
"name" : "0",
"rank" : 0
}
],
"fsid" : "59e794ea-2786-4f8d-ad3d-98f927b6e250",
"created" : "2016-12-27 12:56:02.582855"
},
"election_epoch" : 2,
"health" : {
"timechecks" : {
"round" : 0,
"epoch" : 2,
"round_status" : "finished"
},
"health" : {
"health_services" : [
{
"mons" : [
{
"last_updated" : "2016-12-27 13:32:02.907505",
"store_stats" : {
"bytes_log" : 7395046,
"last_updated" : "0.000000",
"bytes_total" : 7396296,
"bytes_misc" : 1250,
"bytes_sst" : 0
},
"name" : "0",
"kb_total" : 71601512,
"kb_used" : 1585952,
"avail_percent" : 92,
"health" : "HEALTH_OK",
"kb_avail" : 66355376
}
]
}
]
},
"detail" : [],
"overall_status" : "HEALTH_WARN",
"summary" : [
{
"summary" : "64 pgs degraded",
"severity" : "HEALTH_WARN"
},
{
"severity" : "HEALTH_WARN",
"summary" : "64 pgs stuck degraded"
},
{
"summary" : "64 pgs stuck unclean",
"severity" : "HEALTH_WARN"
},
{
"severity" : "HEALTH_WARN",
"summary" : "64 pgs stuck undersized"
},
{
"severity" : "HEALTH_WARN",
"summary" : "64 pgs undersized"
}
]
}
}
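
For reference, the monmap above contains only mon.0, which matches the createmon failures on the other nodes, and the 64 undersized/degraded PGs are expected while only a single OSD is in the cluster. One way to cross-check the quorum state directly against the monitor (sketch):

Code:
ceph quorum_status --format json-pretty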
 
Hi
can you give us the output of

Code:
pveversion -v
and also
Code:
pvecm status
 
Node 1
root@pteracluster:~# pveversion -v
proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
ceph: 0.94.9-1~bpo80+1

root@pteracluster:~# pvecm status
Quorum information
------------------
Date: Wed Dec 28 08:40:36 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1/128
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 69.28.32.120 (local)
0x00000003 1 69.28.32.121
0x00000002 1 69.28.32.122
root@pteracluster:~#

Node 2
root@pteranode2:~# pveversion -v
proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
ceph: 0.94.9-1~bpo80+1
root@pteranode2:~# pvecm status
Quorum information
------------------
Date: Wed Dec 28 08:41:59 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1/128
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 69.28.32.120
0x00000003 1 69.28.32.121 (local)
0x00000002 1 69.28.32.122

Node 3
root@pteranode3:~# pveversion -v
proxmox-ve: 4.4-76 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
ceph: 0.94.9-1~bpo80+1
root@pteranode3:~# pvecm status
Quorum information
------------------
Date: Wed Dec 28 08:43:35 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000002
Ring ID: 1/128
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 69.28.32.120
0x00000003 1 69.28.32.121
0x00000002 1 69.28.32.122 (local)
 
Can you send the /etc/pve/ceph.conf?
Can you ping all nodes in the 10.10.10.0/24 network?
 
Node 1
root@pteracluster:~# cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 59e794ea-2786-4f8d-ad3d-98f927b6e250
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.0]
host = pteracluster
mon addr = 10.10.10.1:6789

root@pteracluster:~# ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=64 time=0.306 ms
64 bytes from 10.10.10.2: icmp_seq=2 ttl=64 time=0.240 ms
^C
--- 10.10.10.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.240/0.273/0.306/0.033 ms
root@pteracluster:~# ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=0.307 ms
64 bytes from 10.10.10.3: icmp_seq=2 ttl=64 time=0.202 ms
^C
--- 10.10.10.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.202/0.254/0.307/0.054 ms

Node 2

root@pteranode2:~# cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 59e794ea-2786-4f8d-ad3d-98f927b6e250
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.0]
host = pteracluster
mon addr = 10.10.10.1:6789

root@pteranode2:~# ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.178 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.220 ms
^C
--- 10.10.10.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.178/0.199/0.220/0.021 ms
root@pteranode2:~# ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=0.306 ms
64 bytes from 10.10.10.3: icmp_seq=2 ttl=64 time=0.243 ms
^C
--- 10.10.10.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.243/0.274/0.306/0.035 ms

Node 3

root@pteranode3:~# cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 59e794ea-2786-4f8d-ad3d-98f927b6e250
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.0]
host = pteracluster
mon addr = 10.10.10.1:6789

root@pteranode3:~# ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.185 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.214 ms
^C
--- 10.10.10.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.185/0.199/0.214/0.020 ms
root@pteranode3:~# ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=64 time=0.210 ms
64 bytes from 10.10.10.2: icmp_seq=2 ttl=64 time=0.160 ms
^C
--- 10.10.10.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.160/0.185/0.210/0.025 ms
 
Now I do not know what changed - all I did was log in to the nodes with ssh and gather the information above.
Yesterday creating the other two monitors failed, but today they were created successfully.
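
A quick way to verify that all three monitors really joined (sketch - run on any node):

Code:
ceph mon stat
ceph -s | grep monmap

With all three created, the monmap should list three monitors and all of them should show up in the quorum.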
 
Yes, maybe there was a short network problem.
 
New problem - I have ntp installed on all three systems, yet I am still getting a clock skew warning (screenshot attached: osdCapture.PNG).
 
So, for the clock skew, I set all three nodes to talk to my time server and to peer with each other.
Waiting to see if this clears it up.
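
A couple of quick checks that the sync is actually working (a sketch - this assumes ntpd from the ntp package is what is running, not systemd-timesyncd):

Code:
# show ntpd's peers; the selected source is marked with *
ntpq -p

# rough comparison of the clocks across the nodes (uses the cluster's root ssh access)
for n in 10.10.10.1 10.10.10.2 10.10.10.3; do ssh root@$n date +%s.%N; done

The monitors warn as soon as clocks drift more than mon_clock_drift_allowed (0.05 s by default), so even a small offset triggers the warning.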
 
Tried to create a VM stored on the Ceph drive and got an error:

TASK ERROR: create failed - rbd error: rbd: error opening pool rbd: (2) No such file or directory

I did copy the keyring file per the instructions.

root@pteracluster:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        vgname pve
        thinpool data
        content images,rootdir

nfs: nas1
        path /mnt/pve/nas1
        server 69.28.32.54
        export /mnt/PteraNas1/VMBackups
        content images,iso
        maxfiles 1
        options vers=3

rbd: RBD_Drive
        monhost 10.10.10.1;10.10.10.2;10.10.10.3
        username admin
        pool rbd
        content images
        krbd 0

root@pteracluster:~# ls /etc/pve/priv/ceph/
RBD_Drive.keyring
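
The "(2) No such file or directory" refers to the pool, not the keyring: the storage entry points at a pool named rbd, but no pool with that name exists on this cluster. A sketch of checking and creating one (the pg_num of 64 is only an example - size it for the cluster):

Code:
# list the pools that actually exist
ceph osd lspools

# create the missing pool
ceph osd pool create rbd 64 64

The pool can also be created from the GUI (Ceph -> Pools) or with pveceph createpool.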
 
OK, saw my error in the keyring file name - changed it to
pteracluster.RBD_Drive.keyring

But now I get:
TASK ERROR: create failed - rbd error: rbd: couldn't connect to the cluster!
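
On the keyring naming: for an RBD storage entry PVE looks for /etc/pve/priv/ceph/<STORAGE_ID>.keyring, i.e. a file named exactly after the storage ID, and on a PVE-managed Ceph cluster it can simply be a copy of the admin keyring. A minimal sketch, assuming the storage ID is still RBD_Drive:

Code:
mkdir -p /etc/pve/priv/ceph
cp /etc/pve/priv/ceph.client.admin.keyring /etc/pve/priv/ceph/RBD_Drive.keyring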
 
Finally got it working. I had to remove the RBD storage and create a new one with the proper Ceph pool name and the proper keyring name.

Created a VM and now I will test drive it.
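
For anyone else hitting this, the working combination comes down to: the pool named in storage.cfg must be a pool that actually exists on the Ceph cluster, and the keyring must be named after the storage ID. A sketch of such an entry (the pool name is just a placeholder for whatever pool exists):

Code:
rbd: RBD_Drive
        monhost 10.10.10.1;10.10.10.2;10.10.10.3
        pool <your-pool-name>
        username admin
        content images
        krbd 0

plus the matching keyring at /etc/pve/priv/ceph/RBD_Drive.keyring.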