CEPH got timeout (500) creating OSD

Ste.C

New Member
Apr 7, 2019
Hi All,

My name is Stefano. I'm trying to enable Ceph on 3 DL380G8 nodes (in HBA mode) for testing.
I followed the manual to set up a simple environment with the cluster and public network on the same subnet.
Everything seems to be up and running, except for creating OSDs.

I get this error when creating a new one:
got timeout (500)

I was able to create a couple of OSDs, but sometimes they show up in the interface and sometimes they don't.

Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 10.3.3.0/24
     fsid = 08cb8cf3-e3b2-4784-b3e3-08f05822f409
     keyring = /etc/pve/priv/$cluster.$name.keyring
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 2
     osd pool default size = 3
     public network = 10.3.3.0/24

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.pve2]
     host = pve2
     mon addr = 10.3.3.242:6789

[mon.pve1]
     host = pve1
     mon addr = 10.3.3.241:6789

[mon.pve3]
     host = pve3
     mon addr = 10.3.3.243:6789



root@pve2:~# ceph status
  cluster:
    id:     08cb8cf3-e3b2-4784-b3e3-08f05822f409
    health: HEALTH_WARN
            mons pve1,pve2,pve3 are low on available space

  services:
    mon: 3 daemons, quorum pve1,pve2,pve3
    mgr: pve1(active), standbys: pve2, pve3
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0B
    usage:   2.00GiB used, 836GiB / 838GiB avail
    pgs:

  
root@pve2:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   16G     0   16G   0% /dev
tmpfs                 3.2G  9.3M  3.2G   1% /run
/dev/mapper/pve-root  3.2G  2.3G  774M  75% /
tmpfs                  16G   66M   16G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
tmpfs                  16G     0   16G   0% /sys/fs/cgroup
/dev/fuse              30M   28K   30M   1% /etc/pve
/dev/sdh1              97M  5.5M   92M   6% /var/lib/ceph/osd/ceph-1
root@pve2:~#


root@pve3:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.3-12 (running version: 5.3-12/5fbbbaf6)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
pve-kernel-4.15.18-10-pve: 4.15.18-32
ceph: 12.2.11-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-48
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-39
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-24
pve-cluster: 5.0-34
pve-container: 2.0-35
pve-docs: 5.3-3
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-8
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2




root@pve1:~# fdisk -l
Disk /dev/sda: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F6ABA7A5-9A3C-456E-9062-9670D6780DC0


Disk /dev/sdb: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5652BE77-DFA7-4F09-AAC6-4328FA118E2D


Disk /dev/sdc: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 9BE9B0FF-E23C-4370-AE4C-CC14B689CDD4


Disk /dev/sdd: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E853C161-9A4C-4ACA-88EA-82398053025A


Disk /dev/sde: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 76C73AC2-91F1-42E6-B624-FC892020E412


Disk /dev/sdf: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 86A8A16B-BE7E-48AB-AC46-264E8EC0BDCD


Disk /dev/sdh: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdg: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3C7E8E84-9256-4768-855A-71535FBF088A


Disk /dev/sdi: 14.5 GiB, 15514730496 bytes, 30302208 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1C46FB63-D04A-41D4-8CCD-FC58393F97AF

Device       Start      End  Sectors  Size Type
/dev/sdi1       34     2047     2014 1007K BIOS boot
/dev/sdi2     2048  1050623  1048576  512M EFI System
/dev/sdi3  1050624 30302174 29251551   14G Linux LVM


Disk /dev/mapper/pve-root: 3.3 GiB, 3489660928 bytes, 6815744 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-swap: 1.6 GiB, 1744830464 bytes, 3407872 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@pve1:~#


A couple of disks are in this state:


Code:
Disk /dev/sdh: 419.2 GiB, 450098159616 bytes, 879097968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5D02F9E8-5C17-4BE9-82D5-98C187360B6A

Device      Start       End   Sectors   Size Type
/dev/sdh1    2048    206847    204800   100M Ceph OSD
/dev/sdh2  206848 879097934 878891087 419.1G unknown

No errors are displayed in /var/log/ when I get the timeout error...
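
For reference, those partly partitioned disks look like leftovers from the failed creation attempts. A minimal sketch of how such a disk could be wiped and the creation retried on the CLI, where the full error is usually visible (/dev/sdh is just the example device from above; ceph-disk ships with Luminous):

Code:
# WARNING: destroys all data on /dev/sdh
ceph-disk zap /dev/sdh

# retry the OSD creation from the command line instead of the GUI
pveceph createosd /dev/sdh

# watch the relevant logs in a second shell while it runs
journalctl -f -u 'ceph-osd@*'
tail -f /var/log/ceph/ceph-osd.*.log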

Thank you for your help.
Regards.
Stefano
 
Looks like not all servers are reachable.
public network = 10.3.3.0/24
Check that every server can reach all the others.

mons pve1,pve2,pve3 are low on available space
Increase the disk space; Ceph's and Proxmox VE's services write a lot of logs. The MON also stores its DB under /var/lib/ceph/.
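A minimal sketch of how the root filesystem could be grown, assuming the default ext4 root of a Proxmox VE 5 install and that the pve volume group still has free extents (check with vgs first):

Code:
# check for free space in the volume group
vgs pve

# grow the root LV by e.g. 2G and resize the filesystem online
lvextend -L +2G /dev/pve/root
resize2fs /dev/mapper/pve-root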

Disk /dev/sdi: 14.5 GiB,
This is a very small disk, perhaps a USB stick. Don't run Proxmox VE or Ceph on USB or DOM devices; their lifetime and low transfer rates make them less than ideal candidates.
 
Hi Alwin,

Thank you for the quick reply.
The servers are reachable; we configured a bond0 interface (broadcast mode) with one physical NIC on the 10.3.3.0/24 subnet for each server.
The interfaces are connected through a stand-alone HPE switch.

Code:
root@pve1:~# ping 10.3.3.241
PING 10.3.3.241 (10.3.3.241) 56(84) bytes of data.
64 bytes from 10.3.3.241: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 10.3.3.241: icmp_seq=2 ttl=64 time=0.017 ms
64 bytes from 10.3.3.241: icmp_seq=3 ttl=64 time=0.017 ms
64 bytes from 10.3.3.241: icmp_seq=4 ttl=64 time=0.021 ms
^C
--- 10.3.3.241 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3051ms
rtt min/avg/max/mdev = 0.017/0.018/0.021/0.005 ms
root@pve1:~# ping 10.3.3.242
PING 10.3.3.242 (10.3.3.242) 56(84) bytes of data.
64 bytes from 10.3.3.242: icmp_seq=1 ttl=64 time=0.191 ms
64 bytes from 10.3.3.242: icmp_seq=2 ttl=64 time=0.251 ms
64 bytes from 10.3.3.242: icmp_seq=3 ttl=64 time=0.165 ms
^C
--- 10.3.3.242 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2033ms
rtt min/avg/max/mdev = 0.165/0.202/0.251/0.037 ms
root@pve1:~# ping 10.3.3.243
PING 10.3.3.243 (10.3.3.243) 56(84) bytes of data.
64 bytes from 10.3.3.243: icmp_seq=1 ttl=64 time=0.128 ms
64 bytes from 10.3.3.243: icmp_seq=2 ttl=64 time=0.265 ms
64 bytes from 10.3.3.243: icmp_seq=3 ttl=64 time=0.250 ms
^C
--- 10.3.3.243 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2041ms
rtt min/avg/max/mdev = 0.128/0.214/0.265/0.062 ms

We used USB disks because it's a lab environment and HPE Smart Array controllers in HBA mode do not support booting from the connected disks.
If USB disks are not supported, we will have to find another way to set up the test servers.

Regards.
Stefano
 
The servers are reachable; we configured a bond0 interface (broadcast mode) with one physical NIC on the 10.3.3.0/24 subnet for each server.
In case you are not already aware: having one physical network connection for all traffic will introduce side effects, especially with HA activated. Corosync needs low and stable latency to operate, and other services running on the same network will interfere with it. As Ceph is a network-distributed storage, any such interference will also bring storage I/O down.
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network
Even though it is a lab environment, I advise you to consider the above constraints.
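As an illustration only (the second subnet is made up, not taken from this setup), separating the Ceph networks would mean something like this in ceph.conf, with corosync kept on yet another, dedicated network:

Code:
[global]
     # client/MON traffic stays on the public network ...
     public network = 10.3.3.0/24
     # ... OSD replication moves to its own subnet
     cluster network = 10.4.4.0/24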

The ping test only ensures that 'pve1' can reach all other nodes, but can the others reach each other as well (e.g. pve2 <-> pve3)?
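A quick way to check the full mesh instead of just one node is a small loop over all hosts, for example (a sketch; it assumes root SSH between the nodes, which a Proxmox VE cluster normally provides):

Code:
for src in 10.3.3.241 10.3.3.242 10.3.3.243; do
  for dst in 10.3.3.241 10.3.3.242 10.3.3.243; do
    ssh root@$src "ping -c1 -W1 $dst" >/dev/null \
      && echo "$src -> $dst ok" || echo "$src -> $dst FAILED"
  done
done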

We used USB disks because it's a lab environment and HPE Smart Array controllers in HBA mode do not support booting from the connected disks.
If USB disks are not supported, we will have to find another way to set up the test servers.
USB/DOM devices will just die fast and may also affect cluster operation due to their low performance. That said, it is not impossible to use them. In any case, there needs to be more space for the data produced by the services, hence the Ceph warning.
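
To see what is actually filling the small root filesystem, something along these lines helps (paths are the Ceph and Proxmox VE defaults):

Code:
du -sh /var/lib/ceph/mon/* /var/log/ceph /var/log/pve 2>/dev/null
df -h /var/lib/ceph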
 
I just updated the corosync config to use a dedicated interface, following the instructions found here:
https://forum.proxmox.com/threads/change-cluster-nodes-ip-addresses.33406/


/etc/network/interfaces:

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

auto bond0
iface bond0 inet static
        address  10.3.3.241
        netmask  255.255.255.0
        bond-slaves eno4
        bond-miimon 100
        bond-mode broadcast

auto bond1
iface bond1 inet static
        address  10.2.2.241
        netmask  255.255.255.0
        bond-slaves eno3
        bond-miimon 100
        bond-mode balance-rr

auto vmbr0
iface vmbr0 inet static
        address  10.7.7.241
        netmask  255.255.255.0
        gateway  10.7.7.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

/etc/hosts:

Code:
127.0.0.1 localhost.localdomain localhost
10.2.2.241 pve1.7n.lab pve1

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

corosync.conf:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.2.2.241
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.2.2.242
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.2.2.243
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: ClusterLab01
  config_version: 4
  interface {
    bindnetaddr: 10.2.2.241
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

The cluster is up and running.
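For the record, the standard way to confirm that corosync really moved to the new 10.2.2.0/24 ring:

Code:
pvecm status          # shows node addresses and quorum state
corosync-cfgtool -s   # ring status per interface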
But... Ceph is still reporting the "got timeout" message.

I think the USB stick is too slow to handle both the OS and the Ceph I/O.
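A rough way to check that suspicion is a simple sequential write test on the root filesystem (a sketch; file name and size are arbitrary):

Code:
dd if=/dev/zero of=/root/ddtest bs=1M count=256 conv=fdatasync
rm /root/ddtest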
 
This still leaves the low storage space for the MON DB. Also, the bond config with mode broadcast might not work as intended; especially with a switch in between, broadcast is the wrong mode anyhow. My suggestion: keep it KISS and remove the bond. That way, fewer side effects are possible.
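For pve1, dropping the bonds could look like this in /etc/network/interfaces (a sketch reusing the interface names and addresses from the config above; the same pattern applies to the other nodes, followed by a networking restart or reboot):

Code:
auto eno4
iface eno4 inet static
        address  10.3.3.241
        netmask  255.255.255.0

auto eno3
iface eno3 inet static
        address  10.2.2.241
        netmask  255.255.255.0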
 
I rebuilt the bond in many ways; the latest article I found suggested using broadcast mode for the replication interface...
I changed back to a plain "KISS" setup, same issue.

I'm rebuilding everything in another lab environment to avoid the USB disks.
I'll update you as soon as the new cluster is up and running.
 
