Proxmox ceph cluster problem

mikipn

Member
Jan 31, 2018
I have a 3-node Proxmox cluster
and I wish to add 3 more computers to use for VMs, but when I add them to the Proxmox cluster they are shown with a question mark, not green.
The new computers don't see the Ceph public network, but they do see the iSCSI storage; some of them see the Ceph public network as well.
In the datacenter there are 2 Ceph storages set up on the same Ceph pool.
What could be the problem here?
I reinstalled those new nodes 2-3 times, and I always get the same result when I add them to the same Proxmox cluster as the Ceph nodes.
The Ceph storages are defined as Proxmox local storages; maybe this is the source of the problem.
 
This sounds to me like your network configuration is messed up. All servers that should use the Ceph storage need access to it. What PVE+Ceph version are you on (pveversion -v)?
 
proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-3-pve: 4.13.13-34
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-20
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-6
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.4-pve2~bpo9
ceph: 12.2.2-pve1

I don't need Ceph on all nodes, but I am wondering whether a storage that is inaccessible to a node can cause the node's unknown state.
The new nodes have only 2 NICs, and I have 3 networks, so I am trying to use Open vSwitch to connect the iSCSI and Ceph networks to those nodes.
If Proxmox's internal RBD is installed as storage, can additional nodes see Ceph, or do I need to set up the Ceph storage as external Ceph?

Also, I have a problem on one node: if I turn on VLAN tagging on the interface, the network stops working on that NIC after 10-20 seconds, without any message in dmesg.
Another node runs the same setup without problems, but it keeps being in the ? state and can't show the state of the network storages in the GUI. It sees the iSCSI storage and the LVMs on it without problem when I run pvs, vgs and lvs.

My setup on those new nodes: on the second NIC I have an Open vSwitch bridge with the iSCSI network on it, and an internal port with a VLAN tag for the Ceph network.
 
I don't need Ceph on all nodes, but I am wondering whether a storage that is inaccessible to a node can cause the node's unknown state.
If you configured a cluster-wide storage and that storage cannot be reached, the host might look like it is in an unknown state. The server is running, though.
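If the new nodes shouldn't use Ceph at all, the storage definition can be restricted to the nodes that can actually reach it, so the others stop querying it. A minimal sketch of such an RBD entry in /etc/pve/storage.cfg (the storage ID, pool name, monitor addresses and node names are placeholders, not from this thread):

```
# /etc/pve/storage.cfg -- sketch; ID, pool, monitor IPs and node names are examples
rbd: ceph-vm
        pool rbd
        content images
        monhost 10.10.10.11 10.10.10.12 10.10.10.13
        username admin
        # restrict the storage to the nodes that can reach the Ceph public
        # network; the remaining nodes then no longer query it
        nodes pve1,pve2,pve3
```

The same restriction can be applied to an existing storage from the GUI (storage's node list) or with pvesm.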

The new nodes have only 2 NICs, and I have 3 networks, so I am trying to use Open vSwitch to connect the iSCSI and Ceph networks to those nodes.
Also, I have a problem on one node: if I turn on VLAN tagging on the interface, the network stops working on that NIC after 10-20 seconds, without any message in dmesg.
Another node runs the same setup without problems, but it keeps being in the ? state and can't show the state of the network storages in the GUI. It sees the iSCSI storage and the LVMs on it without problem when I run pvs, vgs and lvs.

My setup on those new nodes: on the second NIC I have an Open vSwitch bridge with the iSCSI network on it, and an internal port with a VLAN tag for the Ceph network.
Why are you using Open vSwitch? It seems to cause quite a few problems in your setup. I assume you are stacking technologies (Ceph over iSCSI over OVS). Proxmox ships with the krbd/librbd Ceph client and can communicate with Ceph directly.

If Proxmox's internal RBD is installed as storage, can additional nodes see Ceph, or do I need to set up the Ceph storage as external Ceph?
Yes, every host added to the PVE cluster can see the Ceph storage, if the network is correctly configured.

To reduce the complexity and avoid the problems it causes, I suggest you try your setup without Open vSwitch first.

Separate the Ceph storage onto a dedicated network (if it isn't already). Do the same for the corosync traffic, so that the cluster doesn't become unstable under higher system load. Let Proxmox access the Ceph storage directly; that also gives better performance than iSCSI.
If that works, you can add the OVS on top (if needed). As a side note, the Linux bridge is also VLAN capable (set "bridge_vlan_aware yes" in the interfaces file).
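With a VLAN-aware Linux bridge, the two-NIC layout could look roughly like this in /etc/network/interfaces (NIC name, VLAN tag and addresses are placeholders, not from this thread):

```
# /etc/network/interfaces -- sketch; NIC name, VLAN ID and IPs are examples
auto vmbr1
iface vmbr1 inet static
        address 10.10.10.5
        netmask 255.255.255.0
        bridge_ports eno2
        bridge_stp off
        bridge_fd 0
        # accept tagged frames on this bridge instead of using OVS
        bridge_vlan_aware yes

# tagged sub-interface on the bridge for the Ceph network
auto vmbr1.20
iface vmbr1.20 inet static
        address 10.10.20.5
        netmask 255.255.255.0
```

Here the untagged bridge carries the iSCSI traffic and the tagged sub-interface carries Ceph, mirroring the OVS bridge + internal-port setup described above.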
 
I have solved my network problem.
I swapped the internet/corosync card and the iSCSI/Ceph card, and now all my nodes are green and everything works stably.
The only remaining problem is that with an all-SSD Ceph cluster I get about 4xx-520 MB/s write and about 230 MB/s read with dd inside a VM, while rados bench gives me ~500 MB/s write and about 1500 MB/s read over the 10 Gbps network.
Is there an existing thread where I could look for suggestions on how to improve read performance?
fio in the VM shows 10-40 MB/s.
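For what it's worth, dd inside a VM often measures the guest page cache rather than the Ceph backend, so dd, fio and rados bench numbers are hard to compare unless the cache is bypassed. A sketch of a fio job file with direct I/O (sizes, block sizes and runtime are arbitrary examples, not from this thread):

```
; seqread.fio -- sketch; size/bs/runtime values are arbitrary examples
[global]
ioengine=libaio
direct=1          ; bypass the guest page cache for comparable numbers
size=2G
runtime=60
time_based

[seqread]
rw=read
bs=4M             ; large sequential reads, closest to rados bench

[randread]
rw=randread
bs=4k
iodepth=32        ; queue depth matters a lot for small random reads on Ceph
```

Run it with "fio seqread.fio" inside the VM; the sequential job is the one comparable to the rados bench figures above.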