Proxmox with Corosync+Ceph / how to get multiple "routes" to Ceph storage

TheMrg

New Member
Aug 1, 2019
We have 3 nodes with 2 private NICs each:
eth0 - 192.168.0.0/24: Corosync link 1, Ceph storage, Proxmox GUI, and cluster communication (e.g. migration)
eth1 - 192.168.1.0/24: Corosync link 0



We have configured Corosync with two rings on each node (e.g. ring0_addr: 192.168.1.2 / ring1_addr: 192.168.0.2).
This works well, so Corosync's primary link does not run on the storage network.
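For reference, a two-link setup like the one described looks roughly like this in /etc/corosync/corosync.conf (a sketch only; cluster_name, config_version, and node IDs are placeholders, and only one node block is shown):

```
# /etc/corosync/corosync.conf (sketch; values are examples from this thread)
nodelist {
  node {
    name: storage2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.2
    ring1_addr: 192.168.0.2
  }
  # further node { } blocks for storage1 and storage3
}

totem {
  cluster_name: mycluster
  config_version: 4
  ip_version: ipv4
  version: 2
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}
```

With kronosnet (Corosync 3, as used by Proxmox VE 6), link 0 is preferred and link 1 is used as the fallback.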

ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster network = 192.168.0.0/24
        fsid = 00cf7097-2cb5-47f6-b342-e55094bf839e
        mon_host = 192.168.0.20 192.168.0.2 192.168.0.1
        mon_initial_members = storage1 storage2
        public network = 192.168.0.0/24

[By the way: is the underscore or the space the correct separator in option names? We have seen different statements.]

Now if eth0 fails on storage1 (Corosync keeps working via ring1, so the node is not marked offline):
- the Proxmox GUI cannot reach the host ("no route to host")
- the VMs on storage1 still run (pvecm says the node is online)
- the problem is that Ceph is no longer reachable, so the VMs hang, but Proxmox HA thinks everything is fine
- ceph -s hangs, of course

How can I tell Ceph to use eth1/192.168.1.0/24 as an alternative if eth0/192.168.0.0/24 fails?
Via public network in [global]?
public network = 192.168.0.0/24, 192.168.1.0/24 does not seem to work; the monitors are also only on the 192.168.0.0/24 network.

Or: could Ceph fence the host, so that Corosync can mark it as faulty?

Or do I have a fundamental design flaw?

Thanks.
 

Alwin

Proxmox Staff Member
Aug 1, 2017
[By the way: is the underscore or the space the correct separator in option names? We have seen different statements.]
Ceph should handle both.

- the Proxmox GUI cannot reach the host ("no route to host")
Check that you have both IPs in your /etc/hosts and that you can log in via SSH from both networks.

How can I tell Ceph to use eth1/192.168.1.0/24 as an alternative if eth0/192.168.0.0/24 fails?
Via public network in [global]?
public network = 192.168.0.0/24, 192.168.1.0/24 does not seem to work; the monitors are also only on the 192.168.0.0/24 network.
This would only work with either a bond or multipath routing. But in general, Ceph needs the same network redundancy as Corosync to function properly with HA.
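For illustration, an active-backup bond in /etc/network/interfaces could look roughly like this (a sketch, not a drop-in for this cluster: the interface names and address are examples, and bonding both NICs collapses the two subnets into one, so the Corosync links would have to be re-planned as well):

```
# /etc/network/interfaces (sketch; names and address are examples)
auto bond0
iface bond0 inet static
    address 192.168.0.2/24
    bond-slaves eth0 eth1
    bond-mode active-backup
    bond-miimon 100
```

With active-backup, traffic fails over to the second NIC if the active one goes down, which gives Ceph (and everything else on that network) link-level redundancy without any multi-subnet configuration.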
 

TheMrg

Thanks.

"Check that you have both IPs in your /etc/hosts and you can login with ssh from both networks."
do you mean to hosts?

192.168.0.2 storage2
192.168.1.2 storage2

This does not help. Maybe the Proxmox GUI assumes the nodes are on 192.168.0.N:
in GUI Cluster -> Overview the nodes are listed with 192.168.0.N,
in GUI Cluster -> Cluster the nodes are listed with link 0 = 192.168.1.N and link 1 = 192.168.0.N.
 

Alwin

The entries need to be present on every node in the cluster, e.g.:
Code:
root@pve6ceph01:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.19.151    pve6ceph01.proxmox.com pve6ceph01
10.10.10.151    pve6ceph01.proxmox.com pve6ceph01

192.168.19.152    pve6ceph02.proxmox.com pve6ceph02
10.10.10.152    pve6ceph02.proxmox.com pve6ceph02

192.168.19.153    pve6ceph03.proxmox.com pve6ceph03
10.10.10.153    pve6ceph03.proxmox.com pve6ceph03

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
 

TheMrg

Sadly not. We have this in /etc/hosts:

192.168.0.1 storage1
192.168.1.1 storage1
192.168.0.2 storage2
192.168.1.2 storage2
192.168.0.3 storage3
192.168.1.3 storage3

Some weeks ago we added all nodes to the cluster via:

pvecm add 192.168.0.1 -link0 192.168.1.2 -link1 192.168.0.2

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.1.3
0x00000002 1 192.168.1.1
0x00000003 1 192.168.1.20
0x00000004 1 192.168.1.2 (local)


But if we take down the interface with 192.168.0.2 (eth0), the GUI reports
"No route to host (595)" for storage2.
Corosync still sees the host as online, so there is no fencing.
 

Alwin

Proxmox Staff Member
Staff member
Aug 1, 2017
3,438
295
88
Please restart the 'pveproxy.service' on all nodes and try it again.
 
