We have 3 nodes with 2 private NICs:
eth0 - 192.168.0.0/24: corosync link 1, Ceph storage, Proxmox GUI and cluster communication, e.g. migration
eth1 - 192.168.1.0/24: corosync link 0
We have set up the corosync nodes with 2 rings (ring0_addr: 192.168.1.2 / ring1_addr: 192.168.0.2).
This works well.
So corosync's primary link (ring0) does not run on the storage network.
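For context, the nodelist in our corosync.conf looks roughly like this (only storage1 shown, the other two nodes are analogous):

nodelist {
  node {
    name: storage1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.2
    ring1_addr: 192.168.0.2
  }
  ...
}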
Our ceph.conf:

[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster network = 192.168.0.0/24
    fsid = 00cf7097-2cb5-47f6-b342-e55094bf839e
    mon_host = 192.168.0.20 192.168.0.2 192.168.0.1
    mon_initial_members = storage1 storage2
    public network = 192.168.0.0/24
[BTW: which is correct in the option names, the underscore or the space? We have seen different statements.]
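We have seen both variants in examples, e.g.:

    public network = 192.168.0.0/24
    public_network = 192.168.0.0/24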
Now if eth0 fails on storage1 (corosync keeps working via the remaining link, ring0 on eth1, so the node is not marked offline):
- the Proxmox GUI cannot reach the host: "no route to host"
- the VMs on storage1 keep running (pvecm says the node is online)
- the problem is that Ceph is no longer reachable, so the VMs hang, but Proxmox HA thinks everything is fine
- ceph -s hangs, of course
How can I tell Ceph to use eth1/192.168.1.0/24 as an alternative if eth0/192.168.0.0/24 fails?
Via public network in [global]?
public network = 192.168.0.0/24, 192.168.1.0/24 does not seem to work; besides, the monitors are only on the 192.168.0.0/24 network.
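I.e. something like this did not help:

[global]
    ...
    public network = 192.168.0.0/24, 192.168.1.0/24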
Or: could Ceph fence the host so that corosync can mark it as faulty?
Or do we have a fundamental design flaw?
Thanks.