Proxmox with Corosync+Ceph / how to have multiple "routes" to Ceph storage

Discussion in 'Proxmox VE: Networking and Firewall' started by TheMrg, Aug 2, 2019.

  1. TheMrg

    TheMrg New Member

    Joined:
    Aug 1, 2019
    Messages:
    22
    Likes Received:
    0
    We have 3 nodes with 2 private NICs:
    eth0 - 192.168.0.0/24: Corosync link 1, Ceph storage, Proxmox GUI and cluster communication, e.g. migration
    eth1 - 192.168.1.0/24: Corosync link 0



    We have configured the Corosync nodes with 2 rings (ring0_addr: 192.168.1.2 / ring1_addr: 192.168.0.2).
    This works well, so Corosync does not depend on the storage network alone.
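
    For reference, a minimal sketch of how a two-ring node entry in /etc/pve/corosync.conf typically looks (addresses as above; the name and nodeid for storage2 are assumed from the membership output later in this thread, and the other nodes get analogous entries):
    Code:
    nodelist {
      node {
        name: storage2
        nodeid: 4
        quorum_votes: 1
        ring0_addr: 192.168.1.2
        ring1_addr: 192.168.0.2
      }
      # analogous node { } entries for storage1 and storage3
    }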

    ceph.conf
    Code:
    [global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster network = 192.168.0.0/24
        fsid = 00cf7097-2cb5-47f6-b342-e55094bf839e
        mon_host = 192.168.0.20 192.168.0.2 192.168.0.1
        mon_initial_members = storage1 storage2
        public network = 192.168.0.0/24

    [BTW: is the underscore or the space the correct form for the option names? We have seen different statements.]
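
    (As far as the Ceph documentation goes, both forms should be accepted: the config parser treats spaces and underscores in option names as equivalent, so for example these two lines set the same option:)
    Code:
    public network = 192.168.0.0/24
    public_network = 192.168.0.0/24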

    Now, if eth0 fails on storage1 (Corosync still works via the remaining link, so the node is not marked offline):
    - the Proxmox GUI cannot reach the host: no route to host
    - the VMs on storage1 still run (pvecm says the node is online)
    - the problem is that Ceph is no longer reachable, so the VMs hang, but Proxmox HA thinks all is fine
    - ceph -s hangs, of course

    How can I tell Ceph to use eth1/192.168.1.0/24 as an alternative if eth0/192.168.0.0/24 fails?
    Via public network in [global]?
    public network = 192.168.0.0/24, 192.168.1.0/24 does not seem to work; also, the monitors are all on the 192.168.0.0/24 network.

    Or: could Ceph fence the host so that Corosync marks it as faulty?

    Or do I have a fundamental design flaw?

    Thanks.
     
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,550
    Likes Received:
    221
    Ceph should handle both.

    Check that you have both IPs in your /etc/hosts and that you can log in via SSH from both networks.

    This would only work with either a bond or multipath routing. But in general, Ceph needs the same network redundancy as Corosync to function properly with HA.
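
    For illustration, a minimal sketch of an active-backup bond in /etc/network/interfaces on one node, as one way to get that redundancy (bond0 is an assumed name; eth0/eth1 and the address are taken from the thread, and this merges the two subnets into a single redundant network rather than keeping them separate):
    Code:
    # One redundant network for Ceph/GUI; if eth0 fails, traffic fails over to eth1.
    auto bond0
    iface bond0 inet static
        address 192.168.0.2/24
        bond-slaves eth0 eth1
        bond-miimon 100
        bond-mode active-backup
        bond-primary eth0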
     
  3. TheMrg

    TheMrg New Member

    Joined:
    Aug 1, 2019
    Messages:
    22
    Likes Received:
    0
    Thanks.

    "Check that you have both IPs in your /etc/hosts and you can login with ssh from both networks."
    do you mean to hosts?

    192.168.0.2 storage2
    192.168.1.2 storage2

    This does not help. Maybe the Proxmox GUI thinks the nodes are on 192.168.0.N: in
    GUI Cluster -> Overview the nodes are listed with 192.168.0.N,
    while in
    GUI Cluster -> Cluster the nodes are listed with link 0 = 192.168.1.N and link 1 = 192.168.0.N.
     
    #3 TheMrg, Aug 5, 2019
    Last edited: Aug 5, 2019
  4. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,550
    Likes Received:
    221
    The entries need to be populated on all nodes of the cluster, e.g.:
    Code:
    root@pve6ceph01:~# cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    192.168.19.151    pve6ceph01.proxmox.com pve6ceph01
    10.10.10.151    pve6ceph01.proxmox.com pve6ceph01
    
    192.168.19.152    pve6ceph02.proxmox.com pve6ceph02
    10.10.10.152    pve6ceph02.proxmox.com pve6ceph02
    
    192.168.19.153    pve6ceph03.proxmox.com pve6ceph03
    10.10.10.153    pve6ceph03.proxmox.com pve6ceph03
    
    # The following lines are desirable for IPv6 capable hosts
    
    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts
     
  5. TheMrg

    TheMrg New Member

    Joined:
    Aug 1, 2019
    Messages:
    22
    Likes Received:
    0
    Sadly, that does not help. We have in /etc/hosts:

    192.168.0.1 storage1
    192.168.1.1 storage1
    192.168.0.2 storage2
    192.168.1.2 storage2
    192.168.0.3 storage3
    192.168.1.3 storage3

    Some weeks ago we added the nodes to the cluster via

    pvecm add 192.168.0.1 -link0 192.168.1.2 -link1 192.168.0.2

    Membership information
    ----------------------
    Nodeid Votes Name
    0x00000001 1 192.168.1.3
    0x00000002 1 192.168.1.1
    0x00000003 1 192.168.1.20
    0x00000004 1 192.168.1.2 (local)


    But if we take down the interface with 192.168.0.2 (eth0), the GUI reports
    No route to host (595) for storage2.
    Corosync still sees the host as online, so there is no fencing.
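
    To double-check which Corosync link is still up while eth0 is down, these commands (from the corosync and pve-cluster packages) can be run on any node; output omitted here:
    Code:
    # per-node link status of the knet transport
    corosync-cfgtool -s
    # quorum and membership summary
    pvecm status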
     
  6. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,550
    Likes Received:
    221
    Please restart the 'pveproxy.service' on all nodes and try it again.
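
    For example, on each node (pveproxy is a regular systemd unit on PVE 6):
    Code:
    systemctl restart pveproxy.service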
     