Help with cluster configuration

Discussion in 'Proxmox VE: Installation and configuration' started by Valerio Pachera, Apr 30, 2018.

  1. Valerio Pachera

    Hi, I'm configuring my first cluster following the documentation.

     I have a dedicated NIC for corosync on the 192.168.9.0/24 network.
     On the first node (192.168.9.106) I ran 'pvecm create pve-cluster-01' and no errors were reported.
     On the second node I ran
    Code:
    pvecm add 192.168.9.106 --ring0_addr 192.168.9.78
    ...
    Request addition of this node
    500 cluster not ready - no quorum?
    
    Then

    Code:
    pvecm status
    Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
    Cannot initialize CMAP service
    
    pvecm nodes
    Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
    Cannot initialize CMAP service
    
    On the first node

    Code:
    pvecm status
    Cannot initialize CMAP service
    
     I guess the first step went wrong, but I got no errors.

     PS: the two dedicated NICs are connected to a TP-Link TL-SG105E switch.
    I enabled IGMP snooping on it.

     Any suggestions?
     Is there any log I can look at?
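
     For what it's worth, corosync and the Proxmox cluster filesystem (pmxcfs) both log to the systemd journal, so something like the following should show their recent messages (generic commands, not specific to this setup):

     Code:
     journalctl -b -u corosync -u pve-cluster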

     Looking at the man page, I think I should have created the cluster on the first node with the ring0_addr option in the first place.
     Otherwise it binds to another address.

    Code:
          pvecm create <clustername> [OPTIONS]
          ....
           --ring0_addr <string> (default = Hostname of the node)
               Hostname (or IP) of the corosync ring0 address of this node.
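
     In other words, something along these lines on the first node (using my corosync IP for both options; just a guess at this point, not yet tested):

     Code:
     pvecm create pve-cluster-01 --bindnet0_addr 192.168.9.106 --ring0_addr 192.168.9.106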
     
  2. udo

    Hi,
     I assume your host entry on the first node doesn't point to 192.168.9.106?!

    Take a look at /etc/corosync/corosync.conf on the first node.
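
     A quick way to check both the hostname resolution and the addresses corosync is configured with would be something like this (generic commands):

     Code:
     hostname
     grep "$(hostname)" /etc/hosts
     grep -E 'bindnetaddr|ring0_addr' /etc/corosync/corosync.conf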

    Udo
     
  3. Valerio Pachera

    @udo You are right!
     In fact, I found out that the IP address set in /etc/hosts does not match the current server IP.
     Note: the first server was installed and configured by a third person and is already running some guests.

     1) So, because I want to use a dedicated NIC, I have to specify its address already when I create the cluster.
    Am I right?
    By man, I see two options:

    Code:
           --bindnet0_addr <string>
               This specifies the network address the corosync ring 0 executive should bind to and defaults to the local IP address of the node.
    
           --ring0_addr <string> (default = Hostname of the node)
               Hostname (or IP) of the corosync ring0 address of this node.
     2) Honestly, I don't understand 'bindnet0_addr' and how it differs from 'ring0_addr'.
    Could you explain it please?

     3) I guess it's safe to run 'pvecm create' a second time, right?
    As of now, there are no other nodes.

    Thank you.
     
  4. udo

    Hi,
    you can edit corosync.conf (and /etc/pve/corosync.conf) to use the right IPs (bindnetaddr + ring0_addr) and restart corosync.
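
     For orientation, a single-node corosync.conf on PVE 5.x / corosync 2.x looks roughly like the sketch below; the cluster name, node name and addresses are only examples taken from this thread, not your exact file:

     Code:
     totem {
       version: 2
       cluster_name: pve-cluster-01
       config_version: 2
       interface {
         ringnumber: 0
         bindnetaddr: 192.168.9.0
       }
     }

     nodelist {
       node {
         name: pve
         nodeid: 1
         quorum_votes: 1
         ring0_addr: 192.168.9.106
       }
     }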

    Udo
     
  5. Valerio Pachera

     Hi udo, I changed /etc/corosync/corosync.conf, but /etc/pve/corosync.conf has 440 permissions, so I can't edit it (and it still contains the wrong IP).
    Am I allowed to add write permission?
    Are you sure I can change corosync settings without rebooting the server?
    Shall I increase 'config_version' as described in https://pve.proxmox.com/wiki/Separate_Cluster_Network#Configure_corosync ?
     Note: as of now, if I try to start a guest or do any action in the GUI, I get the error "cluster not ready - no quorum? (500)".
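
     Side note on the 440 permissions: /etc/pve is the pmxcfs FUSE mount and only becomes writable when the node has quorum, so adding write permission won't help. On a single node the usual pattern (a sketch based on the wiki page linked above, to be adapted) is roughly:

     Code:
     pvecm expected 1                                   # tell votequorum that one vote is enough
     cp /etc/pve/corosync.conf /root/corosync.conf.new  # work on a copy
     # edit /root/corosync.conf.new: fix bindnetaddr / ring0_addr and increase config_version
     cp /root/corosync.conf.new /etc/pve/corosync.conf

     For reference, this is what corosync logs at startup on this node: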

    Code:
    May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
    May  2 08:49:39 pve corosync[14854]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
    May  2 08:49:39 pve corosync[14854]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
    May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
    May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
    May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
    May  2 08:49:39 pve corosync[14854]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
    May  2 08:49:39 pve corosync[14854]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
    May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] The network interface [192.168.9.106] is now up.
    May  2 08:49:39 pve corosync[14854]:  [TOTEM ] The network interface [192.168.9.106] is now up.
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
    May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cmap
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync configuration service [1]
    May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cfg
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
    May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cpg
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
    May  2 08:49:39 pve corosync[14854]: warning [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
    May  2 08:49:39 pve corosync[14854]: warning [WD    ] resource load_15min missing a recovery key.
    May  2 08:49:39 pve corosync[14854]: warning [WD    ] resource memory_used missing a recovery key.
    May  2 08:49:39 pve corosync[14854]: info    [WD    ] no resources configured.
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
    May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Using quorum provider corosync_votequorum
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
    May  2 08:49:39 pve systemd[1]: Started Corosync Cluster Engine.
    May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] This node is within the primary component and will provide service.
    May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Members[0]:
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
    May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: votequorum
    May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
    May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: quorum
    May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] A new membership (192.168.9.106:4) was formed. Members joined: 1
    May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Members[1]: 1
    May  2 08:49:39 pve corosync[14854]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
    May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cmap
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync configuration service [1]
    May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cfg
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
    May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cpg
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
    May  2 08:49:39 pve corosync[14854]:  [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
    May  2 08:49:39 pve corosync[14854]:  [WD    ] resource load_15min missing a recovery key.
    May  2 08:49:39 pve corosync[14854]:  [WD    ] resource memory_used missing a recovery key.
    May  2 08:49:39 pve corosync[14854]:  [WD    ] no resources configured.
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync watchdog service [7]
    May  2 08:49:39 pve corosync[14854]:  [QUORUM] Using quorum provider corosync_votequorum
    May  2 08:49:39 pve corosync[14854]:  [QUORUM] This node is within the primary component and will provide service.
    May  2 08:49:39 pve corosync[14854]:  [QUORUM] Members[0]:
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
    May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: votequorum
    May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
    May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: quorum
    May  2 08:49:39 pve corosync[14854]:  [TOTEM ] A new membership (192.168.9.106:4) was formed. Members joined: 1
    May  2 08:49:39 pve corosync[14854]:  [QUORUM] Members[1]: 1
    May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Completed service synchronization, ready to provide service.
    
    Thank you.
     
  6. Valerio Pachera

     I think the only way is to follow the steps in the chapter 'Separate A Node Without Reinstalling'.

    Code:
    systemctl stop pve-cluster
    systemctl stop corosync
    pmxcfs -l
    rm /etc/pve/corosync.conf
    rm /etc/corosync/*
    killall pmxcfs
    rm /var/lib/corosync/*
    
    And create the cluster from scratch with the right options:

    Code:
    systemctl start pve-cluster
    pvecm create testx -bindnet0_addr 192.168.9.106 -ring0_addr 192.168.9.106
     I tried that procedure on a test installation where I reproduced the situation.
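
     To double-check that the recreated config uses the intended addresses, something like this can be used afterwards (generic commands):

     Code:
     corosync-cfgtool -s    # shows the ring 0 id/address corosync is actually using
     pvecm status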

    Note from man pvecm

    Let me know if you have better solutions.
     
    #6 Valerio Pachera, May 2, 2018
    Last edited: May 2, 2018