[SOLVED] Adding Node to Cluster Makes Other Nodes Un-pingable

Discussion in 'Proxmox VE: Installation and configuration' started by Keyninja, Apr 15, 2019.

  1. Keyninja

    Keyninja New Member
    Proxmox Subscriber

    Joined:
    Apr 15, 2019
    Messages:
    8
    Likes Received:
    0
     Hi all. I'm trying to add a 16th node to our cluster, but when I try to add it, (1) I get locked out of the 16th node, and (2) several other members of the cluster become unpingable, with ALL nodes in the cluster showing as being in standalone mode, even though you can see them all in the list. Any ideas as to what's going on?
     
  2. oguz

    oguz Proxmox Staff Member
    Staff Member

    Joined:
    Nov 19, 2018
    Messages:
    595
    Likes Received:
    62
    Hi.

    Are all the nodes running the same PVE versions? They might not play well together if the versions are different.

    Checking these files/commands might give some insight about the situation as well:
    * /var/log/syslog
    * pvecm status
    * /etc/pve/.clusterlog
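
     For example (a rough sketch, not specific to this cluster; adjust the filters and paths to taste), on one of the affected nodes:

     # corosync / pmxcfs activity in the syslog
     grep -E 'corosync|pmxcfs|pve-cluster' /var/log/syslog | tail -n 100

     # current membership and quorum state
     pvecm status

     # cluster-wide log kept by pmxcfs
     cat /etc/pve/.clusterlog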
     
  3. Keyninja

    Keyninja New Member
    Proxmox Subscriber

    Joined:
    Apr 15, 2019
    Messages:
    8
    Likes Received:
    0
     Thanks oguz! I'm guessing you mean on the currently active nodes, not node12?
     
  4. oguz

    oguz Proxmox Staff Member
    Staff Member

    Joined:
    Nov 19, 2018
    Messages:
    595
    Likes Received:
    62
     I think checking it on one or two nodes should be enough, especially since /etc/pve should have the same contents on all nodes (it's synced via the cluster filesystem).

     Would you mind telling us how you're trying to add this node? Can you also check the PVE versions on the active and inactive nodes? `pveversion -v` should give you the output for the node you run it on.

    EDIT:
     Now that I think of it, you should also check the hostnames of all the nodes in the cluster. If some nodes share the same hostname, that can cause problems with host resolution.
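
     A quick way to sanity-check that (just a sketch, assuming root SSH between the nodes still works and substituting your real node names):

     # print each node's hostname, FQDN, and how that name resolves locally
     for n in node2 node3 node4; do
         ssh root@"$n" 'hostname; hostname -f; getent hosts "$(hostname)"'
     done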
     
  5. Keyninja

    Keyninja New Member
    Proxmox Subscriber

    Joined:
    Apr 15, 2019
    Messages:
    8
    Likes Received:
    0
     As far as their hostnames go, the scheme is basically nodexx.yyyyyyyyy.com; none of them should be the same. All of them were installed with the latest 5.4 .iso, so they all report the same PVE versions:

    proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
    pve-manager: 5.4-3 (running version: 5.4-3/0a6eaa62)
    pve-kernel-4.15: 5.3-3
    pve-kernel-4.15.18-12-pve: 4.15.18-35
    ceph: 12.2.11-pve1
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: not correctly installed
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-8
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-50
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-13
    libpve-storage-perl: 5.0-41
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-3
    proxmox-widget-toolkit: 1.0-25
    pve-cluster: 5.0-36
    pve-container: 2.0-37
    pve-docs: 5.4-2
    pve-edk2-firmware: 1.20190312-1
    pve-firewall: 3.0-19
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-9
    pve-i18n: 1.1-4
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 2.12.1-3
    pve-xtermjs: 3.12.0-1
    qemu-server: 5.0-50
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
     I'm not adding it in any fancy way, just using the web UI and copying the join information into node12's "Join Cluster" dialog box. node12 reports the exact same version for all packages. And I have gone through all their hosts files and verified that each public IP does indeed match up to its FQDN and hostname. None of them point to node12 or conflict with each other in any way.
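
     (For reference, the CLI equivalent of that dialog, run on the node that should join, is roughly the following; the target address here just stands in for any existing cluster member, and the CLI output is sometimes easier to capture than the web UI once the join locks you out:)

     # run on the joining node (node12), pointing at an existing cluster member
     pvecm add 10.48.5.3

     # afterwards, check membership from any node
     pvecm status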
     
  6. Keyninja

    Keyninja New Member
    Proxmox Subscriber

    Joined:
    Apr 15, 2019
    Messages:
    8
    Likes Received:
    0
     So now it has poisoned node1 as well, and I had to remove that too. It was spitting a "could not resolve node1/node12" error. I tried re-installing node12, totally changing its IP address, and bumping the hostname up to node18, then added it again, with the same results. So I removed node1, and here we are. I figure this is probably corosync related, so here's my corosync.conf:

     logging {
       debug: off
       to_syslog: yes
     }

     nodelist {
       node {
         name: node10
         nodeid: 10
         quorum_votes: 1
         ring0_addr: 10.48.5.11
       }
       node {
         name: node11
         nodeid: 11
         quorum_votes: 1
         ring0_addr: 10.48.5.12
       }
       node {
         name: node13
         nodeid: 12
         quorum_votes: 1
         ring0_addr: 10.48.5.14
       }
       node {
         name: node14
         nodeid: 13
         quorum_votes: 1
         ring0_addr: 10.48.5.15
       }
       node {
         name: node15
         nodeid: 14
         quorum_votes: 1
         ring0_addr: 10.48.5.16
       }
       node {
         name: node16
         nodeid: 15
         quorum_votes: 1
         ring0_addr: 10.48.5.17
       }
       node {
         name: node18
         nodeid: 1
         quorum_votes: 1
         ring0_addr: 10.48.5.36
       }
       node {
         name: node2
         nodeid: 2
         quorum_votes: 1
         ring0_addr: 10.48.5.3
       }
       node {
         name: node3
         nodeid: 3
         quorum_votes: 1
         ring0_addr: 10.48.5.4
       }
       node {
         name: node4
         nodeid: 4
         quorum_votes: 1
         ring0_addr: 10.48.5.5
       }
       node {
         name: node5
         nodeid: 5
         quorum_votes: 1
         ring0_addr: 10.48.5.6
       }
       node {
         name: node6
         nodeid: 6
         quorum_votes: 1
         ring0_addr: 10.48.5.7
       }
       node {
         name: node7
         nodeid: 7
         quorum_votes: 1
         ring0_addr: 10.48.5.8
       }
       node {
         name: node8
         nodeid: 8
         quorum_votes: 1
         ring0_addr: 10.48.5.9
       }
       node {
         name: node9
         nodeid: 9
         quorum_votes: 1
         ring0_addr: 10.48.5.10
       }
     }

     quorum {
       provider: corosync_votequorum
     }

     totem {
       cluster_name: hou1-vpc1
       config_version: 25
       interface {
         bindnetaddr: 10.48.5.2
         ringnumber: 0
       }
       ip_version: ipv4
       secauth: on
       version: 2
     }
     All IPs are non-routable private addresses.
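
     (If it is corosync, these are the usual things to look at while the join is falling apart; just a generic sketch, nothing specific to this config:)

     # ring status as corosync sees it
     corosync-cfgtool -s

     # current runtime membership
     corosync-cmapctl | grep members

     # corosync and pmxcfs logs around the join attempt
     journalctl -u corosync -u pve-cluster --since "1 hour ago"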
     
  7. Keyninja

    Keyninja New Member
    Proxmox Subscriber

    Joined:
    Apr 15, 2019
    Messages:
    8
    Likes Received:
    0
     I ended up reinstalling the cluster; it turned out to be a DNS issue. Bleh.
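
     (For anyone who lands here with the same symptoms: a quick way to catch that kind of DNS problem is to compare what NSS/hosts and the DNS servers return for the node names on a couple of nodes; a rough sketch, using the nodexx.yyyyyyyyy.com scheme from this thread:)

     # what /etc/hosts and NSS resolve the names to
     getent hosts node1 node12

     # what the configured DNS servers say
     dig +short node1.yyyyyyyyy.com node12.yyyyyyyyy.com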
     