[SOLVED] Proxmox 3 Node Cluster with Unicast - major problems

ThinkPrivacy

Active Member
Sep 22, 2016
First of all, I tried to set up a cluster and failed because my hosting provider does not support multicast. I then decided to follow the instructions for configuring a unicast cluster, and I am having a nightmare.

Here is my /etc/hosts (public IP address & domain removed)
Code:
root@pmn1:~# more /etc/hosts
127.0.0.1    localhost
*.*.*.* pmn1.mydomain.com pmn1

# Proxmox Cluster
10.91.150.134 pmn1.tp
10.91.156.172 pmn2.tp
10.91.156.173 pmn3.tp

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Here is my ifconfig (public IP address & MAC addresses removed)
Code:
eth0      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24092 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4330 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8551475 (8.1 MiB)  TX bytes:2326213 (2.2 MiB)

eth1      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:10.91.150.134  Bcast:10.91.150.255  Mask:255.255.255.128
          inet6 addr: fe80::ec4:7aff:fe57:5c25/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:684 (684.0 B)  TX bytes:1192 (1.1 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:224 errors:0 dropped:0 overruns:0 frame:0
          TX packets:224 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:20554 (20.0 KiB)  TX bytes:20554 (20.0 KiB)

vmbr0     Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:xxx.xxx.xxx.xxx  Bcast:xxx.xxx.xxx.xxx  Mask:255.255.255.0
          inet6 addr: fe80::ec4:7aff:fe57:5c24/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24090 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4326 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8214095 (7.8 MiB)  TX bytes:2325949 (2.2 MiB)
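
A quick side check on the addressing, as a minimal sketch rather than anything that was run on these servers: using only the eth1 address and netmask from the ifconfig output and the pmn2/pmn3 entries from /etc/hosts, Python's standard ipaddress module shows whether the other nodes are in the same subnet as eth1. The variable names are made up for illustration.

```python
import ipaddress

# eth1 address and netmask as shown in the ifconfig output above.
eth1 = ipaddress.ip_interface("10.91.150.134/255.255.255.128")

# Peer addresses as listed in /etc/hosts above.
peers = {
    "pmn2.tp": ipaddress.ip_address("10.91.156.172"),
    "pmn3.tp": ipaddress.ip_address("10.91.156.173"),
}

print("eth1 network:", eth1.network)
for name, addr in peers.items():
    if addr in eth1.network:
        print(name, addr, "is in the same subnet as eth1")
    else:
        print(name, addr, "is in a different subnet, so traffic must be routed")
```

With these values the peers fall outside eth1's /25, so the nodes can only reach each other if the provider routes between the private subnets. That fits a unicast (udpu) setup, where every node needs plain IP reachability to every other node.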

Here is my corosync.conf:
Code:
totem {
  version: 2
  secauth: on
  cluster_name: tpcl
  config_version: 2
  ip_version: ipv4
  transport: udpu
  interface {
    ringnumber: 0
  }
}

nodelist {
  node {
    ring0_addr: pmn1.tp
    name: pmn1
    nodeid: 1
    quorum_votes: 1
  }
  node {
    ring0_addr: pmn2.tp
    name: pmn2
    nodeid: 2
    quorum_votes: 1
  }
  node {
    ring0_addr: pmn3.tp
    name: pmn3
    nodeid: 3
    quorum_votes: 1
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}

logging {
  to_syslog: yes
  debug: off
}
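
Since the nodelist uses names (pmn1.tp etc.) instead of IPs, every ring0_addr has to resolve on every node. The following is a rough sketch of how that could be checked against /etc/hosts; the helper names parse_hosts, ring0_addrs and unresolved are hypothetical, and the sample strings are trimmed copies of the files above. It assumes ring0_addr holds a hostname, so an IP literal would be reported as unresolved.

```python
import re

# Trimmed copies of the /etc/hosts and corosync.conf content shown above.
HOSTS = """\
127.0.0.1    localhost
# Proxmox Cluster
10.91.150.134 pmn1.tp
10.91.156.172 pmn2.tp
10.91.156.173 pmn3.tp
"""

COROSYNC = """\
nodelist {
  node {
    ring0_addr: pmn1.tp
    nodeid: 1
  }
  node {
    ring0_addr: pmn2.tp
    nodeid: 2
  }
  node {
    ring0_addr: pmn3.tp
    nodeid: 3
  }
}
"""


def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # first entry wins
    return table


def ring0_addrs(conf_text):
    """Return all ring0_addr values in the order they appear."""
    return re.findall(r"^\s*ring0_addr:\s*(\S+)", conf_text, flags=re.MULTILINE)


def unresolved(conf_text, hosts_text):
    """List ring0_addr names that have no entry in the hosts text."""
    table = parse_hosts(hosts_text)
    return [name for name in ring0_addrs(conf_text) if name not in table]


print(unresolved(COROSYNC, HOSTS))
```

An empty list means every node name in the nodelist has a matching hosts entry. The same check would need to pass on pmn2 and pmn3 with their own /etc/hosts.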

Steps I took:

1. pvecm create tpcl

2. edit /etc/pve/corosync.conf (to match the config above)

3. pvecm status
The cluster was up but was using the wrong IP. It was using my public IP, which is to be expected because the original corosync.conf had bindnetaddr set to the public IP.

4. rebooted the server

5. pvecm status
Code:
root@pmn1:~# pvecm status
Cannot initialize CMAP service
root@pmn1:~#

Syslog output:
Code:
Sep 23 08:12:35 pmn1 systemd[1]: Starting The Proxmox VE cluster filesystem...
Sep 23 08:12:35 pmn1 pmxcfs[1267]: [dcdb] crit: local corosync.conf is newer
Sep 23 08:12:35 pmn1 pmxcfs[1267]: [dcdb] crit: local corosync.conf is newer
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [quorum] crit: quorum_initialize failed: 2
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [quorum] crit: can't initialize service
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [confdb] crit: cmap_initialize failed: 2
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [confdb] crit: can't initialize service
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [dcdb] crit: cpg_initialize failed: 2
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [dcdb] crit: can't initialize service
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [status] crit: cpg_initialize failed: 2
Sep 23 08:12:35 pmn1 pmxcfs[1273]: [status] crit: can't initialize service

There was a whole bunch more in syslog but basically it kept failing up until attempt 9 and then stopped trying.

I really have no idea how to fix this and would really appreciate some help.

Thanks.
 
OK, I fixed this. The clue was in the critical error about the local corosync.conf being newer. I used pmxcfs -l to mount the FUSE filesystem locally, changed config_version in the totem section to 9 (because I couldn't remember which edit I was on), removed the pmxcfs lockfile and restarted pve-cluster. It then worked, so I rebooted to make sure, and it came up fine.
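
For anyone hitting the same "local corosync.conf is newer" message: the part of the fix that is easy to get wrong by hand is raising config_version. Here is a small sketch of that one step in Python. The function name bump_config_version is made up, it only edits text in memory, and it is not a Proxmox tool. Mounting with pmxcfs -l, removing the lockfile and restarting pve-cluster still have to be done as described above.

```python
import re


def bump_config_version(conf_text, new_version=None):
    """Return conf_text with config_version raised.

    If new_version is None the existing value is incremented by one;
    otherwise new_version must be larger than the existing value.
    """
    pattern = re.compile(r"^(\s*config_version:\s*)(\d+)\s*$", re.MULTILINE)
    match = pattern.search(conf_text)
    if match is None:
        raise ValueError("no config_version line found")

    current = int(match.group(2))
    target = current + 1 if new_version is None else new_version
    if target <= current:
        raise ValueError("new version must be higher than the current one")

    # Replace only the digits, leaving indentation and the rest untouched.
    return conf_text[:match.start(2)] + str(target) + conf_text[match.end(2):]


sample = "totem {\n  version: 2\n  config_version: 2\n  transport: udpu\n}\n"
print(bump_config_version(sample, 9))
```

Passing 9 mirrors what I did. Leaving new_version out just adds one to whatever is in the file, which is enough as long as you know that file holds the highest version among the copies.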

Went to the other 2 nodes, added them and the cluster is now working :)
 
