Node is showing as red

alchemyx

New Member
Apr 2, 2014
Hello,

I have a strange issue. When I log in to node A via HTTP, node B shows as red (but I can still browse its summary, storage and so on).
If I log in to B via HTTP, A shows as red. Output from A:

Code:
root@proxmox-A:~# pvecm status
Version: 6.2.0
Config Version: 12
Cluster Name: KLASTER
Cluster Id: 9492
Cluster Member: Yes
Cluster Generation: 244
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 7
Flags: 
Ports Bound: 0 178  
Node name: proxmox-A
Node ID: 1
Multicast addresses: 239.192.37.57 
Node addresses: 10.10.10.1 
root@proxmox-A:~# /etc/init.d/cman status
cluster is running.
root@proxmox-A:~# cat /etc/pve/.members
{
"nodename": "proxmox-A",
"version": 3,
"cluster": { "name": "KLASTER", "version": 12, "nodes": 2, "quorate": 1 },
"nodelist": {
  "proxmox-A": { "id": 1, "online": 1, "ip": "10.10.10.1"},
  "proxmox-B": { "id": 2, "online": 0}
  }
}

And B:
Code:
root@proxmox-B:~# pvecm status
Version: 6.2.0
Config Version: 12
Cluster Name: KLASTER
Cluster Id: 9492
Cluster Member: Yes
Cluster Generation: 244
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 7
Flags: 
Ports Bound: 0 178  
Node name: proxmox-B
Node ID: 2
Multicast addresses: 239.192.37.57 
Node addresses: 10.10.10.2 
root@proxmox-B:~# /etc/init.d/cman status
cluster is running.
root@proxmox-B:~# cat /etc/pve/.members
{
"nodename": "proxmox-B",
"version": 7,
"cluster": { "name": "KLASTER", "version": 12, "nodes": 2, "quorate": 1 },
"nodelist": {
  "proxmox-A": { "id": 1, "online": 1, "ip": "10.10.10.1"},
  "proxmox-B": { "id": 2, "online": 1, "ip": "10.10.10.2"}
  }
}

So as you can see, on node A it looks as if B is down. I can ping both ways
and SSH from A to B and from B to A, but B still shows as offline. It happened after some reboots
(I was testing failover). Any idea why? These are test boxes, but I would like to know
how to fix this before going into production.
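
Since cluster communication (corosync) relies on multicast, one thing I can verify is whether multicast actually works between the nodes even though ping and SSH do; a sketch with omping (the package may need to be installed first, and it has to be started on both nodes at roughly the same time, using the hostnames from above):

Code:
# install and then run on proxmox-A and proxmox-B simultaneously
apt-get install omping
omping -c 600 -i 1 -q proxmox-A proxmox-B

If the multicast loss reported at the end is high, the switch (IGMP snooping) is usually the culprit.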

Cluster config:

Code:
 <?xml version="1.0"?>
<cluster config_version="12" name="KLASTER">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ifmib" community="password" ipaddr="1.2.3.4" name="szafa-a-b" snmp_version="2c"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox-A" nodeid="1" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/21"/>
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/22"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/21"/>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/22"/>
      </unfence>
    </clusternode>
    <clusternode name="proxmox-B" nodeid="2" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/23"/>
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/24"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/23"/>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/24"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="107"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
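
For completeness, the membership can also be cross-checked from the cluster stack itself, to confirm that only pmxcfs (and therefore the GUI) disagrees; a sketch using the standard cman/PVE tools (clustat assumes rgmanager is running, which it should be here given the <rm> section):

Code:
# cman's own view of the membership ("M" in the Sts column = member)
cman_tool nodes
# the same list through the Proxmox wrapper
pvecm nodes
# rgmanager view, including the quorum disk
clustat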
 
Thanks, but I am passing it through a switch that just transparently passes multicast along (i.e. treats it like broadcast), and nothing
really changed. I was digging and found this (I have a shared LVM VG over iSCSI):

Code:
  --- Logical volume ---
  LV Path                /dev/shared/vm-100-disk-1
  LV Name                vm-100-disk-1
  VG Name                shared
  LV UUID                Vxf4Al-32Xs-6Byz-xxE7-0dc7-ULpX-CzoVwP
  LV Write Access        read/write
  LV Creation host, time proxmox-A, 2014-04-04 19:57:32 +0200
  LV Status              NOT available
  LV Size                32.00 GiB
  Current LE             8192
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

All the others are also NOT available. Running vgscan fixed that, but the node still shows as red.
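
For reference, the LVs can also be re-activated explicitly instead of relying on vgscan alone; a minimal sketch, assuming the VG is really named shared as in the output above:

Code:
# rescan for volume groups on the (iSCSI) devices
vgscan
# activate every LV in the shared VG
vgchange -ay shared
# verify: the Attr column should now show "a" (active) for each LV
lvs shared

In syslog I have: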

Code:
 Apr  7 10:40:17 proxmox-B pmxcfs[2267]: [status] crit: cpg_send_message failed: 9

I had no VMs running on node B, so I did /etc/init.d/cman restart, and it did not help.
Going through my configs I found that IGMP snooping was not disabled, so it may well be that there was
some trouble with multicast. I also simplified my network config (changed from tagged VLANs to
one untagged VLAN). While I was doing that, my node got fenced. So I decided to reboot everything
and see if it happens again.
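
If it comes back, I will also try restarting the pve cluster filesystem and status daemon on the red node first, which is often enough when the cluster itself is quorate; a sketch, with the init script names as they exist on PVE 3.x:

Code:
# on the node that is shown as red
/etc/init.d/pve-cluster restart
/etc/init.d/pvestatd restart
/etc/init.d/pveproxy restart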
Thanks for any clues!
 
