Node is showing as red

alchemyx

New Member
Apr 2, 2014
Hello,

I have a strange issue. When I log in to node A over HTTP, node B shows in red (though I can still browse its summary, storage and so on).
If I log in to node B, A shows as red. Output from A:

Code:
root@proxmox-A:~# pvecm status
Version: 6.2.0
Config Version: 12
Cluster Name: KLASTER
Cluster Id: 9492
Cluster Member: Yes
Cluster Generation: 244
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 7
Flags: 
Ports Bound: 0 178  
Node name: proxmox-A
Node ID: 1
Multicast addresses: 239.192.37.57 
Node addresses: 10.10.10.1 
root@proxmox-A:~# /etc/init.d/cman status
cluster is running.
root@proxmox-A:~# cat /etc/pve/.members
{
"nodename": "proxmox-A",
"version": 3,
"cluster": { "name": "KLASTER", "version": 12, "nodes": 2, "quorate": 1 },
"nodelist": {
  "proxmox-A": { "id": 1, "online": 1, "ip": "10.10.10.1"},
  "proxmox-B": { "id": 2, "online": 0}
  }
}

And B:
Code:
root@proxmox-B:~# pvecm status
Version: 6.2.0
Config Version: 12
Cluster Name: KLASTER
Cluster Id: 9492
Cluster Member: Yes
Cluster Generation: 244
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 7
Flags: 
Ports Bound: 0 178  
Node name: proxmox-B
Node ID: 2
Multicast addresses: 239.192.37.57 
Node addresses: 10.10.10.2 
root@proxmox-B:~# /etc/init.d/cman status
cluster is running.
root@proxmox-B:~# cat /etc/pve/.members
{
"nodename": "proxmox-B",
"version": 7,
"cluster": { "name": "KLASTER", "version": 12, "nodes": 2, "quorate": 1 },
"nodelist": {
  "proxmox-A": { "id": 1, "online": 1, "ip": "10.10.10.1"},
  "proxmox-B": { "id": 2, "online": 1, "ip": "10.10.10.2"}
  }
}

So as you can see, from node A's point of view B seems to be down. I can ping both ways
and ssh from A to B and from B to A, but B still shows as offline. It happened after some reboots
(I was testing failover). Any idea why? These are test boxes, but I would like to know
how to fix this before going into production.
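
Unicast clearly works both ways, but since corosync here uses multicast, I also want to rule out multicast delivery problems. As far as I know, omping (apt-get install omping) is the usual tool for this: run it on both nodes at the same time and it reports unicast and multicast loss separately. This is my understanding of the invocation, not something I have run yet:

Code:
# start on both nodes at (roughly) the same time
root@proxmox-A:~# omping -c 600 -i 1 -q 10.10.10.1 10.10.10.2
root@proxmox-B:~# omping -c 600 -i 1 -q 10.10.10.1 10.10.10.2

If the multicast loss is non-zero while unicast is clean, the switch would be the prime suspect.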

Cluster config:

Code:
<?xml version="1.0"?>
<cluster config_version="12" name="KLASTER">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="1" label="proxmox1_qdisk" tko="10" votes="1"/>
  <totem token="54000"/>
  <fencedevices>
    <fencedevice agent="fence_ifmib" community="password" ipaddr="1.2.3.4" name="szafa-a-b" snmp_version="2c"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox-A" nodeid="1" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/21"/>
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/22"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/21"/>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/22"/>
      </unfence>
    </clusternode>
    <clusternode name="proxmox-B" nodeid="2" votes="1">
      <fence>
        <method name="fence">
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/23"/>
          <device action="off" name="szafa-a-b" port="GigabitEthernet0/24"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/23"/>
        <device action="on" name="szafa-a-b" port="GigabitEthernet0/24"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="107"/>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
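
For completeness: with the qdisk in play the cluster expects three votes, so a single node plus the quorum disk keeps quorum (which matches the Expected votes: 3 / Total votes: 3 above). To cross-check what cman itself thinks of the membership, I believe the standard tools are these (clustat should be available since rgmanager is in use via the <rm> section):

Code:
root@proxmox-A:~# cman_tool nodes      # membership as cman sees it
root@proxmox-A:~# clustat              # node and quorum-disk status
root@proxmox-A:~# ccs_config_validate  # sanity-check cluster.conf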
 
Thanks, but I am passing it through a switch that just transparently passes multicast along (so it treats it like broadcast), and nothing
really changed. I was digging and found this (I have a shared LVM VG over iSCSI):

Code:
  --- Logical volume ---
  LV Path                /dev/shared/vm-100-disk-1
  LV Name                vm-100-disk-1
  VG Name                shared
  LV UUID                Vxf4Al-32Xs-6Byz-xxE7-0dc7-ULpX-CzoVwP
  LV Write Access        read/write
  LV Creation host, time proxmox-A, 2014-04-04 19:57:32 +0200
  LV Status              NOT available
  LV Size                32.00 GiB
  Current LE             8192
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

All the other logical volumes are also NOT available. Running vgscan fixed that, but the node still shows as red. In syslog I have:

Code:
 Apr  7 10:40:17 proxmox-B pmxcfs[2267]: [status] crit: cpg_send_message failed: 9
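
If I understand correctly, error 9 from cpg_send_message is CS_ERR_BAD_HANDLE, i.e. pmxcfs has lost its connection to corosync. My assumption (not yet tested) is that restarting pmxcfs and the PVE daemons would re-establish it:

Code:
root@proxmox-B:~# /etc/init.d/pve-cluster restart   # pmxcfs itself
root@proxmox-B:~# /etc/init.d/pvedaemon restart
root@proxmox-B:~# /etc/init.d/pvestatd restart      # feeds the online/offline status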

No VMs were running on node B, so I did /etc/init.d/cman restart, but it did not help.
Going through my configs I found that IGMP snooping was not disabled on the switch, so it may be true that there was
some trouble with multicast. I also simplified my network config (changed from tagged VLANs to
one untagged VLAN); while I was doing that, the node got fenced. So I decided to reboot everything
and see if it happens again.
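
For reference, on a Cisco-style switch the two usual ways of dealing with this would look roughly as follows (my assumption; the exact CLI depends on the model and IOS version):

Code:
! either disable snooping entirely, so multicast is flooded like broadcast...
switch(config)# no ip igmp snooping
! ...or keep snooping but add a querier, so group memberships stay refreshed
switch(config)# ip igmp snooping querier

Either one should stop the switch from silently dropping the corosync multicast traffic after the membership timeout.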
Thank you for clues!