[solved] different "pvecm nodes" output on cluster-nodes

udo · Feb 26, 2014

Hi,
over the weekend I worked on the network (but not on the cluster switch). After that, all nodes turned red in the GUI.
After restarting pve-cluster on all nodes, they are all green again and I'm able to start VMs.

But now I want to change a VM on one node and can't change its config there, although it works from three other nodes.
What is very strange is that "pvecm nodes" gives different output on every node (btw, the cluster generation is 14312):

Code:

root@proxmox1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  14312   2014-02-22 21:58:45  proxmox3
   2   M  14312   2014-02-22 21:58:45  proxmox2
   3   M  14200   2014-02-08 21:47:21  proxmox1
   4   M  14312   2014-02-22 21:58:45  proxmox4
   5   M  14312   2014-02-22 21:58:45  proxmox5

root@proxmox2:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  14312   2014-02-22 21:58:45  proxmox3
   2   M  14236   2014-02-08 23:40:31  proxmox2
   3   M  14312   2014-02-22 21:58:45  proxmox1
   4   M  14308   2014-02-22 21:58:45  proxmox4
   5   M  14308   2014-02-22 21:58:45  proxmox5

root@proxmox3:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  14164   2014-02-06 19:01:08  proxmox3
   2   M  14312   2014-02-22 21:58:45  proxmox2
   3   M  14312   2014-02-22 21:58:45  proxmox1
   4   M  14312   2014-02-22 21:58:45  proxmox4
   5   M  14312   2014-02-22 21:58:45  proxmox5

root@proxmox4:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  14312   2014-02-22 21:58:45  proxmox3
   2   M  14308   2014-02-22 21:58:45  proxmox2
   3   M  14312   2014-02-22 21:58:45  proxmox1
   4   M  14208   2014-02-08 21:56:04  proxmox4
   5   M  14240   2014-02-08 23:30:16  proxmox5

root@proxmox5:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  14312   2014-02-22 21:58:45  proxmox3
   2   M  14308   2014-02-22 21:58:45  proxmox2
   3   M  14312   2014-02-22 21:58:45  proxmox1
   4   M  14240   2014-02-08 23:30:16  proxmox4
   5   M  14228   2014-02-08 23:30:16  proxmox5
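
To see at a glance which entries differ between nodes, the pasted tables can be compared mechanically. This is only a minimal sketch: the function names are made up for illustration, and it assumes the column layout shown above (node ID, status, Inc, optional join timestamp, name). It uses the proxmox1 and proxmox4 views from this post as sample input. Whether a spread in the Inc column is itself a fault is not something this sketch decides; it only lists the disagreements.

```python
def parse_pvecm_nodes(text):
    """Return {node_name: (status, inc)} from pasted `pvecm nodes` output."""
    members = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and shell prompts.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        # The name is always the last field; the Joined columns may be absent.
        members[fields[-1]] = (fields[1], int(fields[2]))
    return members


def differing_views(views):
    """Return {node_name: set_of_inc_values} for names the observers disagree on."""
    seen = {}
    for members in views.values():
        for name, (_status, inc) in members.items():
            seen.setdefault(name, set()).add(inc)
    return {name: incs for name, incs in seen.items() if len(incs) > 1}


# Sample input copied from the output above.
VIEW_PROXMOX1 = """
Node  Sts   Inc   Joined               Name
   1   M  14312   2014-02-22 21:58:45  proxmox3
   2   M  14312   2014-02-22 21:58:45  proxmox2
   3   M  14200   2014-02-08 21:47:21  proxmox1
   4   M  14312   2014-02-22 21:58:45  proxmox4
   5   M  14312   2014-02-22 21:58:45  proxmox5
"""

VIEW_PROXMOX4 = """
Node  Sts   Inc   Joined               Name
   1   M  14312   2014-02-22 21:58:45  proxmox3
   2   M  14308   2014-02-22 21:58:45  proxmox2
   3   M  14312   2014-02-22 21:58:45  proxmox1
   4   M  14208   2014-02-08 21:56:04  proxmox4
   5   M  14240   2014-02-08 23:30:16  proxmox5
"""

views = {
    "proxmox1": parse_pvecm_nodes(VIEW_PROXMOX1),
    "proxmox4": parse_pvecm_nodes(VIEW_PROXMOX4),
}

if __name__ == "__main__":
    for name, incs in sorted(differing_views(views).items()):
        print(name, sorted(incs))
```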

Quorum looks OK, but it isn't really:

Code:

pvecm status
Version: 6.2.0
Config Version: 5
Cluster Name: abc-cluster
Cluster Id: 62900
Cluster Member: Yes
Cluster Generation: 14312
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Node votes: 1
Quorum: 3  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: proxmox4
Node ID: 4
Multicast addresses: 239.192.245.170 
Node addresses: 172.20.2.64

root@proxmox4:~# touch /etc/pve/xx
touch: cannot touch `/etc/pve/xx': Device or resource busy
root@proxmox4:~# grep cores /etc/pve/nodes/proxmox4/qemu-server/499.conf
cores: 8

# on node 1-3 the new content is displayed:
root@proxmox3:~# grep cores /etc/pve/nodes/proxmox4/qemu-server/499.conf 
cores: 2

Restarting pve-cluster on all nodes doesn't help.
The package versions are the same on all nodes:

Code:

proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-24 (running version: 3.1-24/060bd5a6)
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-9
libpve-access-control: 3.0-8
libpve-storage-perl: 3.0-18
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

In the syslog I saw the following error:

Code:

Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] notice: start cluster connection
Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] notice: members: 1/464892, 2/10102, 2/334568, 3/233316, 4/692963
Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] notice: starting data syncronisation
Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] notice: received sync request (epoch 1/464892/0001CBC4)
Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] notice: members: 1/464892, 2/10102, 2/334568, 3/233316
Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] notice: we (4/692963) left the process group
Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] crit: leaving CPG group
Feb 25 06:25:59 proxmox4 pvestatd[693607]: WARNING: unable to connect to VM 425 socket - timeout after 31 retries
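
The member lists in these lines can be checked for oddities with a few lines of Python. This is a rough sketch with made-up function names. It assumes each entry has the form nodeid/pid, which matches the "we (4/692963)" line logged by pmxcfs[692963] on node ID 4. On the first members line from the log above it reports node ID 2 twice and node ID 5 not at all; what that means for the cluster is left open here.

```python
import re
from collections import Counter

MEMBERS_RE = re.compile(r"members: (.*)$")


def parse_members(line):
    """Return [(node_id, pid), ...] from a pmxcfs 'members:' syslog line."""
    match = MEMBERS_RE.search(line)
    if not match:
        return []
    members = []
    for entry in match.group(1).split(","):
        node_id, pid = entry.strip().split("/")
        members.append((int(node_id), int(pid)))
    return members


def duplicate_node_ids(members):
    """Node IDs that appear more than once in the member list."""
    counts = Counter(node_id for node_id, _pid in members)
    return sorted(n for n, c in counts.items() if c > 1)


def missing_node_ids(members, expected_ids):
    """Expected node IDs that do not appear in the member list."""
    present = {node_id for node_id, _pid in members}
    return sorted(set(expected_ids) - present)


# Line copied from the syslog excerpt above.
LOG_LINE = (
    "Feb 25 06:25:56 proxmox4 pmxcfs[692963]: [dcdb] notice: "
    "members: 1/464892, 2/10102, 2/334568, 3/233316, 4/692963"
)

members = parse_members(LOG_LINE)

if __name__ == "__main__":
    print("duplicates:", duplicate_node_ids(members))
    print("missing:", missing_node_ids(members, range(1, 6)))
```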

Any hints on how to proceed?

Udo

udo · Feb 26, 2014

Re: different "pvecm nodes" output on cluster-nodes

Hi again,
I just tried restarting cman on all nodes.
It worked on nodes 1, 2, 4 and 5, but on the last one (3) cman timed out waiting for quorum:

Code:

ssh proxmox3 "service cman restart"
Stopping cluster: 
   Stopping dlm_controld... [  OK  ]
   Stopping fenced... [  OK  ]
   Stopping cman... [  OK  ]
   Waiting for corosync to shutdown:[  OK  ]
   Unloading kernel modules... [  OK  ]
   Unmounting configfs... [  OK  ]
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]

root@proxmox3:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  14348   2014-02-26 00:12:18  proxmox3
   2   X      0                        proxmox2
   3   X      0                        proxmox1
   4   X      0                        proxmox4
   5   X      0                        proxmox5

root@proxmox5:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X  14336                        proxmox3
   2   M  14344   2014-02-26 00:11:57  proxmox2
   3   M  14336   2014-02-26 00:11:20  proxmox1
   4   M  14336   2014-02-26 00:11:20  proxmox4
   5   M  14332   2014-02-26 00:11:20  proxmox5

/etc/pve is still not writable on proxmox4+5 (and of course now on proxmox3 as well).

Udo

udo · Feb 26, 2014

Re: different "pvecm nodes" output on cluster-nodes

Found the issue!
I had different MTUs on vmbr0 (eth0): proxmox1-3 had MTU 1500 and proxmox4+5 had MTU 9000.

I'm relatively sure that all nodes had jumbo frames configured, but if a VM is started on vmbr0 (normally not done, because it is the cluster network only), the MTU switches back to 1500...

After changing the MTU to 9000 on all bridges, I was able to restart cman on all nodes. The content of /etc/pve was still write-protected on all nodes, but after a pve-cluster restart everything works as expected.
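
A mismatch like this can be checked for before touching cman. This is a minimal sketch with made-up function names. It assumes the bridge is called vmbr0 as here, and that the MTU of each node has already been collected (for example by reading /sys/class/net/vmbr0/mtu on each node). It groups the nodes by MTU and builds a don't-fragment ping command (iputils "ping -M do") whose payload is the MTU minus 28 bytes for the IPv4 and ICMP headers, so a path that cannot carry full-size frames shows up as a failed ping.

```python
IPV4_HEADER = 20  # bytes, header without options
ICMP_HEADER = 8   # bytes, echo request header


def read_local_mtu(iface="vmbr0"):
    """Read the MTU of a local interface from sysfs (Linux only)."""
    with open(f"/sys/class/net/{iface}/mtu") as fh:
        return int(fh.read().strip())


def group_by_mtu(mtus):
    """Return {mtu: [node, ...]}; more than one key means a mismatch."""
    groups = {}
    for node, mtu in sorted(mtus.items()):
        groups.setdefault(mtu, []).append(node)
    return groups


def max_ping_payload(mtu):
    """Largest ICMP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(host, mtu, count=3):
    """iputils ping with don't-fragment set and a full-size payload."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(max_ping_payload(mtu)), host]


# Values as described in this post before the fix.
mtus = {
    "proxmox1": 1500, "proxmox2": 1500, "proxmox3": 1500,
    "proxmox4": 9000, "proxmox5": 9000,
}

if __name__ == "__main__":
    for mtu, nodes in group_by_mtu(mtus).items():
        print(mtu, nodes)
    print(" ".join(df_ping_command("proxmox4", 9000)))
```

The ping command is only built here, not executed; it could be run by hand from each node against every other node once all bridges are supposed to be at MTU 9000.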

Sorry for the noise, but perhaps it's helpful for someone else...

Udo
