[SOLVED] Added a new node to cluster, can't migrate VMs over: no such cluster node

mailinglists

On cluster with nodes running:
Code:
Linux 4.15.18-16-pve #1 SMP PVE 4.15.18-41 (Tue, 18 Jun 2019 07:36:54 +0200) 
pve-manager/5.4-7/fc10404a

I have added a new node, running:
Code:
Linux 4.15.18-19-pve #1 SMP PVE 4.15.18-45 (Fri, 26 Jul 2019 09:34:08 +0200) 
pve-manager/5.4-13/aee6f0ec

I have added it simply with pvecm add IPaddressoffirstclusternode.
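For reference, the join itself was nothing special; on the new node I ran something along these lines (the IP used was presumably that of the first node p24, 10.31.1.24):
Code:
# run on the new node (p31); 10.31.1.24 is the first cluster node (p24)
root@p31:~# pvecm add 10.31.1.24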

All seemed OK:
Code:
root@p27:~# pvecm status
Quorum information
------------------
Date:             Mon Aug 12 19:03:58 2019
Quorum provider:  corosync_votequorum
Nodes:            7
Node ID:          0x00000004
Ring ID:          1/32560
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   8
Highest expected: 8
Total votes:      7
Quorum:           5  
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.31.1.24
0x00000002          1 10.31.1.25
0x00000004          1 10.31.1.27 (local)
0x00000005          1 10.31.1.28
0x00000006          1 10.31.1.29
0x00000007          1 10.31.1.30
0x00000008          1 10.31.1.31
root@p27:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 p24
         2          1 10.31.1.25
         4          1 p27 (local)
         5          1 10.31.1.28
         6          1 10.31.1.29
         7          1 10.31.1.30
         8          1 p31.c.mydomain.com

While I do not like how nodes get different names when added (probably pvecm add decides on them - I would prefer the IP to be used every time), there is a different problem. When I wanted to migrate a VM to the new node, I got this error:
Code:
root@p27:~# qm migrate 140 p
p24  p25  p26  p28  p29  p30  
root@p27:~# qm migrate 140 p31
no such cluster node 'p31'

How can I debug this further?
Why does qm not see the desired destination node?
 
Is it possible that your new node got the name p26 instead of p31? That one shows up in your auto-completion.
You can check
Code:
/etc/pve/corosync.conf
to make sure.
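For example, to quickly list the node names registered in the cluster config, something like this should do:
Code:
# list the node names in the corosync nodelist
grep -w 'name:' /etc/pve/corosync.conf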
 
Hi Fabian_E

Thanks for replying.

p26 is another node which is currently shut down, but it will not be removed from the cluster until someone wipes its data completely. Only then will I remove it, so that nothing bad happens in case someone accidentally starts it up.

Corosync.conf on p24 looks fine:
Code:
root@p24:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: p24
    nodeid: 1
    quorum_votes: 1
    ring0_addr: p24
  }
  node {
    name: p25
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.31.1.25
  }
  node {
    name: p26
    nodeid: 3
    quorum_votes: 1
    ring0_addr: p26
  }
  node {
    name: p27
    nodeid: 4
    quorum_votes: 1
    ring0_addr: p27
  }
  node {
    name: p28
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 10.31.1.28
  }
  node {
    name: p29
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 10.31.1.29
  }
  node {
    name: p30
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 10.31.1.30
  }
  node {
    name: p31
    nodeid: 8
    quorum_votes: 1
    ring0_addr: 10.31.1.31
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Optimus15
  config_version: 10
  interface {
    bindnetaddr: 10.31.1.24
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
While I do not like how a few nodes are referenced by hostname and the rest by IP, it should not matter.

After testing some more, I noticed that pvecm nodes shows a different name for p31 depending on which node I run it on.

From p24 where migration works:
Code:
8          1 10.31.1.31
From p25 where migration works:
Code:
8          1 10.31.1.31
From p27 where migration fails:
Code:
8          1 p31.c.mydomain.com
From p28 where migration should work as we can see p31:
Code:
8          1 10.31.1.31
(Migration from there does not actually work because the SSH public keys are not accepted, but that seems to be a separate issue.)
From p29 where migration fails:
Code:
8          1 p31.c.mydomain.com
From p30 where migration fails:
Code:
8          1 p31.c.mydomain.com
etc ..
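(One way to gather these in one go is a quick loop from a single node; the host list is typed out by hand and root SSH between the existing nodes is assumed to work:)
Code:
# compare how each node currently resolves nodeid 8 (p31)
for h in p24 p25 p27 p28 p29 p30; do
  echo "== $h =="
  ssh root@$h "pvecm nodes | grep -w 8"
done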

So the thing all the source nodes from which I cannot migrate to p31 have in common is that pvecm nodes shows that node by its FQDN.

So I guess I have to check why it got a different name on different nodes and how to fix that.
Any help will be appreciated.
 
LOL, after restarting corosync on p27 I can now see p31 as a migration destination.
No other config change was made. Might be a bug.
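For the record, the restart was just the standard systemd unit on p27, nothing Proxmox-specific beyond that:
Code:
root@p27:~# systemctl restart corosync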
 
I am marking this as solved.
Restarting corosync changed the output of pvecm nodes to the values listed in corosync.conf and everything works again.
I did not change anything in the config, I just restarted the service. Luckily, this cluster has no HA.
I still think this might be a bug in pvecm add.
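For anyone hitting the same thing, confirming the fix amounted to roughly this (VM 140 and target p31 are from my case above):
Code:
root@p27:~# pvecm nodes              # p31 is now listed with the name from corosync.conf
root@p27:~# qm migrate 140 p31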
 
