no such cluster node 'sys3' (500)

Hi,

can you send the output of

pvecm status

cat /etc/pve/.members
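
If it is easier, the same check can be run against all nodes in one go with something like this (a rough sketch; it assumes root SSH between the nodes, and node1/node2/node3 are placeholders for your hostnames):

Code:
# count how often the problematic node name appears in each node's member list
for n in node1 node2 node3; do
    echo -n "$n: "
    ssh root@$n "grep -c sys3 /etc/pve/.members"
done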
 
Hi,

can you send the output of

pvecm status
Code:
Quorum information
------------------
Date:             Mon Mar 20 07:10:21 2017
Quorum provider:  corosync_votequorum
Nodes:            11
Node ID:          0x00000001
Ring ID:          11/2568
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   11
Highest expected: 11
Total votes:      11
Quorum:           6  
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x0000000b          1 10.1.10.3
0x00000004          1 10.1.10.4
0x00000009          1 10.1.10.5
0x00000005          1 10.1.10.6
0x00000007          1 10.1.10.8
0x00000006          1 10.1.10.10
0x00000008          1 10.1.10.11
0x00000001          1 10.1.10.12 (local)
0x0000000a          1 10.1.10.13
0x00000003          1 10.1.10.20
0x00000002          1 10.1.10.24

cat /etc/pve/.members
Code:
{
"nodename": "sys12",
"version": 29,
"cluster": { "name": "20170226", "version": 13, "nodes": 11, "quorate": 1 },
"nodelist": {
  "sys12": { "id": 1, "online": 1, "ip": "10.1.10.12"},
  "sys20": { "id": 3, "online": 1, "ip": "10.1.10.20"},
  "sys4": { "id": 4, "online": 1, "ip": "10.1.10.4"},
  "sys24": { "id": 2, "online": 1, "ip": "10.1.10.24"},
  "sys10": { "id": 6, "online": 1, "ip": "10.1.10.10"},
  "sys8": { "id": 7, "online": 1, "ip": "10.1.10.8"},
  "fbc11": { "id": 8, "online": 1, "ip": "10.1.10.11"},
  "sys5": { "id": 9, "online": 1, "ip": "10.1.10.5"},
  "sys13": { "id": 10, "online": 1, "ip": "10.1.10.13"},
  "sys3": { "id": 11, "online": 1, "ip": "10.1.10.3"},
  "sys6": { "id": 5, "online": 1, "ip": "10.1.10.6"}
  }
}
 
Is this output from sys3?
And how exactly do you migrate to Ceph?
Can you please explain in detail?
 
Is this output from sys3?
And how exactly do you migrate to Ceph?
Can you please explain in detail?

That output was from another node; here is sys3:
Code:
sys3  /etc/pve # cat .members
{
"nodename": "sys3",
"version": 23,
"cluster": { "name": "20170226", "version": 13, "nodes": 11, "quorate": 1 },
"nodelist": {
  "sys12": { "id": 1, "online": 1, "ip": "10.1.10.12"},
  "sys20": { "id": 3, "online": 1, "ip": "10.1.10.20"},
  "sys4": { "id": 4, "online": 1, "ip": "10.1.10.4"},
  "sys24": { "id": 2, "online": 1, "ip": "10.1.10.24"},
  "sys10": { "id": 6, "online": 1, "ip": "10.1.10.10"},
  "sys8": { "id": 7, "online": 1, "ip": "10.1.10.8"},
  "fbc11": { "id": 8, "online": 1, "ip": "10.1.10.11"},
  "sys5": { "id": 9, "online": 1, "ip": "10.1.10.5"},
  "sys13": { "id": 10, "online": 1, "ip": "10.1.10.13"},
  "sys3": { "id": 11, "online": 1, "ip": "10.1.10.3"},
  "sys6": { "id": 5, "online": 1, "ip": "10.1.10.6"}
  }
}
Code:
sys3  /etc/pve # pvecm status
Quorum information
------------------
Date:             Mon Mar 20 08:00:45 2017
Quorum provider:  corosync_votequorum
Nodes:            11
Node ID:          0x0000000b
Ring ID:          11/2568
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   11
Highest expected: 11
Total votes:      11
Quorum:           6  
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x0000000b          1 10.1.10.3 (local)
0x00000004          1 10.1.10.4
0x00000009          1 10.1.10.5
0x00000005          1 10.1.10.6
0x00000007          1 10.1.10.8
0x00000006          1 10.1.10.10
0x00000008          1 10.1.10.11
0x00000001          1 10.1.10.12
0x0000000a          1 10.1.10.13
0x00000003          1 10.1.10.20
0x00000002          1 10.1.10.24

Regarding the migration to Ceph:
sys3 was the 11th node to be added.
What I did was:

1- set up networking on sys3
2- joined cluster
3- added ceph

These are my CLI notes:
Code:
----------------------------------------------

JOIN CLUSTER Before setting up ceph !

pvecm add  10.1.10.12

***    ONLY DO CEPH IF THE CEPH CLUSTER IS ALREADY SET UP.
 Doing a new cluster - so do this later  2/16/2017
# ceph
pveceph install -version jewel

# non ceph mon
pveceph init

# mon
pveceph createmon
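
For reference, the remaining steps to actually use Ceph from a new node usually look roughly like this (a sketch, not taken from my notes; /dev/sdb, the pool name and the monitor IPs are placeholders):

Code:
# create an OSD on an empty disk of the new node
pveceph createosd /dev/sdb

# create a pool for VM images
pveceph createpool vm-pool

# add the RBD storage once, cluster-wide (written to /etc/pve/storage.cfg)
pvesm add rbd ceph-vm --pool vm-pool --monhost "10.1.10.3 10.1.10.4 10.1.10.5" --content images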
 
Sorry, maybe I'm not clear enough.
Which command produces this error?
however when I try to migrate to ceph , this displays: no such cluster node 'sys3' (500)
 
When I tried to migrate a KVM from another node to sys3, the error was displayed on the PVE web page.

The KVM uses Ceph storage.

I can migrate a KVM that uses zfspool storage to sys3.
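
For completeness, since the failing VM lives on Ceph storage, the storage visibility on the target can be checked like this (generic checks, nothing specific to this error):

Code:
# on sys3: is the RBD storage listed and marked active?
pvesm status

# cluster-wide storage definitions; a 'nodes' line on the rbd entry
# would restrict that storage to specific nodes
cat /etc/pve/storage.cfg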
 
more info:
Code:
sys3  ~ # aptitude search ceph|grep ^i
i   ceph                            - distributed storage and file system       
i A ceph-base                       - common ceph daemon libraries and managemen
i   ceph-common                     - common utilities to mount and interact wit
i A ceph-mon                        - monitor server for the ceph storage system
i A ceph-osd                        - OSD server for the ceph storage system   
i   libcephfs1                      - Ceph distributed file system client librar
i   python-ceph                     - Meta-package for python libraries for the
i A python-cephfs                   - Python libraries for the Ceph libcephfs li
sys3  ~ # aptitude show ceph
Package: ceph                           
State: installed
Automatically installed: no
Version: 10.2.6-1~bpo80+1
Priority: optional
Section: admin
Maintainer: Ceph Maintainers <ceph-maintainers@lists.ceph.com>
Architecture: amd64
Uncompressed Size: 1,153 k
Depends: ceph-mon (= 10.2.6-1~bpo80+1), ceph-osd (= 10.2.6-1~bpo80+1)
Recommends: ceph-mds (= 10.2.6-1~bpo80+1)
Description: distributed storage and file system
 Ceph is a massively scalable, open-source, distributed storage system that runs on commodity hardware and delivers object, block and
 file system storage.
Homepage: http://ceph.com/

Tags: role::program, uitoolkit::gtk
 
UPDATE:
I can not migrate a zfspool-storage KVM to sys3 after all.

I had mixed up the target earlier; long, tired weekend.

Code:
Mar 16 17:46:53 starting migration of VM 109 to node 'fbc11' (10.1.10.11)
Mar 16 17:46:53 copying disk images
Mar 16 17:46:54 migration finished successfully (duration 00:00:01)
TASK OK
 
Try to restart the pveproxy.service and pvedaemon.service on the source node.
 
No, that did not help.
Try to restart the pveproxy.service and pvedaemon.service on the source node.

I did this:
Code:
sys5  /fbc/adm/systemd # systemctl restart pveproxy.service
sys5  /fbc/adm/systemd # systemctl restart pvedaemon.service

and the error occurred again.

In the PVE web UI, migrating from sys5 to sys3 still shows: no such cluster node 'sys3' (500)

I can still migrate to an older node:
Code:
Virtual Environment 4.4-13/7ea56165
Virtual Machine 726 ('x2go' ) on node 'fbc11'
Logs
()
Mar 20 14:18:29 starting migration of VM 726 to node 'fbc11' (10.1.10.11)
Mar 20 14:18:29 copying disk images
Mar 20 14:18:29 starting VM 726 on remote node 'fbc11'
Mar 20 14:18:32 start remote tunnel
Mar 20 14:18:33 starting online/live migration on unix:/run/qemu-server/726.migrate
Mar 20 14:18:33 migrate_set_speed: 8589934592
Mar 20 14:18:33 migrate_set_downtime: 0.1
Mar 20 14:18:33 set migration_caps
Mar 20 14:18:33 set cachesize: 214748364
Mar 20 14:18:33 start migrate command to unix:/run/qemu-server/726.migrate
Mar 20 14:18:35 migration status: active (transferred 244672198, remaining 1552945152), total 2156732416)
Mar 20 14:18:35 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Mar 20 14:18:37 migration status: active (transferred 478412512, remaining 1315500032), total 2156732416)
Mar 20 14:18:37 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Mar 20 14:18:39 migration status: active (transferred 722417649, remaining 199012352), total 2156732416)
Mar 20 14:18:39 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Mar 20 14:18:41 migration speed: 256.00 MB/s - downtime 6 ms
Mar 20 14:18:41 migration status: completed
Mar 20 14:18:45 migration finished successfully (duration 00:00:17)
TASK OK

Is there a way to attempt that from the CLI and get more debugging info?
 
Code:
# qm migrate   114  sys3 --online
no such cluster node 'sys3'

And at a quick glance, `man qm` does not have a debug option that I could see.
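
Two more things I could look at for detail (just guesses on my side):

Code:
# list the nodes as the PVE API currently sees them
pvesh get /nodes

# cluster filesystem and daemon logs around a failing attempt
journalctl -u pve-cluster -u pvedaemon --since "10 minutes ago"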
 
I added another node and had no issue with migration to it.

And now migration to sys3 works as well.

I think the issue had something to do with the interrupted apt upgrades.

Then, when the new node got added, things got fixed.
 
I think the issue had something to do with the interrupted apt upgrades.

Which other PVE packages were affected by the interrupted upgrade?

You should get this information about it in one of those logs:

Code:
/var/log/apt/history.log
/var/log/dpkg.log

apt/history.log is more of a general overview, and dpkg.log contains the full dpkg output.

All in all a strange error.
 
Which other PVE packages were affected by the interrupted upgrade?

You should get this information about it in one of those logs:

Code:
/var/log/apt/history.log
/var/log/dpkg.log

apt/history.log is more of a general overview, and dpkg.log contains the full dpkg output.

All in all a strange error.

The known issue [ https://forum.proxmox.com/threads/upgrade-hanging.33477/ ] that causes the need to kill processes is, I think, the cause.

To duplicate this, someone would need to run into that issue and then add a node to the cluster. I assume that with the same package versions the problem would occur again.

Probably the cause of the failed upgrade has been solved by now?
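
In case someone wants to check for leftovers of such an interrupted upgrade, the usual dpkg/apt checks should show them (standard Debian tooling, nothing Proxmox-specific):

Code:
# packages left in a broken or half-configured state (ignore the header lines)
dpkg --audit
dpkg -l | grep -v '^ii'

# what the interrupted run actually touched
grep -E 'half-configured|half-installed' /var/log/dpkg.log

# finish any pending configuration and fix dependencies
dpkg --configure -a
apt-get install -f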