no such cluster node 'sys3' (500)

Hi,

can you send the output of

pvecm status

cat /etc/pve/.members
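
If it is easier, the same check can be run against all nodes in one go with something like this (a rough sketch; it assumes root SSH between the nodes, and node1/node2/node3 are placeholders for your hostnames):

Code:
# count how often the problematic node name appears in each node's member list
for n in node1 node2 node3; do
    echo -n "$n: "
    ssh root@$n "grep -c sys3 /etc/pve/.members"
done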
 
Hi,

can you send the output of

pvecm status
Code:
Quorum information
------------------
Date:             Mon Mar 20 07:10:21 2017
Quorum provider:  corosync_votequorum
Nodes:            11
Node ID:          0x00000001
Ring ID:          11/2568
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   11
Highest expected: 11
Total votes:      11
Quorum:           6  
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x0000000b          1 10.1.10.3
0x00000004          1 10.1.10.4
0x00000009          1 10.1.10.5
0x00000005          1 10.1.10.6
0x00000007          1 10.1.10.8
0x00000006          1 10.1.10.10
0x00000008          1 10.1.10.11
0x00000001          1 10.1.10.12 (local)
0x0000000a          1 10.1.10.13
0x00000003          1 10.1.10.20
0x00000002          1 10.1.10.24

cat /etc/pve/.members
Code:
{
"nodename": "sys12",
"version": 29,
"cluster": { "name": "20170226", "version": 13, "nodes": 11, "quorate": 1 },
"nodelist": {
  "sys12": { "id": 1, "online": 1, "ip": "10.1.10.12"},
  "sys20": { "id": 3, "online": 1, "ip": "10.1.10.20"},
  "sys4": { "id": 4, "online": 1, "ip": "10.1.10.4"},
  "sys24": { "id": 2, "online": 1, "ip": "10.1.10.24"},
  "sys10": { "id": 6, "online": 1, "ip": "10.1.10.10"},
  "sys8": { "id": 7, "online": 1, "ip": "10.1.10.8"},
  "fbc11": { "id": 8, "online": 1, "ip": "10.1.10.11"},
  "sys5": { "id": 9, "online": 1, "ip": "10.1.10.5"},
  "sys13": { "id": 10, "online": 1, "ip": "10.1.10.13"},
  "sys3": { "id": 11, "online": 1, "ip": "10.1.10.3"},
  "sys6": { "id": 5, "online": 1, "ip": "10.1.10.6"}
  }
}
 
Is this output from sys3?
And how exactly do you migrate to Ceph?
Can you please explain in detail?
 
Is this output from sys3?
And how exactly do you migrate to Ceph?
Can you please explain in detail?

That output was from another node; here is sys3:
Code:
sys3  /etc/pve # cat .members
{
"nodename": "sys3",
"version": 23,
"cluster": { "name": "20170226", "version": 13, "nodes": 11, "quorate": 1 },
"nodelist": {
  "sys12": { "id": 1, "online": 1, "ip": "10.1.10.12"},
  "sys20": { "id": 3, "online": 1, "ip": "10.1.10.20"},
  "sys4": { "id": 4, "online": 1, "ip": "10.1.10.4"},
  "sys24": { "id": 2, "online": 1, "ip": "10.1.10.24"},
  "sys10": { "id": 6, "online": 1, "ip": "10.1.10.10"},
  "sys8": { "id": 7, "online": 1, "ip": "10.1.10.8"},
  "fbc11": { "id": 8, "online": 1, "ip": "10.1.10.11"},
  "sys5": { "id": 9, "online": 1, "ip": "10.1.10.5"},
  "sys13": { "id": 10, "online": 1, "ip": "10.1.10.13"},
  "sys3": { "id": 11, "online": 1, "ip": "10.1.10.3"},
  "sys6": { "id": 5, "online": 1, "ip": "10.1.10.6"}
  }
}
Code:
sys3  /etc/pve # pvecm status
Quorum information
------------------
Date:             Mon Mar 20 08:00:45 2017
Quorum provider:  corosync_votequorum
Nodes:            11
Node ID:          0x0000000b
Ring ID:          11/2568
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   11
Highest expected: 11
Total votes:      11
Quorum:           6  
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x0000000b          1 10.1.10.3 (local)
0x00000004          1 10.1.10.4
0x00000009          1 10.1.10.5
0x00000005          1 10.1.10.6
0x00000007          1 10.1.10.8
0x00000006          1 10.1.10.10
0x00000008          1 10.1.10.11
0x00000001          1 10.1.10.12
0x0000000a          1 10.1.10.13
0x00000003          1 10.1.10.20
0x00000002          1 10.1.10.24

Regarding the migration to Ceph:
sys3 was the 11th node to be added.
What I did was:

1- set up networking on sys3
2- joined cluster
3- added ceph

These are my CLI notes:
Code:
----------------------------------------------

JOIN CLUSTER Before setting up ceph !

pvecm add  10.1.10.12

***    ONLY DO CEPH IF THE CEPH CLUSTER IS ALREADY SET UP.
 Doing a new cluster - so do this later  2/16/2017
# ceph
pveceph install -version jewel

# non ceph mon
pveceph init

# mon
pveceph createmon
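
For reference, the remaining steps to actually use Ceph from a new node usually look roughly like this (a sketch, not taken from my notes; /dev/sdb, the pool name and the monitor IPs are placeholders):

Code:
# create an OSD on an empty disk of the new node
pveceph createosd /dev/sdb

# create a pool for VM images
pveceph createpool vm-pool

# add the RBD storage once, cluster-wide (written to /etc/pve/storage.cfg)
pvesm add rbd ceph-vm --pool vm-pool --monhost "10.1.10.3 10.1.10.4 10.1.10.5" --content images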
 
Sorry, maybe I'm not clear enough.
Which command produces this error?
however when I try to migrate to ceph , this displays: no such cluster node 'sys3' (500)
 
When I tried to migrate a KVM from another node to sys3, the error was displayed on the PVE web page.

The KVM uses Ceph storage.

I can migrate a KVM that uses zfspool storage to sys3.
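
For completeness, since the failing VM lives on Ceph storage, the storage visibility on the target can be checked like this (generic checks, nothing specific to this error):

Code:
# on sys3: is the RBD storage listed and marked active?
pvesm status

# cluster-wide storage definitions; a 'nodes' line on the rbd entry
# would restrict that storage to specific nodes
cat /etc/pve/storage.cfg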
 
more info:
Code:
sys3  ~ # aptitude search ceph|grep ^i
i   ceph                            - distributed storage and file system       
i A ceph-base                       - common ceph daemon libraries and managemen
i   ceph-common                     - common utilities to mount and interact wit
i A ceph-mon                        - monitor server for the ceph storage system
i A ceph-osd                        - OSD server for the ceph storage system   
i   libcephfs1                      - Ceph distributed file system client librar
i   python-ceph                     - Meta-package for python libraries for the
i A python-cephfs                   - Python libraries for the Ceph libcephfs li
sys3  ~ # aptitude show ceph
Package: ceph                           
State: installed
Automatically installed: no
Version: 10.2.6-1~bpo80+1
Priority: optional
Section: admin
Maintainer: Ceph Maintainers <ceph-maintainers@lists.ceph.com>
Architecture: amd64
Uncompressed Size: 1,153 k
Depends: ceph-mon (= 10.2.6-1~bpo80+1), ceph-osd (= 10.2.6-1~bpo80+1)
Recommends: ceph-mds (= 10.2.6-1~bpo80+1)
Description: distributed storage and file system
 Ceph is a massively scalable, open-source, distributed storage system that runs on commodity hardware and delivers object, block and
 file system storage.
Homepage: http://ceph.com/

Tags: role::program, uitoolkit::gtk
 
UPDATE:
I can not migrate a zfspool-storage KVM to sys3 after all.

I had mixed up the target earlier; long, tired weekend.

Code:
Mar 16 17:46:53 starting migration of VM 109 to node 'fbc11' (10.1.10.11)
Mar 16 17:46:53 copying disk images
Mar 16 17:46:54 migration finished successfully (duration 00:00:01)
TASK OK
 
Try to restart the pveproxy.service and pvedaemon.service on the source node.
 
No, that did not help.
Try to restart the pveproxy.service and pvedaemon.service on the source node.

I did this:
Code:
sys5  /fbc/adm/systemd # systemctl restart pveproxy.service
sys5  /fbc/adm/systemd # systemctl restart pvedaemon.service

and the error occurred again.

In the PVE web UI, migrating from sys5 to sys3 still shows: no such cluster node 'sys3' (500)

I can still migrate to an older node:
Code:
Virtual Environment 4.4-13/7ea56165
Virtual Machine 726 ('x2go' ) on node 'fbc11'
Logs
()
Mar 20 14:18:29 starting migration of VM 726 to node 'fbc11' (10.1.10.11)
Mar 20 14:18:29 copying disk images
Mar 20 14:18:29 starting VM 726 on remote node 'fbc11'
Mar 20 14:18:32 start remote tunnel
Mar 20 14:18:33 starting online/live migration on unix:/run/qemu-server/726.migrate
Mar 20 14:18:33 migrate_set_speed: 8589934592
Mar 20 14:18:33 migrate_set_downtime: 0.1
Mar 20 14:18:33 set migration_caps
Mar 20 14:18:33 set cachesize: 214748364
Mar 20 14:18:33 start migrate command to unix:/run/qemu-server/726.migrate
Mar 20 14:18:35 migration status: active (transferred 244672198, remaining 1552945152), total 2156732416)
Mar 20 14:18:35 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Mar 20 14:18:37 migration status: active (transferred 478412512, remaining 1315500032), total 2156732416)
Mar 20 14:18:37 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Mar 20 14:18:39 migration status: active (transferred 722417649, remaining 199012352), total 2156732416)
Mar 20 14:18:39 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
Mar 20 14:18:41 migration speed: 256.00 MB/s - downtime 6 ms
Mar 20 14:18:41 migration status: completed
Mar 20 14:18:45 migration finished successfully (duration 00:00:17)
TASK OK

Is there a way to attempt that from the CLI and get more debugging info?
 
Code:
# qm migrate   114  sys3 --online
no such cluster node 'sys3'

And at a quick glance, `man qm` does not have a debug option that I could see.
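
Two more things I could look at for detail (just guesses on my side):

Code:
# list the nodes as the PVE API currently sees them
pvesh get /nodes

# cluster filesystem and daemon logs around a failing attempt
journalctl -u pve-cluster -u pvedaemon --since "10 minutes ago"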
 
I added another node and had no issue with migration to it.

And now migration to sys3 works as well.

I think the issue had something to do with the interrupted apt upgrades.

Then, when the new node got added, things got fixed.
 
I think the issue had something to do with the interrupted apt upgrades.

Which other PVE packages were affected by the interrupted upgrade?

You should get this information about it in one of those logs:

Code:
/var/log/apt/history.log
/var/log/dpkg.log

apt/history.log is more of a general overview, and dpkg.log contains the full dpkg output.

All in all a strange error.
 
Which other PVE packages were affected by the interrupted upgrade?

You should get this information about it in one of those logs:

Code:
/var/log/apt/history.log
/var/log/dpkg.log

apt/history.log is more of a general overview, and dpkg.log contains the full dpkg output.

All in all a strange error.

The known issue [ https://forum.proxmox.com/threads/upgrade-hanging.33477/ ] that causes the need to kill processes is, I think, the cause.

To duplicate this, someone would need to run into that issue and then add a node to the cluster. I assume that with the same package versions the problem would occur again.

Probably the cause of the failed upgrade has been solved by now?
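
In case someone wants to check for leftovers of such an interrupted upgrade, the usual dpkg/apt checks should show them (standard Debian tooling, nothing Proxmox-specific):

Code:
# packages left in a broken or half-configured state (ignore the header lines)
dpkg --audit
dpkg -l | grep -v '^ii'

# what the interrupted run actually touched
grep -E 'half-configured|half-installed' /var/log/dpkg.log

# finish any pending configuration and fix dependencies
dpkg --configure -a
apt-get install -f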