Hi,
I've got a 13-node cluster: 7 of the nodes only run Ceph (i.e. they don't host any VMs) and the other 6 only host VMs. When I added the 13th host it seemed to join the cluster just fine, and pvecm status on any of the nodes shows all 13 members and quorum. However, the new node does not appear in the web interface of the 6 non-Ceph hosts, and I cannot migrate any VMs to it (it says "no such cluster node" if I try). The new 13th host *does* show up in the web interface of the 7 Ceph-only hosts, however. I've tried restarting the pveproxy, corosync, and pve-cluster services on the affected hosts, but I'm not sure what else to try. (I do know that multicast is working properly, because we previously had a multicast configuration issue on our switches that caused all kinds of trouble... since fixing that, the cluster had been working smoothly until this issue appeared.)
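For reference, this is roughly what I ran on the affected hosts to check quorum and restart the services mentioned above (standard PVE 4.x commands as far as I know, nothing exotic):

Code:
# membership and quorum as corosync sees it
pvecm status
# restart the three services on an affected host
# (I did them in this order, though I'm not sure the order matters)
systemctl restart pve-cluster
systemctl restart corosync
systemctl restart pveproxy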
Here is my corosync.conf, which seems to be correctly propagated to all 13 hosts:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: vmhost85
    nodeid: 13
    quorum_votes: 1
    ring0_addr: vmhost85
  }
  node {
    name: vmhost84
    nodeid: 8
    quorum_votes: 1
    ring0_addr: vmhost84
  }
  node {
    name: vmhost82
    nodeid: 6
    quorum_votes: 1
    ring0_addr: vmhost82
  }
  node {
    name: vmhost-ceph-5
    nodeid: 10
    quorum_votes: 1
    ring0_addr: vmhost-ceph-5
  }
  node {
    name: vmhost-ceph-1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: vmhost-ceph-1
  }
  node {
    name: vmhost-ceph-7
    nodeid: 12
    quorum_votes: 1
    ring0_addr: vmhost-ceph-7
  }
  node {
    name: vmhost-ceph-6
    nodeid: 11
    quorum_votes: 1
    ring0_addr: vmhost-ceph-6
  }
  node {
    name: vmhost81
    nodeid: 5
    quorum_votes: 1
    ring0_addr: vmhost81
  }
  node {
    name: vmhost-ceph-2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: vmhost-ceph-2
  }
  node {
    name: vmhost-ceph-4
    nodeid: 9
    quorum_votes: 1
    ring0_addr: vmhost-ceph-4
  }
  node {
    name: vmhost83
    nodeid: 7
    quorum_votes: 1
    ring0_addr: vmhost83
  }
  node {
    name: vmhost-ceph-3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: vmhost-ceph-3
  }
  node {
    name: vmhost80
    nodeid: 4
    quorum_votes: 1
    ring0_addr: vmhost80
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: gjr-virt-stack
  config_version: 13
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.1.2.61
    ringnumber: 0
  }
}
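In case it helps narrow things down, these are the checks I was planning to run next on one of the affected hosts, comparing against a Ceph host where the new node does show up. (I believe the web interface builds its node list from pmxcfs rather than from corosync directly, so /etc/pve/.members seemed like the right thing to compare, but that's an assumption on my part.)

Code:
# corosync's view of the membership
pvecm nodes
# what pmxcfs reports (which I think is what the GUI uses)
cat /etc/pve/.members
# confirm the local corosync config matches the cluster-wide copy
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf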
Edit:
Also, this is a 4.1 cluster with all software up to date from the repos. (We don't have a subscription key for these servers yet, but I'm working on that part...)