I am completely puzzled. I started with two nodes (prox1 and prox2) with no issues; they work great. I found some spare hardware and created a temporary third node (prox3), again with no issues. I then grabbed a small mini PC for a fourth node, just to run MikroTik The Dude separately from the other servers. Whenever I added that 4th node (prox4), it hung the web interfaces on nodes 1, 2, and 3, so I manually deleted node 4 and life was fine again.
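For what it's worth, my understanding of the clean way to remove a node (as opposed to the manual deletion I did for prox4) is roughly the following; I'm honestly not sure I did all of these steps at the time:

# on one of the remaining cluster nodes, with prox4 already powered off for good
pvecm delnode prox4

# remove any leftover configuration for the deleted node
rm -rf /etc/pve/nodes/prox4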
I now have a proper server of equal power to prox1 and prox2. This new node will be called prox5.
Every time I try to add prox5 (via both the web interface and the command line), I get the following errors:
Web:
Establishing API connection with host '10.35.35.71'
TASK ERROR: 500 Can't connect to 10.35.35.71:8006
Command line:
root@prox5:~# pvecm add 10.35.35.71 --use_ssh
unable to copy ssh ID: exit code 1
root@prox5:~# pvecm add 10.35.35.71
Please enter superuser (root) password for '10.35.35.71': *********
Establishing API connection with host '10.35.35.71'
500 Can't connect to 10.35.35.71:8006
root@prox5:~# telnet 10.35.35.71 8006
Trying 10.35.35.71...
Connected to 10.35.35.71.
Escape character is '^]'.
^]
telnet> Connection closed.
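The telnet test only proves the TCP socket is open, so I realize it doesn't confirm that the API (pveproxy/TLS) on prox1 is actually answering, or that root SSH from prox5 works outside of pvecm. I assume checks along these lines would narrow it down (the URL below is just the base web UI address, not a specific API call):

# from prox5: any response at all (even a login page or auth error) means pveproxy answers TLS
curl -vk https://10.35.35.71:8006/

# from prox5: confirm plain root SSH works, since 'pvecm add --use_ssh' failed to copy the key
ssh root@10.35.35.71 'hostname'

# on prox1: check the API proxy itself
systemctl status pveproxy
journalctl -u pveproxy --since "1 hour ago"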
I have updated all servers to make sure they are on the same package versions. I have restarted all services (but I have not rebooted any of the nodes). All nodes are connected over 10 Gb/s links with MTU = 9000.
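Since the join traffic runs over the jumbo-frame link, I assume it's also worth confirming that MTU 9000 really passes end to end between prox5 and the existing nodes without fragmentation, roughly like this:

# from prox5: 8972 bytes of payload + 28 bytes of IP/ICMP headers = 9000; -M do forbids fragmentation
ping -M do -s 8972 -c 3 10.35.35.71
ping -M do -s 8972 -c 3 10.35.35.72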
I can SSH to all servers, and I can telnet to port 8006.
Any suggestions on how to add this 4th node (prox5)? I am completely stumped.
Thank you!
root@prox1:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-1-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-helper: 7.2-4
pve-kernel-5.15: 7.2-3
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
root@prox1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: prox1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.35.35.71
  }
  node {
    name: prox2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.35.35.72
  }
  node {
    name: prox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.35.35.61
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Thinkers
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
root@prox1:~# systemctl status pve-cluster corosync.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-06-13 15:59:15 EDT; 1h 26min ago
Process: 3203187 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 3203188 (pmxcfs)
Tasks: 6 (limit: 77159)
Memory: 45.4M
CPU: 7.097s
CGroup: /system.slice/pve-cluster.service
└─3203188 /usr/bin/pmxcfs
Jun 13 15:59:20 prox1 pmxcfs[3203188]: [dcdb] notice: waiting for updates from leader
Jun 13 15:59:20 prox1 pmxcfs[3203188]: [status] notice: received all states
Jun 13 15:59:20 prox1 pmxcfs[3203188]: [status] notice: all data is up to date
Jun 13 15:59:20 prox1 pmxcfs[3203188]: [dcdb] notice: update complete - trying to commit (got 3 inode updates)
Jun 13 15:59:20 prox1 pmxcfs[3203188]: [dcdb] notice: all data is up to date
Jun 13 16:01:31 prox1 pmxcfs[3203188]: [status] notice: received log
Jun 13 16:24:17 prox1 pmxcfs[3203188]: [status] notice: received log
Jun 13 16:39:17 prox1 pmxcfs[3203188]: [status] notice: received log
Jun 13 16:59:14 prox1 pmxcfs[3203188]: [dcdb] notice: data verification successful
Jun 13 17:11:50 prox1 pmxcfs[3203188]: [status] notice: received log
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-06-13 15:59:16 EDT; 1h 26min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 3203329 (corosync)
Tasks: 9 (limit: 77159)
Memory: 132.5M
CPU: 1min 6.668s
CGroup: /system.slice/corosync.service
└─3203329 /usr/sbin/corosync -f
Jun 13 15:59:18 prox1 corosync[3203329]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 13 15:59:18 prox1 corosync[3203329]: [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 8885
Jun 13 15:59:18 prox1 corosync[3203329]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 8885
Jun 13 15:59:18 prox1 corosync[3203329]: [KNET ] pmtud: Global data MTU changed to: 8885
Jun 13 15:59:19 prox1 corosync[3203329]: [QUORUM] Sync members[3]: 1 2 3
Jun 13 15:59:19 prox1 corosync[3203329]: [QUORUM] Sync joined[2]: 2 3
Jun 13 15:59:19 prox1 corosync[3203329]: [TOTEM ] A new membership (1.10b) was formed. Members joined: 2 3
Jun 13 15:59:19 prox1 corosync[3203329]: [QUORUM] This node is within the primary component and will provide service.
Jun 13 15:59:19 prox1 corosync[3203329]: [QUORUM] Members[3]: 1 2 3
Jun 13 15:59:19 prox1 corosync[3203329]: [MAIN ] Completed service synchronization, ready to provide service.
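In case it helps narrow things down, these are the other checks I was planning to run; I'm not sure which (if any) is relevant here:

# on prox1: overall cluster/quorum view
pvecm status

# on prox1: rule out the PVE firewall or an iptables rule blocking prox5's address on 8006
pve-firewall status
iptables-save | grep 8006

# on prox5: make sure it isn't carrying leftover cluster state from a previously failed join attempt
ls /etc/corosync/
ls /etc/pve/nodes/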