Hi
Today we provisioned an OVH dedicated server as a new node for our PVE cluster.
When adding the node from the UI and setting the ring0/link0 address, we could not get the node to join.
We tried to figure out why and finally got it to join.
However, there is a strange difference between /etc/corosync/corosync.conf and /etc/pve/corosync.conf.
/etc/pve/corosync.conf is the same on all nodes, but /etc/corosync/corosync.conf is different. The newly added node has the same contents in both files.
Here is the diff from one of the existing nodes (diff /etc/pve/corosync.conf /etc/corosync/corosync.conf):
Diff:
20,25d19
< name: ns570850
< nodeid: 2
< quorum_votes: 1
< ring0_addr: 172.16.0.7
< }
< node {
51c45
< config_version: 18
---
> config_version: 20
53c47
< bindnetaddr: 172.16.0.2
---
> bindnetaddr: 172.16.0.5
58c52
< transport: udpu
---
> transport: knet
pvecm status shows config version 18 and transport udpu on all nodes.
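For reference, this is roughly how we checked what the running cluster reports (standard PVE/corosync tools; the grep pattern is just for convenience and the exact cmap key names may vary between corosync versions):
Code:
# cluster summary; on PVE 6 this includes "Config Version" and "Transport"
pvecm status

# query the live corosync configuration database for the totem settings
corosync-cmapctl | grep '^totem\.'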
All existing nodes were originally installed on PVE 5 (using the OVH proxmox5-zfs template) and then upgraded to PVE 6 by following the upgrade guide, i.e. upgrading corosync 2.x to corosync 3.x before the upgrade to PVE 6.
The new node was also deployed on PVE 5 (using the OVH proxmox5-zfs template) and then upgraded to PVE 6 before joining it to the cluster.
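In case it matters, this is roughly how we verified that every node ended up on corosync 3.x after the upgrade (standard commands, run on each node; nothing custom assumed):
Code:
# package versions as reported by Proxmox (should list corosync 3.x and, on PVE 6, libknet)
pveversion -v | grep -Ei 'corosync|knet'

# corosync's own version string
corosync -v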
Output of systemctl status pve-cluster.service on the new node:
Code:
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-08-28 10:43:17 CEST; 6min ago
Process: 1507 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 1531 (pmxcfs)
Tasks: 5 (limit: 4915)
Memory: 37.1M
CGroup: /system.slice/pve-cluster.service
└─1531 /usr/bin/pmxcfs
Aug 28 10:49:10 ns570850 pmxcfs[1531]: [dcdb] crit: cpg_initialize failed: 2
Aug 28 10:49:10 ns570850 pmxcfs[1531]: [status] crit: cpg_initialize failed: 2
Aug 28 10:49:16 ns570850 pmxcfs[1531]: [quorum] crit: quorum_initialize failed: 2
Aug 28 10:49:16 ns570850 pmxcfs[1531]: [confdb] crit: cmap_initialize failed: 2
Aug 28 10:49:16 ns570850 pmxcfs[1531]: [dcdb] crit: cpg_initialize failed: 2
Aug 28 10:49:16 ns570850 pmxcfs[1531]: [status] crit: cpg_initialize failed: 2
Aug 28 10:49:22 ns570850 pmxcfs[1531]: [quorum] crit: quorum_initialize failed: 2
Aug 28 10:49:22 ns570850 pmxcfs[1531]: [confdb] crit: cmap_initialize failed: 2
Aug 28 10:49:22 ns570850 pmxcfs[1531]: [dcdb] crit: cpg_initialize failed: 2
Aug 28 10:49:22 ns570850 pmxcfs[1531]: [status] crit: cpg_initialize failed: 2
Output of systemctl status corosync.service on the new node:
Code:
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-28 10:43:17 CEST; 6min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Process: 1646 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=8)
Main PID: 1646 (code=exited, status=8)
Aug 28 10:43:17 ns570850 systemd[1]: Starting Corosync Cluster Engine...
Aug 28 10:43:17 ns570850 corosync[1646]: [MAIN ] Corosync Cluster Engine 3.0.4 starting up
Aug 28 10:43:17 ns570850 corosync[1646]: [MAIN ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
Aug 28 10:43:17 ns570850 corosync[1646]: [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Aug 28 10:43:17 ns570850 corosync[1646]: [MAIN ] Please migrate config file to nodelist.
Aug 28 10:43:17 ns570850 corosync[1646]: [MAIN ] parse error in config: crypto_cipher & crypto_hash are only valid for the Knet transport.
Aug 28 10:43:17 ns570850 corosync[1646]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1392.
Aug 28 10:43:17 ns570850 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Aug 28 10:43:17 ns570850 systemd[1]: corosync.service: Failed with result 'exit-code'.
Aug 28 10:43:17 ns570850 systemd[1]: Failed to start Corosync Cluster Engine.
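(For completeness: the parse error can be reproduced interactively by running corosync in the foreground, which is just the ExecStart command from above without systemd.)
Code:
# run corosync in the foreground; it exits immediately with the
# "crypto_cipher & crypto_hash are only valid for the Knet transport" error
corosync -f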
Contents of /etc/pve/corosync.conf on the new node (all IP addresses have been replaced for security reasons):
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ns3088794
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 1.0.0.6
  }
  node {
    name: ns3128036
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 1.0.0.5
  }
  node {
    name: ns570850
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 1.0.0.7
  }
  node {
    name: ns61100575
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 1.0.0.3
  }
  node {
    name: ns6136203
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 1.0.0.2
  }
  node {
    name: ns631099096
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 1.0.0.4
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: VKD-EU
  config_version: 18
  interface {
    bindnetaddr: 1.0.0.7
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  transport: udpu
  version: 2
}
If I manually edit /etc/corosync/corosync.conf on the new node to use knet as the transport, corosync starts.
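Concretely, the only thing I changed on the new node was the transport line in the totem section of /etc/corosync/corosync.conf, so it ends up looking like this (same placeholder addresses as above; this is a sketch of the edited totem section only, not the full file):
Code:
totem {
  cluster_name: VKD-EU
  config_version: 18
  interface {
    bindnetaddr: 1.0.0.7
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  # was: udpu -- with this single change corosync starts on the new node
  transport: knet
  version: 2
}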
EDIT: Just redeployed the node and this time tried to add it from the command line using
pvecm add 1.0.0.2 --force -link0 1.0.0.7
and this gave the same result. So there seems to be some kind of confusion about whether the cluster should run udpu or knet.