[TUTORIAL] [DHCP] Cluster deployment

Static network configuration [1] is one of the most common unwelcome surprises awaiting the uninitiated when first installing PVE. And why is it necessary? Well, it is NOT.

NOTE: This guide may ALSO be used to set up a SINGLE NODE. Simply stop before the instructions in the Clustering part. You can choose between the Debian, ISO or Auto installation. If you cannot satisfy the prerequisites below, you may still check the simplified guide on single-node setup with mDNS.

DISCLAIMER: This is NOT a guide suggesting that you continuously recycle IP addresses with short leases, or casually change hostnames during the normal course of cluster operation.

Should your node IP or hostname change, you WILL suffer from the same transition-management issues as with static configuration. While it is actually possible to change both without a reboot (more on that below), the intended use case is a rather static environment that is nevertheless managed centrally.

Be it a simple homelab install where you just want to keep everything tidy with DHCP (why should PVE be special?), a professional deployment where you already have well set up infrastructure (what is the DHCP & DNS there for?) and cannot afford the headaches of an individually misconfigured node, or a testbed where you assemble and dismantle a cluster several times over in no time (and Ansible feels so wrong for managing this): in all these cases you will want the network configuration managed outside of the nodes, if for no other reason than to be able to connect to them reliably from outside the cluster, or even just to discover them.



1. Prerequisites - DHCP & DNS


The steps below assume that the nodes:
  • have their IP address reserved (per specific NIC, by its MAC address) with a reasonable lease time;
  • receive their hostname via DHCP (Option 12); and
  • can reliably resolve that hostname via DNS lookup (nameserver handed out via DHCP Option 6)
at the latest before you start adding them to the cluster, and at all times thereafter.

You will be able to verify this during the deployment:

ip -c a (excerpt):
Code:
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether aa:bb:cc:dd:ee:ff brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.101/24 brd 10.10.10.255 scope global dynamic enp1s0


hostnamectl (excerpt):
Code:
Static hostname: localhost
Transient hostname: pve1
What matters here is the transient hostname; the static hostname may be either localhost or unset (when /etc/hostname is missing).


dig nodename (excerpt):
Code:
root@pve1:~# dig pve1

;; ANSWER SECTION:
pve1.            50    IN    A    10.10.10.101


You can essentially verify that all is well the same way the official guide [2] actually suggests:
hostname --ip-address

There should be no loopback (127.*.*.*) addresses in its output.
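With the sample reservation above, the output should consist of nothing but the DHCP-assigned address, e.g.:
Code:
root@pve1:~# hostname --ip-address
10.10.10.101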



Notes - DHCP & DNS setup


Taking dnsmasq [3] for an example, you will need at least the equivalent of the following (excerpt):

Code:
dhcp-range=set:DEMO_NET,10.10.10.100,10.10.10.199,255.255.255.0,1d
domain=demo.internal,10.10.10.0/24,local
dhcp-option=tag:DEMO_NET,option:domain-name,demo.internal
dhcp-option=tag:DEMO_NET,option:router,10.10.10.1
dhcp-option=tag:DEMO_NET,option:dns-server,10.10.10.11

dhcp-host=aa:bb:cc:dd:ee:ff,set:DEMO_NET,10.10.10.101
host-record=pve1.demo.internal,10.10.10.101

I am well aware that most readers of this tutorial will instead be using some sort of appliance that can achieve the same. Although out of scope for this tutorial, my favourite VyOS [4], for example, allows for this in an error-proof way; I am happy to add further sections here based on your feedback (OPNsense?).
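Whichever server ends up handing out the records, a quick sanity check against it from any machine on the segment (using the sample names and addresses from the excerpt above) could look like this:
Code:
root@pve1:~# dig +short pve1.demo.internal @10.10.10.11
10.10.10.101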


[1] https://pve.proxmox.com/wiki/Network_Configuration#_choosing_a_network_configuration
[2] https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm
[3] https://wiki.debian.org/dnsmasq
[4] https://docs.vyos.io/en/latest/conf...fgcmd-set-service-dhcp-server-hostfile-update
 
2. A. ISO Install


The ISO install [1] leaves you with static configuration.

Remove the static configuration from /etc/network/interfaces - your vmbr0 will look like this (excerpt):
Code:
iface vmbr0 inet dhcp
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

Remove the hostname entry from /etc/hosts and remove the hostname file: rm /etc/hostname
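A reboot is the least error-prone way to apply all three changes; if you only touched the interfaces file, re-applying it in place is also an option (a minimal sketch, assuming ifupdown2 as shipped with the ISO install):
Code:
ifreload -a   # re-apply /etc/network/interfaces; vmbr0 now requests a lease via DHCP
ip -c a       # the vmbr0 address should be flagged as "dynamic"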

See section on Debian install for more details.



2. B. Install on top of Debian


There is an official Debian installation walkthrough [2]; simply skip the initial static part, i.e. install plain Debian (with DHCP). You can fill in any hostname (even localhost) and any domain (or no domain at all) in the installer.

Upon the first boot after the installation, remove the 127.0.1.1 hostname entry from /etc/hosts.

Your /etc/hosts should be plain like this:
Code:
127.0.0.1       localhost
# NOTE: Non-loopback lookup managed via DNS

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Further, remove the static hostname file: rm /etc/hostname

The static hostname will be unset and the transient one will start showing in hostnamectl output.

NOTE: If your initially chosen hostname was localhost, you could get away with keeping this file populated, actually.

This is also where you should actually pick up the official guide [2], i.e. at "Install Proxmox VE": having installed Debian as usual, add the PVE apt repository; install the PVE kernel and proxmox-ve; remove the stock kernel and os-prober; refresh the bootloader; and configure networking for guests as necessary.
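For orientation only, a condensed sketch of those steps as they appear in [2] at the time of writing (PVE 8 on Debian 12, no-subscription repository assumed; always follow the current wiki text):
Bash:
echo "deb [arch=amd64] http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
wget https://enterprise.proxmox.com/debian/proxmox-release-bookworm.gpg -O /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg
apt update && apt full-upgrade
apt install proxmox-default-kernel    # reboot into the Proxmox kernel afterwards
apt install proxmox-ve postfix open-iscsi chrony
apt remove linux-image-amd64 'linux-image-6.1*'
update-grub
apt remove os-prober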



2. C. Auto Install & Ansible reconfiguration


If you deploy many nodes at once, or deploy them often, you will find this option in a separate post below.





[1] https://pve.proxmox.com/wiki/Installation#installation_installer
[2] https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm
 
3. A. Clustering


Unfortunately, PVE tooling populates the cluster configuration (corosync.conf [3]) with resolved IP addresses at cluster creation.

Say you are creating a cluster from scratch (for brevity, all CLI only):

Code:
root@pve1:~# pvecm create demo-cluster
Corosync Cluster Engine Authentication key generator.
Gathering 2048 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
Writing corosync config to /etc/pve/corosync.conf
Restart corosync and cluster filesystem

While all is well, the hostname got resolved and written into the cluster configuration as an IP address:

Code:
root@pve1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.101
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: demo-cluster
  config_version: 1
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

This will of course work just fine, but it defeats our purpose. You may choose to do the following now (one by one, as nodes are added), or defer the repetitive work until you have gathered all nodes into your cluster.

All there is to do is to replace each ringX_addr with the hostname. The official docs [4] are rather opinionated about how you should go about this.

NOTE: Be sure to include the domain as well in case your nodes do not share one. Do NOT change the name entry for the node.
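One conservative way to make the edit (loosely following [4]; the editor is your choice) is to work on a copy, bump config_version inside it, and only then activate it:
Bash:
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new    # set ring0_addr to the hostname, increment config_version
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf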

At any point, you may check journalctl -u pve-cluster to see that all went well:

Code:
[dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 2)
[status] notice: update cluster info (cluster name  demo-cluster, version = 2)

Now, when you are about to add a second node to the cluster (sample output below is from the "one by one" approach; in the CLI this is done, somewhat counter-intuitively, from the to-be-added node, referencing a node already in the cluster):

Code:
root@pve2:~# pvecm add pve1.demo.internal
Please enter superuser (root) password for 'pve1.demo.internal': **********
Establishing API connection with host 'pve1.demo.internal'
The authenticity of host 'pve1.demo.internal' can't be established.
X509 SHA256 key fingerprint is 52:13:D6:A1:F5:7B:46:F5:2E:A9:F5:62:A4:19:D8:07:71:96:D1:30:F2:2E:B7:6B:0A:24:1D:12:0A:75:AB:7E.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '10.10.10.102'
Request addition of this node
cluster: warning: ring0_addr 'pve1.demo.internal' for node 'pve1' resolves to '10.10.10.101' - consider replacing it with the currently resolved IP address for stability
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1726922870.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'pve2' to cluster.

It hints at using the resolved IP as a static entry (fallback to local node IP '10.10.10.102') for this action (even though a hostname was provided), and indeed you will have to change this second incarnation of corosync.conf again.

So your nodelist (after the second change) should look like this:

Code:
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve1.demo.internal
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve2.demo.internal
  }
}

NOTE: If you wonder about the misleading messages on "stability" (consider replacing it with the currently resolved IP address for stability) and how corosync actually supports resolving names, you may wish to consult [3] (excerpt):

ADDRESS RESOLUTION


corosync resolves ringX_addr names/IP addresses using the getaddrinfo(3) call with respect of totem.ip_version setting.

getaddrinfo() function uses a sophisticated algorithm to sort node addresses into a preferred order and corosync always chooses the first address in that list of the required family. As such it is essential that your DNS or /etc/hosts files are correctly configured so that all addresses for ringX appear on the same network (or are reachable with minimal hops) and over the same IP protocol.

NOTE: At this point, it is worth pointing out the importance of the ip_version parameter (it defaults to ipv6-4 when unspecified, but PVE actually populates it as ipv4-6) [3], as well as the hosts configuration in nsswitch.conf [5].
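For reference, a stock Debian install keeps local files ahead of DNS in the lookup order; on the demo nodes the relevant line typically reads:
Code:
root@pve1:~# grep '^hosts' /etc/nsswitch.conf
hosts:          files dns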

You may want to check that everything is alright with your cluster at this point, either with the bespoke pvecm status [6] or the generic corosync-cfgtool [7]. Note you will still see IP addresses and node IDs in this output, as they got resolved.
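A few commands worth keeping at hand for that (see [6] and [7] for the full options):
Bash:
pvecm status          # quorum state and membership as PVE sees it
corosync-cfgtool -s   # status of the local node's links towards its peers
corosync-cfgtool -n   # list of the other nodes with per-link connectivity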



[2] https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm
[3] https://manpages.debian.org/bookworm/corosync/corosync.conf.5.en.html
[4] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_edit_corosync_conf
[5] https://manpages.debian.org/bookworm/manpages/nsswitch.conf.5.en.html#hosts
[6] https://pve.proxmox.com/wiki/Cluster_Manager
[7] https://manpages.debian.org/bookworm/corosync/corosync-cfgtool.8.en.html



3. B. Clustering - Corosync operation - word of caution


Particularly useful to check at any time is netstat -pan | egrep '5405.*corosync' [1] (you may need to apt install net-tools); this is especially true if you are wondering why your node is missing from the cluster. Why could this happen? If you have improperly configured your DHCP and your node suddenly gets a new IP leased, corosync will NOT automatically take this into account:

Code:
DHCPREQUEST for 10.10.10.103 on vmbr0 to 10.10.10.11 port 67
DHCPNAK from 10.10.10.11
DHCPDISCOVER on vmbr0 to 255.255.255.255 port 67 interval 4
DHCPOFFER of 10.10.10.113 from 10.10.10.11
DHCPREQUEST for 10.10.10.113 on vmbr0 to 255.255.255.255 port 67
DHCPACK of 10.10.10.113 from 10.10.10.11
bound to 10.10.10.113 -- renewal in 57 seconds.
  [KNET  ] link: host: 2 link: 0 is down
  [KNET  ] link: host: 1 link: 0 is down
  [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
  [KNET  ] host: host: 2 has no active links
  [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
  [KNET  ] host: host: 1 has no active links
  [TOTEM ] Token has not been received in 2737 ms
  [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
  [QUORUM] Sync members[1]: 3
  [QUORUM] Sync left[2]: 1 2
  [TOTEM ] A new membership (3.9b) was formed. Members left: 1 2
  [TOTEM ] Failed to receive the leave message. failed: 1 2
  [QUORUM] This node is within the non-primary component and will NOT provide any services.
  [QUORUM] Members[1]: 3
  [MAIN  ] Completed service synchronization, ready to provide service.
[status] notice: node lost quorum
[dcdb] notice: members: 3/1080
[status] notice: members: 3/1080
[dcdb] crit: received write while not quorate - trigger resync
[dcdb] crit: leaving CPG group

This is because corosync still has its link bound to the old IP. What is worse, even restarting the corosync service on the affected node will NOT be sufficient; the remaining cluster nodes will keep rejecting its traffic with [KNET ] rx: Packet rejected from 10.10.10.113:5405. It is necessary to restart corosync on ALL nodes to get them back into (eventually) the primary component of the cluster. Finally, you ALSO need to restart the pve-cluster service on the affected node (only).
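Summarised as commands, the recovery described above amounts to (order matters):
Bash:
# on EVERY node of the cluster:
systemctl restart corosync

# then, on the affected node ONLY:
systemctl restart pve-cluster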

NOTE: If you see the wrong IP address even after the restarts, and your corosync.conf is entirely correct, start troubleshooting with journalctl -t dhclient (and check the DHCP server configuration if necessary); eventually you may even need to look into nsswitch.conf [2] and gai.conf [3].

[1] https://manpages.debian.org/bookworm/net-tools/netstat.8.en.html
[2] https://manpages.debian.org/bookworm/manpages/nsswitch.conf.5.en.html
[3] https://manpages.debian.org/bookworm/manpages/gai.conf.5.en.html
 
IPv6, useful tooling, etc.


The tutorial above is meant to be quite straightforward and non-convoluted, focusing on the simplest of deployments (i.e. running PVE the way you run everything else in your network).

However, I would like to reserve this part for other setups (IPv6 actually makes a lot of sense), for distributing NTP server information via DHCP for chrony, and for extra tooling one may find convenient (an SSH certificate based PKI, and scripts to recover fallen-behind corosync across the entire cluster without relying on the cluster filesystem, which by that point is itself affected by the dysfunctional CPG messaging).

I will adapt this section further based on feedback/interest received.
 
Other tidbits



Corosync kronosnet (knet) paper


A quite readable informal paper on how KNET operates, dating from the time of the transition, might also be of interest [1]; it includes, amongst other things, details on "the way corosync determines which node name corresponds to the local host".

[1] https://people.redhat.com/ccaulfie/docs/KnetCorosync.pdf



The mystery origin of 127.0.1.1 in /etc/hosts


The strange superfluous loopback entry once found its way into Debian's /etc/hosts as a workaround for a bug [1]:
The IP address 127.0.1.1 in the second line of this example may not be found on some other Unix-like systems. The Debian Installer creates this entry for a system without a permanent IP address as a workaround for some software (e.g., GNOME) as documented in the bug #719621.

To be more precise, this was requested in 2005 [2] as a stop-gap while "pursuing the goal of fixing programs so that they no longer rely on the UNIX hostname being resolvable as if it were a DNS domain name." [3], with a particularly valuable end note:
If the system hostname cannot be added to /etc/hosts as the canonical hostname for a permanent real IP address then it should be written as the canonical hostname for a 127.* address.

In the long run it may be better not to write the system hostname into /etc/hosts at all.

[1] https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=316099
[3] https://lists.debian.org/debian-boot/2005/06/msg00938.html
 
2. C. Auto Install & Ansible reconfiguration


Install the Automated Installation [1] assistant and obtain the base ISO image:
Bash:
apt install proxmox-auto-install-assistant
wget https://enterprise.proxmox.com/iso/proxmox-ve_8.2-2.iso

Create generic answer.toml file:
Code:
[global]
keyboard = "en-us"
country = "us"
fqdn = "pve.unconfigured.internal"
mailto = "notify@mx.demo.internal"
timezone = "UTC"
root_password = "P1$$w0rd"
root_ssh_keys = [
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEY76qLhFdFXSFnWdoFSUCElPGdjgzErhNhi6aGuoH7a root@ansible"
]

[network]
source = "from-dhcp"

[disk-setup]
filesystem = "ext4"
lvm.swapsize = 0
lvm.maxvz = 0
disk_list = ['sda']

NOTE: Amend accordingly, in particular use your own SSH key and email address, disk, networking configuration, etc. LEAVE, however, the FQDN generic.
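The assistant can also check the answer file before you bake it into the ISO; a quick sanity check (see [1] for the available subcommands):
Bash:
proxmox-auto-install-assistant validate-answer answer.toml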

Create the ISO and use the image to create your custom installation media as usual:
Bash:
proxmox-auto-install-assistant prepare-iso proxmox-ve_8.2-2.iso --fetch-from iso --answer-file answer.toml

At this point, make sure your DHCP & DNS are set up to hand out the reserved IP addresses and to resolve the hostnames correctly for all the nodes' NICs.
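With the dnsmasq setup from the prerequisites section, that means one reservation and one record per node; the MAC addresses and IPs below are merely placeholders for the two sample nodes:
Code:
dhcp-host=aa:bb:cc:dd:ee:11,set:DEMO_NET,10.10.10.111
host-record=pve11.demo.internal,10.10.10.111

dhcp-host=aa:bb:cc:dd:ee:12,set:DEMO_NET,10.10.10.112
host-record=pve12.demo.internal,10.10.10.112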

Let the custom ISO install run through, wait for nodes to be rebooted.

They will have obtained their IPs from DHCP during the install process; however, as is the default for a PVE install, they will end up statically pre-set with the IPs that the installer obtained originally.

On your management machine with Ansible installed [2], create basic inventory with all your nodes and a playbook:

inventory.yaml:
YAML:
pve_nodes:
  hosts:
    pve11:
      ansible_host: pve11.demo.internal
    pve12:
      ansible_host: pve12.demo.internal
  vars:
    ansible_user: root
    ansible_ssh_private_key_file: ~/.ssh/ansible.ed25519
    ansible_python_interpreter: /usr/bin/python3

playbook.yaml:
YAML:
- name: Initial configuration

  hosts: pve_nodes

  tasks:
    - name: Ping nodes
      ansible.builtin.ping:

    - name: Ensure absent /etc/hostname
      ansible.builtin.file:
        path: "/etc/hostname"
        state: absent
      notify:
        - Reboot

    - name: Ensure no /etc/hosts fqdn entry other than loopback
      ansible.builtin.lineinfile:
        path: /etc/hosts
        search_string: pve.unconfigured.internal
        line: '# DNS managed non-loopback resolution'
      notify:
        - Reboot

    - name: Ensure DHCP configured /etc/network/interfaces
      ansible.builtin.copy:
        src: pve_node_interfaces
        dest: /etc/network/interfaces
      notify:
        - Reboot

  handlers:
    - name: Reboot
      ansible.builtin.reboot:

Keep your custom network interfaces template ready as well - pve_node_interfaces:
Code:
auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet dhcp
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0

And run your playbook:
Bash:
$ ansible-playbook -i inventory.yaml playbook.yaml

PLAY [Initial configuration] ***

TASK [Gathering Facts] ***
ok: [pve11]
ok: [pve12]

TASK [Ping nodes] ***
ok: [pve12]
ok: [pve11]

TASK [Ensure absent /etc/hostname] ***
ok: [pve11]
changed: [pve12]

TASK [Ensure no /etc/hosts fqdn entry other than loopback] ***
changed: [pve12]
ok: [pve11]

TASK [Ensure DHCP configured /etc/network/interfaces] ***
ok: [pve11]
changed: [pve12]

RUNNING HANDLER [Reboot] ***
changed: [pve12]

PLAY RECAP ***
pve11                      : ok=5    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
pve12                      : ok=6    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

NOTE: All your nodes are now deployed as standalone with DHCP configuration; the next step will be to use custom tooling to auto-cluster them.

[1] https://pve.proxmox.com/wiki/Automated_Installation
[2] https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html
 