cloud-init brain dump

cypherfox

Aug 26, 2025
Greetings,
I've been fighting with cloud-init and templates on Proxmox for the last few hours, and I want to capture what I've learned, and maybe be told where I'm going wrong or taking the hard way.

First off, a few specific things that I've learned.
  • Proxmox has its own cloud-init settings (user, network, meta) which you _must not_ override in a template, because they are customized on-the-fly for each instance created from that template. The only one you can safely override is vendor, because Proxmox doesn't use it.
    • This matters because cloud-config merging is first-wins. That is, if the user configuration from Proxmox includes a users: section, you cannot have a users: section in your vendor cloud config; it will be ignored.
    • This means that if you are using the default generated user data (qm cloudinit dump {tid} user), which looks something like this, you cannot add users in your own cloud config:
      • YAML:
        #cloud-config
        hostname: ubuntu-2404-cloudinit-template
        manage_etc_hosts: true
        fqdn: ubuntu-2404-cloudinit-template
        chpasswd:
          expire: False
        users:
          - default
        package_upgrade: true
    • To be more specific about my use case: I cannot add a user=local:snippets/user-data.yaml entry to the cicustom setting on the template (because then the per-instance customization Proxmox does would no longer apply), and with the default user data I cannot add users in my own vendor cloud config. This hurt my head for a while.
    • Also, you cannot tell Proxmox to leave the user settings out of the generated user cloud-init configuration.
Luckily, it turns out that you can fool it (I use "fool" loosely; I'm not sure whether this is an intended feature, and it feels hacky) by giving it an existing user, specifically root.
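Concretely, on the template (mine is VMID 8001) that's just a couple of qm commands; the password is obviously a placeholder:

Bash:
# Point the template's cloud-init user at root; Proxmox then emits `user: root`
# instead of a `users:` list in the generated user data.
qm set 8001 --ciuser root
# Give root a password so there's a console fallback (placeholder, use your own).
qm set 8001 --cipassword 'use-a-real-password-here'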

If you specify in the template that the cloud-init user is root, you get a user cloud-init that looks like this:
YAML:
#cloud-config
hostname: ubuntu-2404-cloudinit-template
manage_etc_hosts: true
fqdn: ubuntu-2404-cloudinit-template
user: root
disable_root: False
password: $5$lotsofcharactershere
chpasswd:
  expire: False
package_upgrade: true

You'll notice that this does not have a users: stanza, which means that (because of first-wins) it won't override the one in my own cloud-config. This is why I say it feels hacky. It WORKS, but I'm worried that it isn't intentional and might break later, since it relies on a side effect.
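For what it's worth, this is how I checked what a clone actually ends up with, both on the Proxmox host and inside the booted guest (8001 is my template ID; adjust to taste):

Bash:
# On the Proxmox host: dump the data Proxmox will hand to the VM.
qm cloudinit dump 8001 user
qm cloudinit dump 8001 network
# Inside a booted clone (as root): what cloud-init actually received, and how the run went.
cat /var/lib/cloud/instance/user-data.txt
cat /var/lib/cloud/instance/vendor-data.txt
cloud-init status --long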

So what does this look like in a full cloud config?
YAML:
#cloud-config
timezone: America/Los_Angeles
locale: en_US
package_update: true
chpasswd:
    expire: False
users:
    - default
    - name: {username}
      gecos: {full name}
      groups:
        - sudo
        - docker
      ssh_import_id:
        - gh:{github user name to pull public SSH keys from}
      lock_passwd: false
      shell: /bin/bash
      passwd: $6${lots more characters}
package_upgrade: true
packages:
    - qemu-guest-agent
    - emacs-nox
    - silversearcher-ag
    - net-tools
    - apt-transport-https
    - ca-certificates
    - curl
    - vim
runcmd:
    - timedatectl set-timezone America/Los_Angeles
    - systemctl start qemu-guest-agent
    - systemctl enable qemu-guest-agent
    - install -m 0755 -d /etc/apt/keyrings
    - curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    - chmod a+r /etc/apt/keyrings/docker.asc
    - echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
    - apt update
    - apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    - systemctl enable docker
    - systemctl start docker
    - ['sh', '-c', 'curl -fsSL https://tailscale.com/install.sh | sh']
    # Set sysctl settings for IP forwarding (useful when configuring an exit node)
    - ['sh', '-c', "echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf && echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf && sudo sysctl -p /etc/sysctl.d/99-tailscale.conf" ]
    - ['tailscale', 'up', '--auth-key=tskey-auth-{tailscale-key}']
    # (Optional) Include this line to make this node available over Tailscale SSH
    - ['tailscale', 'set', '--ssh']
    - curl -s https://packages.wazuh.com/key/GPG-KEY-WAZUH | gpg --no-default-keyring --keyring gnupg-ring:/usr/share/keyrings/wazuh.gpg --import && chmod 644 /usr/share/keyrings/wazuh.gpg
    - echo "deb [signed-by=/usr/share/keyrings/wazuh.gpg] https://packages.wazuh.com/4.x/apt/ stable main" | tee -a /etc/apt/sources.list.d/wazuh.list
    - apt update
    - apt install -y wazuh-agent
    - sed -i 's/<address>MANAGER_IP<\/address>/<address>{wazuh api hostname}<\/address>/' /var/ossec/etc/ossec.conf
    - systemctl daemon-reload
    - systemctl enable wazuh-agent
    - systemctl start wazuh-agent

You'll notice a few sections with {...} in them; that's where I've replaced/hidden configuration that's specific to my instance. This allows me to have a template which installs a bunch of things that I pretty much always want installed (e.g. Tailscale, Wazuh, Docker, QEMU Guest Agent). It also lets me import my public SSH keys from my GitHub account, which is not a feature that Proxmox natively supports.
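One thing that saved me a few pointless reboots: linting the snippet before handing it to Proxmox. Roughly (assuming the snippet sits in the default /var/lib/vz/snippets/ directory of the local storage; older cloud-init releases spell the command cloud-init devel schema):

Bash:
# Sanity-check the vendor snippet anywhere cloud-init is installed:
cloud-init schema --config-file /var/lib/vz/snippets/initialize.yaml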

I did do a few other things, like:

Bash:
# Load the nbd kernel module if it isn't already loaded, then attach the cloud image.
modprobe nbd max_part=8
qemu-nbd --connect=/dev/nbd0 ./noble-server-cloudimg-amd64.img
fdisk -l /dev/nbd0
# Figure out which partition to mount; generally it's going to be the first one.
mkdir -p /mnt/iso
mount /dev/nbd0p1 /mnt/iso
rm /mnt/iso/etc/ssh/sshd_config.d/60-cloudimg-settings.conf
umount /mnt/iso
qemu-nbd --disconnect /dev/nbd0
# local-lvm is my storage area; configure it how you need it. Each import bumps the disk
# number (disk-1, disk-2, ...), so adjust the next lines if you've imported before.
qm importdisk 8001 noble-server-cloudimg-amd64.img local-lvm
qm set 8001 --scsihw virtio-scsi-pci --virtio0 local-lvm:vm-8001-disk-1
qm set 8001 --cicustom "vendor=local:snippets/initialize.yaml"

This is because that particular file disables password authentication in SSH, so when cloud-init failed to fetch my GitHub SSH keys there was no way for me to get into the server remotely. You can make other edits to the image using this mechanism, but you don't have to.
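For reference, this is what I was deleting (at least on the stock noble image I used):

Bash:
# Peek at the file before removing it (with the image still mounted at /mnt/iso):
cat /mnt/iso/etc/ssh/sshd_config.d/60-cloudimg-settings.conf
# On the stock image it should contain just:
#   PasswordAuthentication no

If you'd rather not touch the image at all, setting ssh_pwauth: true in the vendor cloud-config should re-enable password logins at first boot, since cloud-init drops its own sshd_config.d snippet that sorts ahead of this one; I haven't gone that route myself, though.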

I went through a lot of other steps, using bootcmd and even building a MIME multi-part archive so I could combine multiple types of cloud-init configuration (specifically a cloud-boothook), because I was having early-startup networking problems. That turned out not to be necessary once I truncated /etc/machine-id, the unique ID that the DHCP client identifier is derived from. (The default image ships with it truncated, but networking still wasn't working for me.)
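For anyone hitting the same wall, "truncated" here just means emptied, roughly:

Bash:
# Inside the template (or chrooted into the mounted image), before cloning:
truncate -s 0 /etc/machine-id
# /var/lib/dbus/machine-id is usually a symlink to the same file; if your image
# ships it as a regular file, empty that too so clones don't share an identity.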

It's been... a voyage to get here. I spent a LOT of time in the cloud-init docs, and I followed (roughly) the path laid out by untouchedwagons in 'Making an Ubuntu 24.04 VM Template for Proxmox and CloudInit'; I owe them a big debt for getting me started.

I hope this is interesting, and maybe opens some eyes to the things you can do with cloud-init, and how to work around some of Proxmox's own edge cases around it.
 
Hi @cypherfox, welcome to the forum!

Thanks for sharing your experience, I’m sure it will be useful for others working with cloud-init.

My own experience has been a bit different: when using a custom cloud-init file, the internal (GUI) cloud-init settings are completely ignored. You can usually confirm this by mounting the generated cloud-init ISO and inspecting the files there, or by checking the cloud-init logs after boot.
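For example, on LVM-thin the generated disk is a small volume named vm-<vmid>-cloudinit, and something like this should work (assuming the default pve volume group and VMID 8001; adjust for your storage):

Bash:
# Activate the volume if the device node isn't there yet, then mount it read-only.
lvchange -ay pve/vm-8001-cloudinit
mkdir -p /mnt/ci
mount -o ro /dev/pve/vm-8001-cloudinit /mnt/ci
ls /mnt/ci            # user-data, meta-data, network-config, ...
umount /mnt/ci
# On the guest itself, the logs to check are /var/log/cloud-init.log and /var/log/cloud-init-output.log.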

Cheers


 
Greetings,

@bbgeek17 Yes, if I'm understanding you correctly: if you provide your own user data with something like qm set 8001 --cicustom "user=local:snippets/my-user-setup.yaml", it completely overrides the cloud-init user settings entered through the web UI or directly via qm set 8001 --ciuser=foobar, and I think that's intended.

The idea (I would imagine) is that if you're providing your own settings for that, then you Know What You're Doing, and don't intend to merge a potentially unknown dataset (Proxmox's internal state) with yours. That is, they don't do a merge operation, because that would violate the principle of least astonishment.

They've specifically stated that they don't use vendor, so if you want to do your own setup while still relying on the generated network, user, and meta configuration, vendor is the place to do it. That's what I'm doing, for example, because I don't want to manually re-implement all the special tricks they do (like adding the hostname to the cloud config, or dynamically creating a MAC address).

Unfortunately, I had to do the tricks listed above to be able to provision my own user(s)... Interestingly, while messing with this, I found the Terraform-generated cloud-init files from an old attempt of mine at building a Terraform k3s deployment. Terraform is MUCH more able to override everything in cloud-init, because it has all the information at plan-application time (hostnames, users, networking, etc. are all defined in the *.tf files). But the by-products that tf apply leaves behind in snippets are GREAT sources for understanding how to build a good cloud-init configuration.
 
Hi @cypherfox, the cloud-init file can be in many different formats: YAML, shell, etc. Merging different styles and formats is complicated and simply not worth it.
As you found, using an external orchestrator like TF to drive VM creation start to end is a good way to avoid reinventing the wheel.
:)

Cheers

