Has cloud-init been changed between 5.3 and 6.0?

TJ Zimmerman

Hello, I maintain a project that bootstraps a Kubernetes cluster on Proxmox using Ansible. My code has always worked reliably when tested on 5.3, but I recently upgraded my server to 6.0-4 and now cloud-init is behaving strangely. It successfully sets the IP address, subnet, and gateway on the eth0 interface; however, the DNS nameserver and search domain I supply to qm create are erroneously added to the lo interface instead. Is this some sort of user error or a bug in 6.0?

Here is where I leverage the qm create command. However, when the VMs finally come online, the cloud-init configuration data is not set correctly, as shown below:

Code:
    debian@Eris:~$ cat /etc/network/interfaces.d/50-cloud-init.cfg
    # This file is generated from information provided by
    # the datasource.  Changes to it will not persist across an instance.
    # To disable cloud-init's network configuration capabilities, write a file
    # /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
    # network: {config: disabled}
    auto lo
    iface lo inet loopback
        dns-nameservers 192.168.1.100
        dns-search sol.milkyway
 
    auto eth0
    iface eth0 inet static
        address 192.168.40.101/24
        gateway 192.168.40.1
    debian@Eris:~$ cat /etc/resolv.conf
    nameserver 127.0.0.1

I'm using this image, though I can confirm it also happens on this image as well as this image. The latter is the one I had previously used successfully with the exact same code on 5.3.
 
I ran my Ansible playbook with verbose output, so I was able to grab the exact commands that reproduce this issue.

Code:
pvesh create /pools -poolid "Kubernetes" --Comment "Kubernetes Cluster"
qm create 4010 --pool Kubernetes --ostype "l26" --name Pluto --description "Kubernetes VM" --agent 1 --cores 2 --memory 5120  --net0 "virtio,bridge=vmbr0" --ipconfig0 "gw=192.168.40.1,ip=192.168.40.10/24" --nameserver 192.168.1.100 --searchdomain sol.milkyway --sshkeys /root/.ssh/sol.milkyway.kubernetes.pub
wget https://cdimage.debian.org/cdimage/openstack/current/debian-10.0.1-20190708-openstack-amd64.qcow2 -O /tmp/image.qcow2
qm set 4010 --net0 "virtio,bridge=vmbr0,tag=40"
qm set 4010 --serial0 /dev/tty0
qm importdisk 4010 /tmp/image.qcow2 SaturnPool
qm set 4010 --scsihw virtio-scsi-pci --scsi0 SaturnPool:vm-4010-disk-0
qm resize 4010 scsi0 50G
qm set 4010 --ide2 SaturnPool:cloudinit
qm set 4010 --boot c --bootdisk scsi0
qm start 4010


You can see in the qm create command on the second line that I'm supplying the DNS nameserver and search domain; I'm just not sure why they are being set on the loopback interface.
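One quick sanity check on the host side is to confirm what Proxmox actually recorded in the VM config; a minimal example, assuming the VM id 4010 used above:

Code:
# Show the cloud-init related options stored in the VM config on the PVE host
qm config 4010 | grep -E 'ipconfig0|nameserver|searchdomain'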
 
mhmm... there was no obvious change to the network config that I can see

can you maybe post the generated cloud-init config from where it is working and from where it is not?
(you can get this from inside the VM; it should be on e.g. /dev/sr0)
 
The generated config can also be dumped with 'qm cloudinit dump <vmid> network' in PVE 6.0
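For example, roughly like this, assuming VM id 4010 on the host and that the cloud-init drive shows up as /dev/sr0 inside the guest (the file names on the ISO can vary with the cloud-init format):

Code:
# On the PVE 6.0 host: dump the network config cloud-init will be given
qm cloudinit dump 4010 network

# Inside the guest: mount the cloud-init ISO read-only and inspect it
mount -o ro /dev/sr0 /mnt
ls /mnt
cat /mnt/network-config
umount /mnt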
 
Heya, I currently have the same issue with Proxmox 5.4 and can see that the flag "manage-resolv-conf" is not set. Is that something Proxmox doesn't set, or is this a Debian cloud-init image issue?

Here's my cloud-init config from /dev/sr0 (removed personal information)

Code:
version: 1
config:
    - type: physical
      name: eth0
      mac_address: 'XXX'
      subnets:
      - type: static
        address: 'XXX'
        netmask: 'XXX'
        gateway: 'XXX'
    - type: nameserver
      address:
      - 'XXX'
      search:
      - 'XXX'
#cloud-config
hostname: XXX
manage_etc_hosts: true
user: XXX
ssh_authorized_keys:
  - XXX
chpasswd:
  expire: False
users:
  - default
package_upgrade: true
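For reference, cloud-init's resolv_conf module only acts when manage_resolv_conf is enabled in the user data, and it is only supported on certain distros, so it may not apply to this Debian image at all; a minimal sketch of what that key would look like (placeholder values, not something the Proxmox-generated user data above contains):

Code:
#cloud-config
# hypothetical example: enable cloud-init's resolv_conf module explicitly
manage_resolv_conf: true
resolv_conf:
  nameservers:
    - '192.168.1.100'
  searchdomains:
    - sol.milkyway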
 
I also have the same problem on 6.0-5

Code:
root@proxmox:~# qm cloudinit dump 100 network
auto lo
iface lo inet loopback

        dns_nameservers 192.168.10.2
        dns_search lab.lan
auto eth0
iface eth0 inet static
        address 192.168.10.50
        netmask 255.255.255.0
        gateway 192.168.10.2
 
@dcsapak I added the configuration file you requested at the bottom of this post.

I have noticed another anomaly today. Not only is /etc/resolv.conf being populated with data from the DHCP server instead of from cloud-init, the VM is also obtaining a second IP address on eth0 that takes priority over my cloud-init-provided IP address on the interface.

A qm create command might look like this:

Code:
pvesh create /pools -poolid "Kubernetes" --Comment "Kubernetes Cluster"

qm create 40101 --pool Kubernetes --ostype "l26" --name Eris --description "Kubernetes VM" --agent 1 --cores 4 --memory 10240 --net0 "virtio,bridge=vmbr0" --ipconfig0 "gw=192.168.40.1,ip=192.168.40.101/24" --nameserver 192.168.1.100 --searchdomain sol.milkyway --sshkeys /root/.ssh/sol.milkyway.kubernetes.pub

qm set 40101 --net0 "virtio,bridge=vmbr0,tag=40"

qm importdisk 40101 /tmp/image.qcow2 SaturnPool

qm set 40101 --scsihw virtio-scsi-pci --scsi0 SaturnPool:vm-4010-disk-0 --ide2 SaturnPool:cloudinit --serial0 /dev/tty0 --boot c --bootdisk scsi0

qm resize 40101 scsi0 50G

qm start 40101

As you can see, in the ipconfig0 stage of the qm create command, I set the IP address to 192.168.40.101. However, after provisioning the VM, the eth0 network interface has both this IP address and a DHCP-obtained IP address, 192.168.40.230. For example:

Code:
    root@Eris:/home/debian# ip addr show dev eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether ae:63:41:e3:3d:48 brd ff:ff:ff:ff:ff:ff
        inet 192.168.40.230/24 brd 192.168.40.255 scope global dynamic eth0
           valid_lft 85503sec preferred_lft 85503sec
        inet 192.168.40.101/24 brd 192.168.40.255 scope global secondary eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::ac63:41ff:fee3:3d48/64 scope link
           valid_lft forever preferred_lft forever

Both IPs are routable without any issues, and normally this would be fine. However, for some reason my router is unable to peer via BGP with these VMs using the cloud-init-provided IP address; it only works via the DHCP-obtained address, which is unpredictable and not really compatible with infrastructure as code. Perhaps it is because the DHCP address is the first IP address on the interface?
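One way to see which of the two addresses the kernel actually selects as the source for traffic towards the router is a quick diagnostic like the following (assuming the router is the 192.168.40.1 gateway from the qm create command):

Code:
# The "src" field shows the address outgoing connections (including BGP
# sessions) will normally originate from; with two addresses on eth0 this
# is typically the primary one, here the DHCP-obtained address
ip route get 192.168.40.1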

Also, as a secondary issue with cloud-init on Proxmox via qm create, sometimes my /etc/resolv.conf file is not properly populated. As you can see in the qm create command above, the nameserver should be 192.168.1.100 and the search domain should be sol.milkyway. However, after provisioning the VM, neither of these is reliably present within /etc/resolv.conf; instead, the values that do appear were given to the VM via DHCP.

Code:
    cat /etc/resolv.conf
    nameserver 192.168.1.100
    nameserver 192.168.1.110

How can I prevent this from happening?

* Here is the configuration file from /dev/sr0.
* Here is the /var/log/cloud-init-output.log file.
* Here is the /var/log/cloud-init.log file.
 
mhmm... looking at your logs/config I do not see DHCP mentioned anywhere. Does this image have DHCP configured elsewhere, e.g. NetworkManager or systemd-networkd? If yes, this is out of reach for our cloud-init config
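A couple of quick checks inside the guest along those lines (just a sketch; adjust to whatever the image actually ships):

Code:
# see whether another network manager is active in the guest
systemctl is-active systemd-networkd NetworkManager

# look for a dhcp stanza baked into the image outside of cloud-init's file
grep -ri dhcp /etc/network/interfaces /etc/network/interfaces.d/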
 
Yeah, I suppose DHCP is likely baked into the Debian OpenStack qcow2 image I'm using. I was under the impression that cloud-init would handle this better, but I guess not. Thank you for your help!
 

It seems that this might be a problem with the cloud-init package within Ubuntu, given the following bug:
https://bugs.launchpad.net/cloud-init/+bug/1712440
 
We had this issue with Debian 10 on the latest Proxmox. With a few config changes and additions to cloud.cfg in the image, it now works fine.

Add a known working resolver to resolv.conf in the image (e.g. 8.8.8.8; this will get overwritten later).

Add the following to the bottom of cloud.cfg. I've left in qemu-guest-agent since it is not installed in the image and won't start automatically.

Code:
packages:
- qemu-guest-agent
- resolvconf

runcmd:
- ifdown lo
- ifup lo
- service qemu-guest-agent start

Bringing lo down and back up causes resolvconf to write the expected resolv.conf, and this then happens on every boot since resolvconf runs automatically when an interface comes up.

I also removed the line referencing DHCP under each interface's hot-plug line in /etc/network/interfaces (so that it falls through to /etc/network/interfaces.d/50-cloud-init.cfg). This speeds up the boot massively, but that is based on our setup, YMMV.
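For illustration, the relevant part of /etc/network/interfaces after that edit might look something like this (exact contents vary by image; the point is that only the cloud-init-generated file under interfaces.d ends up configuring eth0):

Code:
# /etc/network/interfaces with the dhcp stanza for eth0 removed
source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

allow-hotplug eth0
# the "iface eth0 inet dhcp" line was removed here so 50-cloud-init.cfg takes over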

It's a bit hacky but works as intended.
 
