Hi,
For some time now, we've been having an issue where Cloud-init re-runs on VMs that have already been operating for months or years. By 'run', I mean that first-boot-only stages run again, such as upgrading packages. This is obviously very problematic in production, as an 'innocent' reboot can trigger all sorts of unexpected actions.
After a lot of testing, we found that this happens after migrating a VM to a different hypervisor. The `instance_id` suddenly changes, which Cloud-init uses to determine whether this is the first boot.
Proxmox staff previously wrote (https://forum.proxmox.com/threads/rerunning-cloud-init.90932/#post-398172):
Is this accurate? Because migrating a VM from one hypervisor to the other does not change the user config nor the network config, as far as I'm aware. I've attached the output of `cloud-init query -a` from before and after the `instance_id` suddenly changes, i.e. the VM is migrated. You can see that there is no difference except for the ID.
I can confirm that this behaviour does not occur when rebooting a VM on the same hypervisor.
We're on PVE 8.4.19.
For some time now, we've been having an issue where Cloud-init re-runs on VMs that have already been operating for months or years. By 'run', I mean that first-boot-only stages run again, such as upgrading packages. This is obviously very problematic in production, as an 'innocent' reboot can trigger all sorts of unexpected actions.
After a lot of testing, we found that this happens after migrating a VM to a different hypervisor. The `instance_id` suddenly changes, which Cloud-init uses to determine whether this is the first boot.
Proxmox staff previously wrote (https://forum.proxmox.com/threads/rerunning-cloud-init.90932/#post-398172):
The 'instance-id' is basically the hash of the user config and the network config concatenated. cloud-init checks the current instance-id against all previously known ones to see if it has to rerun the different systems and modules, not just the previous instance-id, which leads to strange behavior sometimes.
Is this accurate? Because migrating a VM from one hypervisor to the other does not change the user config nor the network config, as far as I'm aware. I've attached the output of `cloud-init query -a` from before and after the `instance_id` suddenly changes, i.e. the VM is migrated. You can see that there is no difference except for the ID.
I can confirm that this behaviour does not occur when rebooting a VM on the same hypervisor.
We're on PVE 8.4.19.