[SOLVED] After using cloud-init for the first time, PVE is messed up (LXC etc. not working)

Coffeeri

New Member
Jun 8, 2019
Hello folks.
My PVE instance had been up for 19 days, and during this time I:
  • used cloud-init for the first time a couple of days ago, pretty much following this tutorial
  • created about 3 VMs with cloud-init, including one with qm clone 9000 191 --name tpy2 (note the VM's name: my PVE instance now has that hostname)
  • installed updates and haven't rebooted yet
What's going on now, the first things I noticed:
  • PVE boots but has a new hostname: it is now root@tpy2 instead of root@pve
  • the SSH host keys are different (I think they are tpy2's)
  • the username root and the password for my PVE stayed the same
  • pct list for my LXC containers responds with "Connection refused":
    ipcc_send_rec[1] failed: Connection refused
    ipcc_send_rec[2] failed: Connection refused
    ipcc_send_rec[3] failed: Connection refused
    Unable to load access control list: Connection refused
  • same for qm list
  • I cannot access the web UI
  • the folder /etc/pve is empty
  • one ZFS pool that I had trouble mounting at boot no longer shows up as unmounted in zfs list (as it did before, when I could mount it manually with zfs mount -a)
It is as if Proxmox booted into that tpy2 VM but mixed in its own environment (commands etc.) as well.
What is going on? What can I do? Has anyone seen this before?

Thanks in advance!
 

dcsapak

Proxmox Staff Member
Feb 1, 2016
Vienna
it seems you installed cloud-init on the host? if yes, remove it and revert all config changes it made (network, hosts, etc.)
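One way to spot what cloud-init touched: it usually leaves a comment header in the files it manages (the "manage_etc_hosts" notice quoted later in this thread is one example). A minimal sketch of such an audit; is_cloud_managed is a helper name made up for this post, not a cloud-init tool:

```shell
#!/bin/sh
# is_cloud_managed FILE: succeed if FILE carries a comment header of the kind
# cloud-init leaves in files it manages (hypothetical helper, pattern-based).
is_cloud_managed() {
    grep -q -e 'manage_etc_hosts' -e 'cloud-init' "$1" 2>/dev/null
}

# Review the files cloud-init commonly rewrites on a host:
for f in /etc/hostname /etc/hosts /etc/network/interfaces; do
    if is_cloud_managed "$f"; then
        echo "$f looks cloud-init managed -- review it"
    fi
done
```

Files without such a header may still have been rewritten, so this is only a first pass; compare against a known-good backup where one exists.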
 

Coffeeri

it seems you installed cloud init on the host? if yes, remove it and revert all config changes it did (network/hosts) etc. ..
I think you are correct, I ran
apt-get install cloud-init
on the host.
I uninstalled it via
apt purge cloud-init

and did a reboot.

Same condition as before.
revert all config changes it did (network/hosts) etc. ..
I didn't run any other command manually that changes the network/hosts on the host. Did cloud-init change something that I have to revert?

Edit:
Removed the cloud-init state:
rm -rf /var/lib/cloud/
Changed /etc/hostname: tpy2 -> pve.
In /etc/hosts, deleted the line 127.0.0.1 tpy2.
I have never touched the hosts file on this host myself, so I don't know its initial state; sadly I don't have a backup of it. It currently looks like this:
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
# /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
/etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto vmbr0
iface vmbr0 inet static
address 172.16.1.100
netmask 255.255.255.0
gateway 172.16.1.1
bridge-ports eno1
bridge-stp off
bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
bridge-ports none
bridge-stp off
bridge-fd 0

auto vmbr2
iface vmbr2 inet manual
bridge-ports eno2
bridge-stp off
bridge-fd 0
#internal network
This seems normal/unharmed.

I still cannot use pct/qm, nor access the web UI.

EDIT2:
Just saw
root@pve:~# dpkg -l | grep cloud
ii cloud-guest-utils 0.29-1 all cloud guest utilities
so I did
root@pve:~# sudo apt-get purge --auto-remove cloud-guest-utils

● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: activating (start) since Thu 2019-12-19 13:40:16 CET; 9s ago
Cntrl PID: 8466 (pmxcfs)
Tasks: 1 (limit: 4915)
Memory: 1.5M
CGroup: /system.slice/pve-cluster.service
└─8466 /usr/bin/pmxcfs

Dec 19 13:40:16 pve systemd[1]: Starting The Proxmox VE cluster filesystem...

● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: inactive (dead) (Result: exit-code) since Thu 2019-12-19 13:40:17 CET; 8s ago
Process: 8396 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=111)
Process: 8406 ExecStart=/usr/bin/pveproxy start (code=exited, status=255/EXCEPTION)

Dec 19 13:40:17 pve systemd[1]: pveproxy.service: Service RestartSec=100ms expired, scheduling restart.
Dec 19 13:40:17 pve systemd[1]: pveproxy.service: Scheduled restart job, restart counter is at 13.
Dec 19 13:40:17 pve systemd[1]: Stopped PVE API Proxy Server.

● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-12-19 13:35:50 CET; 4min 34s ago
Process: 5732 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Main PID: 5859 (pvedaemon)
Tasks: 4 (limit: 4915)
Memory: 124.0M
CGroup: /system.slice/pvedaemon.service
├─5859 pvedaemon
├─5860 pvedaemon worker
├─5861 pvedaemon worker
└─5863 pvedaemon worker

Dec 19 13:35:49 pve systemd[1]: Starting PVE API Daemon...
Dec 19 13:35:50 pve pvedaemon[5859]: starting server
Dec 19 13:35:50 pve pvedaemon[5859]: starting 3 worker(s)
Dec 19 13:35:50 pve pvedaemon[5859]: worker 5860 started
Dec 19 13:35:50 pve pvedaemon[5859]: worker 5861 started
Dec 19 13:35:50 pve pvedaemon[5859]: worker 5863 started
Dec 19 13:35:50 pve systemd[1]: Started PVE API Daemon.
-- The job identifier is 7962 and the job result is done.
Dec 19 13:44:32 pve pmxcfs[10958]: [main] crit: Unable to get local IP address
Dec 19 13:44:32 pve pmxcfs[10958]: [main] crit: Unable to get local IP address
Dec 19 13:44:32 pve systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit pve-cluster.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 255.
Dec 19 13:44:32 pve systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.
Dec 19 13:44:32 pve systemd[1]: Failed to start The Proxmox VE cluster filesystem.
-- Subject: A start job for unit pve-cluster.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-cluster.service has finished with a failure.
--
-- The job identifier is 7907 and the job result is failed.
Dec 19 13:44:32 pve systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
-- Subject: A start job for unit corosync.service has finished successfully
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit corosync.service has finished successfully.
--
-- The job identifier is 7961.
Dec 19 13:44:32 pve systemd[1]: Starting PVE API Proxy Server...
-- Subject: A start job for unit pveproxy.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pveproxy.service has begun execution.
--
-- The job identifier is 7962.
Dec 19 13:44:32 pve systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
Dec 19 13:44:32 pve systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 52.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Automatic restarting of the unit pve-cluster.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Dec 19 13:44:32 pve systemd[1]: Stopped The Proxmox VE cluster filesystem.
-- Subject: A stop job for unit pve-cluster.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A stop job for unit pve-cluster.service has finished.
--
-- The job identifier is 8084 and the job result is done.
Dec 19 13:44:32 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
-- Subject: A start job for unit pve-cluster.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pve-cluster.service has begun execution.
--
-- The job identifier is 8084.
Dec 19 13:44:32 pve pvecm[11055]: ipcc_send_rec[1] failed: Connection refused
Dec 19 13:44:32 pve pvecm[11055]: ipcc_send_rec[2] failed: Connection refused
Dec 19 13:44:32 pve pvecm[11055]: ipcc_send_rec[3] failed: Connection refused
Dec 19 13:44:32 pve pvecm[11055]: Unable to load access control list: Connection refused
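The repeated "[main] crit: Unable to get local IP address" narrows this down: pmxcfs resolves the node's own hostname to find its IP, normally via /etc/hosts. A quick check of that lookup, assuming standard glibc tools (getent consults roughly the same name-service sources):

```shell
#!/bin/sh
# Mimic the lookup pmxcfs performs: resolve the node's own hostname.
name=$(hostname)
ip=$(getent hosts "$name" | awk '{print $1; exit}')
# On a healthy node this prints the LAN address (here 172.16.1.100);
# an empty result or only 127.0.0.1 matches the failure in the log above.
echo "host $name resolves to: ${ip:-<nothing>}"
```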
I guess the hosts file is not complete.

After a reboot the condition has not changed.

EDIT3:
Sorry, this is getting long.
I added this to /etc/hosts:
127.0.0.1 localhost.leserver localhost
172.16.1.100 pve.proxmox.com pve pvelocalhost

Problem solved.
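For anyone landing here in the same state, the repair can be sanity-checked offline by confirming the hosts file maps the node name to its static address. A small sketch that mimics the hosts-file lookup; hosts_ip_for is a name made up for this post, not a PVE tool:

```shell
#!/bin/sh
# hosts_ip_for FILE NAME: print the first IP that FILE maps to NAME,
# scanning non-comment lines the way an /etc/hosts lookup does
# (hypothetical helper for illustration).
hosts_ip_for() {
    awk -v n="$2" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }' "$1"
}

# Expect the node's static IP, e.g. 172.16.1.100 for the host in this thread:
hosts_ip_for /etc/hosts "$(hostname)"
```

If this prints nothing, or prints a loopback address, pmxcfs will fail to start just as shown in the logs above.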
 

dcsapak

Did cloud init change something, that I have to revert ?
yes, that's the point of cloud-init

it seems you are missing the 'ip hostname' entry in /etc/hosts

there should be a line, probably like:

172.16.1.100 pve.fqdn pve

where 'pve.fqdn' is the fully qualified domain name (e.g. pve.foo.com); this is what was chosen in the installer for the domain name
 

Coffeeri

yes, that's the point of cloud-init

it seems you are missing the 'ip hostname' entry in /etc/hosts

there should be a line, probably like:

172.16.1.100 pve.fqdn pve

where 'pve.fqdn' is the fully qualified domain name (e.g. pve.foo.com); this is what was chosen in the installer for the domain name
I saw in some post the usage of
172.16.1.100 pve.proxmox.com pve pvelocalhost
and even though the FQDN I set at installation was "leserver", it worked. I will change it, though.
Thank you, Dominik
 
