[SOLVED] HELP - I somehow broke my Installation

MarvAmBass

Mar 20, 2018
Hi there,

This is about a Proxmox `Virtual Environment 4.4-21` cluster with 4 hosts (Debian jessie).

I executed my usual Ansible script, which installs and configures some things like Tripwire, Rsyslog-gnutls, cron, dbus, curl and ntp.

- To use `ntp` it disables `systemd-timesyncd.service`.

- It also tightens the sshd config, but still permits root login, since I learned that disabling it breaks Proxmox cluster setups.

- After the installation the Proxmox GPG key was no longer trusted, so I re-trusted it using `wget http://download.proxmox.com/debian/key.asc; apt-key add key.asc`

After that I saw that most of the pve*.service systemd units are masked and no longer running. The WebGUI is broken, for example, and I don't know what else will break after a reboot.

`systemctl list-unit-files | grep pve`

```
pve-cluster.service enabled
pve-firewall.service enabled
pve-ha-crm.service masked
pve-ha-lrm.service masked
pve-manager.service masked
pvebanner.service masked
pvedaemon.service masked
pvefw-logger.service static
pvenetcommit.service masked
pveproxy.service masked
pvestatd.service masked


```
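To pull only the masked pve units out of a listing like the one above, a small filter can help. This is just a sketch: the function name `masked_pve_units` is made up for illustration, and it assumes the two-column `UNIT STATE` layout shown in the output.

```shell
# Print pve* units whose state is "masked".
# Reads `systemctl list-unit-files` output (UNIT STATE columns) on stdin.
# The function name is hypothetical, not part of systemd or Proxmox.
masked_pve_units() {
    awk '$1 ~ /^pve/ && $2 == "masked" { print $1 }'
}

# Example (run on a node, not executed here):
#   systemctl list-unit-files | masked_pve_units
```

Running this on both a working and a broken node makes it easy to see which units differ.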

More info on pveproxy:

```
# systemctl status pveproxy

pveproxy.service
Loaded: masked (/dev/null)
Active: active (running) since Wed 2018-03-21 05:25:05 UTC; 10h ago
Main PID: 27135 (pveproxy)
CGroup: /system.slice/pveproxy.service
├─27135 pveproxy
├─27136 pveproxy worker
├─27137 pveproxy worker
└─27138 pveproxy worker

Mar 21 05:25:05 mp-db05 pveproxy[27135]: starting server
Mar 21 05:25:05 mp-db05 pveproxy[27135]: starting 3 worker(s)
Mar 21 05:25:05 mp-db05 pveproxy[27135]: worker 27136 started
Mar 21 05:25:05 mp-db05 pveproxy[27135]: worker 27137 started
Mar 21 05:25:05 mp-db05 pveproxy[27135]: worker 27138 started
Mar 21 05:25:05 mp-db05 systemd[1]: Started PVE API Proxy Server.
Mar 21 14:43:51 mp-db05 pveproxy[27137]: Clearing outdated entries from certificate cache
Mar 21 14:43:55 mp-db05 pveproxy[27138]: Clearing outdated entries from certificate cache


```

The interesting thing is that some of the binaries are also missing when I compare with a functioning cluster member.

Working member:
```
ls /usr/bin/pve*

/usr/bin/pveam /usr/bin/pvemailforward.pl /usr/bin/pvesubscription
/usr/bin/pvebanner /usr/bin/pveperf /usr/bin/pveupdate
/usr/bin/pveceph /usr/bin/pveproxy /usr/bin/pveupgrade
/usr/bin/pvecm /usr/bin/pvereport /usr/bin/pveversion
/usr/bin/pvedaemon /usr/bin/pvesh
/usr/bin/pvemailforward /usr/bin/pvestatd


```

Broken member:

```
ls /usr/bin/pve*
/usr/bin/pvecm


```

Does anyone know what state my cluster members are in, and why this happened? Thanks

And maybe someone has an idea how to recover from such a state.
 
Okay, by comparing the installed packages I found that some Perl libs are missing.

working cluster client

```
libpve-access-control
libpve-common-perl
libpve-guest-common-perl
libpve-http-server-perl
libpve-storage-perl
pve-cluster
pve-container
pve-docs
pve-firewall
pve-firmware
pve-ha-manager
pve-kernel-4.2.6-1-pve
pve-kernel-4.4.98-4-pve
pve-libspice-server1
pve-manager
pve-qemu-kvm


```

broken cluster client

```
libpve-access-control
libpve-common-perl
libpve-http-server-perl
pve-cluster
pve-docs
pve-firewall
pve-firmware
pve-ha-manager
pve-kernel-4.2.6-1-pve
pve-kernel-4.4.98-4-pve
pve-kernel-4.4.98-6-pve
pve-libspice-server1
pve-manager
pve-qemu-kvm


```

So the packages missing on the broken client are `libpve-guest-common-perl`, `libpve-storage-perl` and `pve-container`.
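Comparing two package lists by eye is easy to get wrong, so here is a minimal sketch that does it with `sort` and `comm`. The function name `missing_packages` is hypothetical, and it assumes each node's package names were saved to a text file, one name per line (for example with `dpkg --get-selections | awk '$2 == "install" { print $1 }'`).

```shell
# Print packages that appear in the first list but not in the second.
# Arguments: two text files with one package name per line.
# The function name is hypothetical; how the lists are produced is up to you.
missing_packages() {
    mp_good=$(mktemp)
    mp_bad=$(mktemp)
    # comm needs sorted input, so sort both lists first
    sort -u "$1" > "$mp_good"
    sort -u "$2" > "$mp_bad"
    # -2 drops lines unique to the second file, -3 drops lines common to both
    comm -23 "$mp_good" "$mp_bad"
    rm -f "$mp_good" "$mp_bad"
}

# Example (not executed here):
#   missing_packages working-node.txt broken-node.txt
```

Swapping the two arguments shows what is installed only on the broken node instead.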

I tried to reinstall the missing packages using
```
apt-get install libpve-guest-common-perl # this will also install libpve-storage-perl
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
libpve-storage-perl nfs-common rpcbind
Suggested packages:
watchdog
The following NEW packages will be installed:
libpve-guest-common-perl libpve-storage-perl nfs-common rpcbind
0 upgraded, 4 newly installed, 0 to remove and 13 not upgraded.
Need to get 206 kB/324 kB of archives.
After this operation, 893 kB of additional disk space will be used.
Do you want to continue? [Y/n]

```
 
Looks like proxmox-ve was somehow uninstalled:

```
dpkg --list | grep proxmox
rc proxmox-ve 4.4-104 all The Proxmox Virtual Environment

```
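The `rc` in the first column means the package was removed but its configuration files are still on disk. A rough sketch for listing every package in that state follows; the function name `removed_with_config` is made up, and it assumes the usual `dpkg --list` layout with the status in the first column and the package name in the second.

```shell
# Print packages that are removed but still have config files ("rc" state).
# Reads `dpkg --list` output on stdin.
# The function name is hypothetical, not a dpkg feature.
removed_with_config() {
    awk '$1 == "rc" { print $2 }'
}

# Example (run on a node, not executed here):
#   dpkg --list | removed_with_config
```

Any pve or proxmox package showing up here is a hint that something uninstalled it behind your back.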

So I reinstalled it using:

```
apt-get install proxmox-ve
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
pve-container pve-ha-manager pve-manager qemu-server
The following NEW packages will be installed:
proxmox-ve pve-container pve-ha-manager pve-manager qemu-server
0 upgraded, 5 newly installed, 0 to remove and 13 not upgraded.
Need to get 2,592 kB/2,734 kB of archives.
After this operation, 486 kB of additional disk space will be used.
Do you want to continue? [Y/n]


```

And it seems to work again now: the services got unmasked and the WebGUI etc. is available again.
 
Looks like you did an "apt upgrade" instead of the needed "apt dist-upgrade"?
 
I also uninstalled rpcbind in the Ansible script.

This led to the removal of proxmox-ve.

I updated my Ansible script to no longer remove rpcbind, and now everything works as expected :)
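One way to catch this kind of thing before it happens is to simulate the removal first and check what else apt would take with it. The following is only a sketch: the function names are hypothetical, and it relies on `apt-get -s` (simulate) printing one `Remv <package> ...` line per package it would remove.

```shell
# List the packages an apt-get simulation would remove.
# Reads `apt-get -s ...` output on stdin; removal lines start with "Remv".
# Both function names are hypothetical helpers, not apt features.
removed_by_simulation() {
    awk '$1 == "Remv" { print $2 }'
}

# Succeed only if none of the given protected packages would be removed.
# Reads the simulation output on stdin, takes package names as arguments.
safe_to_apply() {
    sta_removed=$(removed_by_simulation)
    for sta_pkg in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string
        if printf '%s\n' "$sta_removed" | grep -qxF -- "$sta_pkg"; then
            echo "refusing: would remove $sta_pkg" >&2
            return 1
        fi
    done
    return 0
}

# Example (run on a node, not executed here):
#   apt-get -s remove rpcbind | safe_to_apply proxmox-ve pve-manager
```

A check like this could run as a guard step before any package removal in a provisioning script.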