Hi there,
this is about a `Virtual Environment 4.4-21` Proxmox cluster with 4 hosts (Debian Jessie).
I ran my usual Ansible playbook, which installs and configures a few things such as Tripwire, rsyslog-gnutls, cron, dbus, curl, and ntp:
- To use `ntp`, it disables `systemd-timesyncd.service`
- It also tightens the sshd config, but leaves root login permitted, since I learned that forbidding it breaks Proxmox cluster setups
- After the run, the Proxmox GPG key was no longer trusted, so I re-trusted it using `wget http://download.proxmox.com/debian/key.asc; apt-key add key.asc` (the whole sequence is sketched as plain commands below)
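For context, the relevant steps of the playbook boil down to roughly these commands (a sketch of what the Ansible tasks do under the hood, not the literal tasks):

```
# baseline packages the playbook installs
apt-get install tripwire rsyslog-gnutls cron dbus curl ntp
# hand time sync over to ntpd instead of systemd-timesyncd
systemctl stop systemd-timesyncd.service
systemctl disable systemd-timesyncd.service
# re-trust the Proxmox repository key
wget http://download.proxmox.com/debian/key.asc
apt-key add key.asc
```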
After that, I see that most of the pve*.service systemd units are masked and no longer running. The web GUI is broken, for example, and I don't know what else will break after a reboot.
`systemctl list-unit-files | grep pve`
```
pve-cluster.service    enabled
pve-firewall.service   enabled
pve-ha-crm.service     masked
pve-ha-lrm.service     masked
pve-manager.service    masked
pvebanner.service      masked
pvedaemon.service      masked
pvefw-logger.service   static
pvenetcommit.service   masked
pveproxy.service       masked
pvestatd.service       masked
```
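As far as I understand, masking means the unit file is replaced by a symlink to /dev/null under /etc/systemd/system, which should be easy to confirm:

```
# masked units should show up as symlinks pointing to /dev/null
ls -l /etc/systemd/system/pve*.service
```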
More info on pveproxy:
```
# systemctl status pveproxy
● pveproxy.service
   Loaded: masked (/dev/null)
   Active: active (running) since Wed 2018-03-21 05:25:05 UTC; 10h ago
 Main PID: 27135 (pveproxy)
   CGroup: /system.slice/pveproxy.service
           ├─27135 pveproxy
           ├─27136 pveproxy worker
           ├─27137 pveproxy worker
           └─27138 pveproxy worker
Mar 21 05:25:05 mp-db05 pveproxy[27135]: starting server
Mar 21 05:25:05 mp-db05 pveproxy[27135]: starting 3 worker(s)
Mar 21 05:25:05 mp-db05 pveproxy[27135]: worker 27136 started
Mar 21 05:25:05 mp-db05 pveproxy[27135]: worker 27137 started
Mar 21 05:25:05 mp-db05 pveproxy[27135]: worker 27138 started
Mar 21 05:25:05 mp-db05 systemd[1]: Started PVE API Proxy Server.
Mar 21 14:43:51 mp-db05 pveproxy[27137]: Clearing outdated entries from certificate cache
Mar 21 14:43:55 mp-db05 pveproxy[27138]: Clearing outdated entries from certificate cache
```
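If I understand masking correctly, the daemon is only still running because it was started before the unit got masked; masking only blocks future starts, so a stop or a reboot would leave it down. `systemctl is-enabled` should confirm the state without touching the running process:

```
# prints "masked" for a masked unit, without stopping the daemon
systemctl is-enabled pveproxy.service
```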
The interesting thing is that some of the binaries are also missing, compared with a functioning cluster member.
Working member:
```
ls /usr/bin/pve*
/usr/bin/pveam /usr/bin/pvemailforward.pl /usr/bin/pvesubscription
/usr/bin/pvebanner /usr/bin/pveperf /usr/bin/pveupdate
/usr/bin/pveceph /usr/bin/pveproxy /usr/bin/pveupgrade
/usr/bin/pvecm /usr/bin/pvereport /usr/bin/pveversion
/usr/bin/pvedaemon /usr/bin/pvesh
/usr/bin/pvemailforward /usr/bin/pvestatd
```
Broken member:
```
ls /usr/bin/pve*
/usr/bin/pvecm
```
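I haven't dug further yet, but I'd assume comparing the package state on both nodes would show whether the pve packages were actually removed; something like:

```
# compare installed pve/proxmox packages on both nodes
# (ii = installed, rc = removed but config files kept)
dpkg -l | grep -Ei 'pve|proxmox'
# ask dpkg which package should own one of the missing binaries
dpkg -S /usr/bin/pveproxy
```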
Does anyone know what state my nodes are in, and why this happened? Thanks.
And maybe someone has an idea of how to recover from such a state.
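My current guess for a recovery would be to reinstall the missing packages and unmask the units, roughly like this (untested, so I'd appreciate confirmation before I run it on a cluster node):

```
# reinstall the meta package, which should pull the pve tools back in
apt-get update
apt-get install --reinstall proxmox-ve pve-manager
# unmask the units that ended up masked
systemctl unmask pve-ha-crm pve-ha-lrm pve-manager pvebanner \
    pvedaemon pvenetcommit pveproxy pvestatd
systemctl daemon-reload
```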