Error - 'Module 'devicehealth' has failed:'

Hi,

I have a PVE cluster with 3 nodes and Ceph installed.
After I configured Ceph on every node, I deleted the device_health_metrics pool by mistake and then recreated it. Any idea what the problem might be?
The containers installed on the pool work fine.

The error is the following.

[Screenshot: "Module 'devicehealth' has failed" warning in the Proxmox VE web UI]
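The failure details can be inspected from a node shell with the standard Ceph CLI; something like this should show what the module complained about (<crash-id> is a placeholder taken from the crash list):

Code:
ceph health detail          # shows why the devicehealth module is marked as failed
ceph crash ls               # list recorded manager crashes, if any
ceph crash info <crash-id>  # details of a specific crash from the list above
ceph osd pool ls            # check whether device_health_metrics still exists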

Package version output:

Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
I just did the same thing. I discovered that device_health_metrics appears to be created by the manager, so:

1 - Create a new manager; if you already have a second manager, go to step 2.
2 - Delete the first manager (there is no data loss here) and wait for the standby one to become active.
3 - Recreate the initial manager; the pool is back.

I re-deleted the device_health_metrics pool just to confirm, and the problem re-appeared; I solved it the same way (see the command sketch below).
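For reference, on a PVE node the same procedure maps roughly onto these commands (node names are placeholders, check ceph -s between steps):

Code:
# on a second node, if you only have one manager so far
pveceph mgr create
# on the node with the original manager: destroy it (no data loss),
# then watch "ceph -s" until the standby manager becomes active
pveceph mgr destroy <node-name>
ceph -s
# back on the original node: recreate the manager; the pool reappears
pveceph mgr create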
 
Thank you for the answer. I'm going to test it the next time I set up another Ceph cluster.
 
Great, thanks!
 
+1 this solved the issue for me too.

pve-manager/7.3-4/d69b70d4 (running kernel: 5.15.83-1-pve)
 
Did any of you find a cause / long-term fix?

I have a 3-node Ceph cluster on a fresh Proxmox 8 install and just hit this error - very scary, lol.

I tried the process of deleting and recreating the manager - so far it hasn't recreated the device_health_metrics pool, but it did recreate a .mgr pool, and that seems to have cleared the issue for me.
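As far as I know that is expected: on Ceph Quincy and later (which Proxmox 8 ships), the manager's internal health pool is named .mgr rather than device_health_metrics. Listing the pools shows which one is present:

Code:
ceph osd pool ls     # look for ".mgr" (Quincy and later) or "device_health_metrics"
ceph health detail   # the devicehealth warning should be gone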
 
I encountered this issue after my active manager crashed. The active manager apparently restarted and continued being active, but I guess it wasn't fully healthy. Stopping the active manager and waiting for the standby to take over resolved this problem. After that, I started the original active manager and the error did not return.
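In other words, a plain manager failover was enough; nothing had to be destroyed. Something along these lines should do it (<node> is a placeholder for the host running the active manager):

Code:
# on the node running the active manager
systemctl stop ceph-mgr@<node>.service
ceph -s        # wait until a standby manager shows as active
systemctl start ceph-mgr@<node>.service
# alternative without touching systemd:
# ceph mgr fail <active-mgr-name>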
 
