[SOLVED] px5+ceph : TypeError: checks[key].summary is undefined

grin

The Ceph dashboard has become very unreliable; the JavaScript in the GUI seems not even to retrieve the data anymore. I've seen one response with "500 / partial read" when it tried to retrieve the pool data, but most often I just see this in the web console:
TypeError: checks[key].summary is undefined
Possibly it gets something from Ceph that it cannot parse and then gives up completely. Unfortunately I cannot find any log files about this on the server.

~# pveversion --verbose
proxmox-ve: 5.0-21 (running kernel: 4.10.15-1-pve)
pve-manager: 5.0-32 (running version: 5.0-32/2560e073)
pve-kernel-4.4.62-1-pve: 4.4.62-88
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.10.17-3-pve: 4.10.17-21
pve-kernel-4.4.59-1-pve: 4.4.59-87
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-18
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-5
pve-container: 2.0-15
pve-firewall: 3.0-3
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
 
What Ceph version do you have installed?
 
I can give you the complete error (which comes after 2-3 XHRs into the Ceph main dashboard).
From then on the Monitor tab is empty, OSD is OK, Configuration often shows the left side OK and the right side with 500/partial read, Pools is empty, and Log is OK. For the tabs that are broken there seem to be no further XHRs towards the API at all.


15:57:17.469 TypeError: checks[key].summary is undefined 1 pvemanagerlib.js:16543:3
.generateCheckData/< https://freddy:8006/pve2/js/pvemanagerlib.js:16543:3
forEach self-hosted:251:13
a.forEach< https://freddy:8006/pve2/ext6/ext-all.js:22:33070
.generateCheckData https://freddy:8006/pve2/js/pvemanagerlib.js:16539:2
.updateAll https://freddy:8006/pve2/js/pvemanagerlib.js:16569:46
.fire https://freddy:8006/pve2/ext6/ext-all.js:22:141921
.doFireEvent https://freddy:8006/pve2/ext6/ext-all.js:22:148935
.monitor/a.doFireEvent https://freddy:8006/pve2/ext6/ext-all.js:22:420016
.fireEventArgs https://freddy:8006/pve2/ext6/ext-all.js:22:147804
.fireEvent https://freddy:8006/pve2/ext6/ext-all.js:22:147530
.onProxyLoad https://freddy:8006/pve2/ext6/ext-all.js:22:668875
.triggerCallbacks https://freddy:8006/pve2/ext6/ext-all.js:22:596629
.setCompleted https://freddy:8006/pve2/ext6/ext-all.js:22:596327
.setSuccessful https://freddy:8006/pve2/ext6/ext-all.js:22:596429
.process https://freddy:8006/pve2/ext6/ext-all.js:22:595704
.processResponse https://freddy:8006/pve2/ext6/ext-all.js:22:650397
.createRequestCallback/< https://freddy:8006/pve2/ext6/ext-all.js:22:653811
.callback https://freddy:8006/pve2/ext6/ext-all.js:22:68728
.onComplete https://freddy:8006/pve2/ext6/ext-all.js:22:177589
.onStateChange https://freddy:8006/pve2/ext6/ext-all.js:22:176947
k.bind/< https://freddy:8006/pve2/ext6/ext-all.js:22:58579
 
Can you post the output of ceph health detail -f json-pretty?
 
# ceph health detail -f json-pretty
Code:
{
    "checks": {
        "TOO_MANY_PGS": {
            "severity": "HEALTH_WARN",
            "message": "too many PGs per OSD (972 > max 300)",
            "detail": []
        }
    },
    "status": "HEALTH_WARN"
}
 
OK, do all nodes have the same Ceph version (12.2)?
Have you restarted all monitors/OSDs since the upgrade to 12.2?

This output does not look like the output I get here from my Luminous cluster; the correct output looks like this:
Code:
{
    "checks": {
        "TOO_MANY_PGS": {
            "severity": "HEALTH_WARN",
            "summary": {
                "message": "too many PGs per OSD (6464 > max 300)"
            },
            "detail": []
        }
    },
    "status": "HEALTH_WARN",
    "overall_status": "HEALTH_WARN",
    "detail": [
        "'ceph health' JSON format has changed in luminous. If you see this your monitoring system is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false'"
    ]
}
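
To make the connection to the GUI error explicit: below is a minimal sketch (an assumed simplification, not the actual pvemanagerlib.js code) of the kind of access the stack trace points at. The GUI walks the checks object and reads checks[key].summary.message; with the first output above there is no summary key, so that read fails with the reported TypeError.

Code:
// Sketch only, not the real pvemanagerlib.js code: shows why the dashboard
// breaks when the monitors still emit the old release-candidate layout.

// Shape returned by the pre-12.2 monitors (message directly on the check):
const oldHealth = {
    checks: {
        TOO_MANY_PGS: {
            severity: "HEALTH_WARN",
            message: "too many PGs per OSD (972 > max 300)",
            detail: []
        }
    },
    status: "HEALTH_WARN"
};

// Simplified stand-in for the generateCheckData named in the stack trace.
function generateCheckData(checks) {
    const rows = [];
    Object.keys(checks).forEach(function (key) {
        // Expects the final luminous layout: checks[key].summary.message.
        // With the old layout checks[key].summary is undefined, so the next
        // line throws a TypeError (Firefox reports it as
        // "checks[key].summary is undefined").
        rows.push([key, checks[key].severity, checks[key].summary.message]);
    });
    return rows;
}

generateCheckData(oldHealth.checks); // throws the TypeError seen in the console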
 
Code:
# ceph osd versions
{
    "ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)": 15
}

But indeed:

Code:
# ceph mon versions
{
    "ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc)": 2,
    "ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)": 1
}

And after restart the output matches yours (except the ridiculous amount of PGs ;-)).
Thanks! (But there ought to be some better error handling, this very well may happen to anyone.)
 
And after restart the output matches yours (except the ridiculous amount of PGs ;-)).
Great, that PG count was just on my virtualized cluster ;) there is no actual data on it.

Thanks! (But there ought to be some better error handling, this very well may happen to anyone.)
Well, Ceph changed the output format multiple times during the Luminous release candidates, and instead of trying to parse multiple versions that might or might not work, we decided to parse the (hopefully) final format for Luminous.

If the output changes in a future patch, we will catch it before it reaches our stable Ceph Luminous repo and patch the GUI accordingly.
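
For anyone who wants to read the health checks from their own script and cope with both layouts in the meantime, a tolerant accessor is easy to sketch. This is only an illustration with an assumed helper name, not what the GUI does:

Code:
// Sketch only: reads the warning text from either shape of
// `ceph health detail -f json-pretty` instead of throwing.
function checkMessage(check) {
    if (check.summary && check.summary.message) {
        return check.summary.message;   // final luminous layout
    }
    if (check.message) {
        return check.message;           // earlier release-candidate layout
    }
    return "unrecognized health check format"; // degrade instead of throwing
}

// The two shapes quoted earlier in this thread:
const rcCheck    = { severity: "HEALTH_WARN", message: "too many PGs per OSD (972 > max 300)", detail: [] };
const finalCheck = { severity: "HEALTH_WARN", summary: { message: "too many PGs per OSD (6464 > max 300)" }, detail: [] };

console.log(checkMessage(rcCheck));    // too many PGs per OSD (972 > max 300)
console.log(checkMessage(finalCheck)); // too many PGs per OSD (6464 > max 300)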
 
