Upgraded to VE 6.3, Ceph manager not starting

Julian Lliteras

Hi,

After upgrading to Proxmox VE 6.3, the Ceph manager dashboard won't start. Prior to the upgrade the dashboard was up and running without issues.


The manager (version 14.2.15) log shows these records:

Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.083 7f6662c84700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.sion: ('invalid syntax', ('/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py', 34, 11, ' result: dict = {}\n'))
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.083 7f6662c84700 -1 dashboard.serve:
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.083 7f6662c84700 -1 Traceback (most recent call last):
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/module.py", line 306, in serve
Nov 27 12:31:06 sion ceph-mgr[61338]: mapper, parent_urls = generate_routes(self.url_prefix)
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 346, in generate_routes
Nov 27 12:31:06 sion ceph-mgr[61338]: ctrls = load_controllers()
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 274, in load_controllers
Nov 27 12:31:06 sion ceph-mgr[61338]: package='dashboard')
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
Nov 27 12:31:06 sion ceph-mgr[61338]: __import__(name)
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py", line 34
Nov 27 12:31:06 sion ceph-mgr[61338]: result: dict = {}
Nov 27 12:31:06 sion ceph-mgr[61338]: ^
Nov 27 12:31:06 sion ceph-mgr[61338]: SyntaxError: invalid syntax
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.135 7f664744d700 -1 client.0 error registering admin socket command: (17) File exists
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.135 7f664744d700 -1 client.0 error registering admin socket command: (17) File exists
Nov 27 12:31:06 sion ceph-mgr[61338]: [27/Nov/2020:12:31:06] ENGINE Serving on http://:::9283
Nov 27 12:31:06 sion ceph-mgr[61338]: [27/Nov/2020:12:31:06] ENGINE Bus STARTED
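The traceback shows Python 2.7 (/usr/lib/python2.7/importlib) importing orchestrator.py, which uses a Python 3 variable annotation on line 34; that syntax cannot be parsed by Python 2. A minimal way to reproduce the parse error outside the manager, assuming both interpreters are installed:

Code:
# 'result: dict = {}' is a PEP 526 variable annotation, valid only on Python 3.6+
python2 -c 'result: dict = {}'   # SyntaxError: invalid syntax (same as in the mgr log)
python3 -c 'result: dict = {}'   # parses and runs fine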

I tried to install the dashboard on another host with the same results. Any idea what is going on?

Thanks in advance.
 
Can you please post a pveversion -v?
 
Hello, I have the same issue; everything except the Ceph dashboard is working:

Module 'dashboard' has failed: ('invalid syntax', ('/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py', 34, 11, ' result: dict = {}\n'))

pveversion -v:

Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.15-pve2
ceph-fuse: 14.2.15-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1



Thanks for any ideas to fix that :)

Best regards,

Roman
 
Can you please post a pveversion -v?
root@sion:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.15-pve2
ceph-fuse: 14.2.15-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
root@sion:~#
 
Same problem here after upgrading a test cluster from 6.2-15 to 6.3. Access to Ceph is still possible, but the cluster shows HEALTH_ERR status.

Code:
cluster:
    id:     27438f91-f51c-431a-9343-c1d5613ce181
    health: HEALTH_ERR
            Module 'dashboard' has failed: ('invalid syntax', ('/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py', 34, 11, '    result: dict = {}\n'))

pveversion -v

Code:
pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.15-pve2
ceph-fuse: 14.2.15-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-network-perl: 0.4-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
The problem suddenly disappeared when I upgraded Ceph from Nautilus to Octopus as well. I'm not sure that is the reason it went away, and I wouldn't recommend doing this on a production cluster while Ceph is in HEALTH_ERR.

It would be great if someone else could also test and confirm that an upgrade to Ceph Octopus resolves the issue.

Kind regards,
Marco
 
I can confirm this too.

Use "ceph mgr module disable dashboard" to get rid of the health error.

Upgrade to Octopus as described by samson_99.
The OSDs need a bit more time to start because of the conversion to per-pool omap.

Re-enable the dashboard with "ceph mgr module enable dashboard" once all hosts are on Octopus.
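
As a rough sketch, the whole sequence looks like this (the Octopus upgrade itself is covered by the wiki article linked later in the thread):

Code:
# 1) silence the health error caused by the broken dashboard module
ceph mgr module disable dashboard
ceph -s                              # cluster should report HEALTH_OK again

# 2) upgrade every node from Nautilus to Octopus, following the
#    Ceph_Nautilus_to_Octopus wiki article

# 3) once all hosts run Octopus and the OSD omap conversion is done,
#    re-enable the dashboard
ceph mgr module enable dashboard
ceph -s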
 
Hi,
I did a fresh installation of Proxmox 6.3 + Ceph 14.2.15 + manager (all new).
I had the same problem reported above.
I would like to keep Nautilus running on my cluster.
Any suggestions?
 
Same problem here, upgraded to 6.3 on Friday. I only have a production cluster at hand, so I'd be really thankful if someone from Proxmox could give me some advice on how to proceed. Is it considered "safe" to upgrade to Octopus? I have disabled the dashboard for now to get rid of the monitoring warnings.
 
Same problem here, upgraded to 6.3 on Friday. I only have a production cluster at hand, so I'd be really thankful if someone from Proxmox could give me some advice on how to proceed. Is it considered "safe" to upgrade to Octopus? I have disabled the dashboard for now to get rid of the monitoring warnings.
If you disabled the dashboard, Ceph should be healthy again and you can proceed with the upgrade to Octopus. As described here, it works without problems in our test environment:
https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus
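
Condensed, the steps from that article look roughly like this (a sketch only, assuming the default repository file path; read the full article before touching a production cluster, the order of daemon restarts matters):

Code:
# on every node: switch the Ceph repository from Nautilus to Octopus
sed -i 's/ceph-nautilus/ceph-octopus/' /etc/apt/sources.list.d/ceph.list

ceph osd set noout                   # avoid rebalancing during the upgrade
apt update && apt full-upgrade       # upgrade the Ceph packages on each node

# restart the daemons node by node, monitors first
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target    # OSDs convert their omap data on first start

# when all daemons report Octopus
ceph osd require-osd-release octopus
ceph osd unset noout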
 
Thanks. I upgraded to Octopus and had to turn "pg_autoscale_mode" on. All three Ceph manager daemons crashed after doing this, but a simple "sudo systemctl stop ceph-mgr*; sudo systemctl start ceph-mgr*" fixed that. The "1 ceph pools have too many placement groups" health error then changed to "1 pools have many more objects per pg than average", and the cluster is now rebalancing (or whatever the correct wording with Ceph is) one PG after another.
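
For anyone following along, the per-pool autoscaler and the manager restart can be done like this (the pool name is a placeholder):

Code:
ceph osd pool set <pool-name> pg_autoscale_mode on   # enable the autoscaler for one pool
ceph osd pool autoscale-status                       # compare target vs. actual PG counts
systemctl restart ceph-mgr.target                    # restart the manager daemons cleanly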
 
After analyzing various ceph-mgr-dashboard deb packages, the most likely cause of this issue seems to me to be a broken ceph-mgr-dashboard_14.2.15-pve2_all.deb. This package contains files (like the mentioned /usr/share/ceph/mgr/dashboard/controllers/orchestrator.py) that seem to belong to the Ceph Octopus branch.

If this is correct, it should be checked whether other packages are affected as well. The dashboard might not be that important (although if it causes people to migrate, it might be), but if other Ceph packages are affected too, it could interfere with more critical components.
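
A quick, read-only way to check which package ships that file and which version is installed:

Code:
dpkg -S /usr/share/ceph/mgr/dashboard/controllers/orchestrator.py           # package owning the file
apt-cache policy ceph-mgr-dashboard                                         # installed vs. candidate version
sed -n '30,40p' /usr/share/ceph/mgr/dashboard/controllers/orchestrator.py   # inspect the offending line 34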
 
Thanks for sharing your experiences!
We run a 3-node full-mesh hyper-converged Ceph cluster and face the same issue after upgrading to 6.3-2.
Disabling the dashboard, as suggested, led to a HEALTH_OK status of Ceph again.
We will hold off on the Ceph upgrade to Octopus until the situation is clarified, so please keep us informed...
 
Thanks for sharing your experiences!
We run a 3-node full-mesh hyper-converged Ceph cluster and face the same issue after upgrading to 6.3-2.
Disabling the dashboard, as suggested, led to a HEALTH_OK status of Ceph again.
We will hold off on the Ceph upgrade to Octopus until the situation is clarified, so please keep us informed...
You can upgrade to Octopus without any issues; we have done this in our production environment.
Octopus works great with the new features.
 
