Upgraded to VE 6.3, Ceph manager not starting

Julian Lliteras · Nov 27, 2020

Hi,

After upgrading to Proxmox VE 6.3, I can't start the Ceph manager dashboard. Before the upgrade, the dashboard was up and running without issues.

The manager (version 14.2.15) log shows these records:

Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.083 7f6662c84700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.sion: ('invalid syntax', ('/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py', 34, 11, ' result: dict = {}\n'))
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.083 7f6662c84700 -1 dashboard.serve:
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.083 7f6662c84700 -1 Traceback (most recent call last):
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/module.py", line 306, in serve
Nov 27 12:31:06 sion ceph-mgr[61338]: mapper, parent_urls = generate_routes(self.url_prefix)
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 346, in generate_routes
Nov 27 12:31:06 sion ceph-mgr[61338]: ctrls = load_controllers()
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 274, in load_controllers
Nov 27 12:31:06 sion ceph-mgr[61338]: package='dashboard')
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
Nov 27 12:31:06 sion ceph-mgr[61338]: __import__(name)
Nov 27 12:31:06 sion ceph-mgr[61338]: File "/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py", line 34
Nov 27 12:31:06 sion ceph-mgr[61338]: result: dict = {}
Nov 27 12:31:06 sion ceph-mgr[61338]: ^
Nov 27 12:31:06 sion ceph-mgr[61338]: SyntaxError: invalid syntax
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.135 7f664744d700 -1 client.0 error registering admin socket command: (17) File exists
Nov 27 12:31:06 sion ceph-mgr[61338]: 2020-11-27 12:31:06.135 7f664744d700 -1 client.0 error registering admin socket command: (17) File exists
Nov 27 12:31:06 sion ceph-mgr[61338]: [27/Nov/2020:12:31:06] ENGINE Serving on http://:::9283
Nov 27 12:31:06 sion ceph-mgr[61338]: [27/Nov/2020:12:31:06] ENGINE Bus STARTED

I tried installing the dashboard on another host, with the same result. Any ideas?

Thanks in advance.

Alwin · Nov 27, 2020

Can you please post a pveversion -v?

Romsch · Nov 27, 2020

Hello, I have the same issue: everything except the Ceph dashboard is working:

Module 'dashboard' has failed: ('invalid syntax', ('/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py', 34, 11, ' result: dict = {}\n'))

pveversion -v:

proxmox-ve: 6.3-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.15-pve2
ceph-fuse: 14.2.15-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

Thanks for any ideas on how to fix this.

Best regards,

Roman

spirit · Nov 27, 2020

Code:

 File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module

This is strange; I think the dashboard is now using Python 3.

Is the "ceph-mgr-dashboard" package installed?

Julian Lliteras · Nov 27, 2020

Alwin said:
Can you please post a pveversion -v?

root@sion:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.15-pve2
ceph-fuse: 14.2.15-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
root@sion:~#

Romsch · Nov 27, 2020

spirit said:
Code:

File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module

This is strange; I think the dashboard is now using Python 3.

Is the "ceph-mgr-dashboard" package installed?

Yes, it is. I had a Ceph dashboard before, like @Julian Lliteras.

Maybe this is the problem?

Best regards

mgabriel · Nov 27, 2020

Same problem here after upgrading a test cluster from 6.2.15 to 6.3. Ceph is still accessible, but the status is HEALTH_ERR.

Code:

cluster:
    id:     27438f91-f51c-431a-9343-c1d5613ce181
    health: HEALTH_ERR
            Module 'dashboard' has failed: ('invalid syntax', ('/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py', 34, 11, '    result: dict = {}\n'))

pveversion -v

Code:

pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.15-pve2
ceph-fuse: 14.2.15-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-network-perl: 0.4-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

kifeo · Nov 27, 2020

I also encountered this issue after upgrading today.

mgabriel · Nov 27, 2020

The problem disappeared when I also upgraded Ceph from Nautilus to Octopus. I'm not sure whether this is why it disappeared, and I wouldn't recommend doing this on a production cluster while Ceph is in HEALTH_ERR.

It would be great if someone else could test this and confirm that upgrading to Ceph Octopus resolves the issue.

Kind regards,
Marco

samson_99 · Nov 28, 2020

Hi,

I can confirm that upgrading from Nautilus to Octopus resolves the issue.
Helpful link:
https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus
Have a nice weekend
Ralf

Toranaga · Nov 28, 2020

I can confirm too.

Use "ceph mgr module disable dashboard" to get rid of the health error.

Then upgrade to Octopus as described by samson_99.
The OSDs need a bit more time to start because of the conversion to per-pool omap.

Re-enable the dashboard with "ceph mgr module enable dashboard" once all hosts are on Octopus.
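Before re-enabling the dashboard, one way to confirm that every running daemon has really been restarted on Octopus (not just upgraded on disk) is to inspect `ceph versions`. The sketch below assumes its JSON output contains an "overall" object mapping version strings, which include the release codename such as "octopus", to daemon counts. The helper names `ceph_versions_json` and `versions_not_on` are hypothetical, not part of Ceph or Proxmox.

```python
import json
import subprocess


def ceph_versions_json() -> str:
    """Hypothetical helper: run `ceph versions` on a node with an admin keyring."""
    result = subprocess.run(
        ["ceph", "versions", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def versions_not_on(release: str, versions_json: str) -> list:
    """Return the running-daemon version strings that are not on `release`.

    Assumes the `ceph versions` layout: per-daemon-type objects plus an
    "overall" object mapping strings like
    "ceph version 15.2.x (<sha1>) octopus (stable)" to daemon counts.
    """
    overall = json.loads(versions_json).get("overall", {})
    if not overall:
        raise ValueError("no 'overall' section in `ceph versions` output")
    # The release codename appears as its own whitespace-separated token.
    return sorted(v for v in overall if release not in v.split())
```

Usage: `versions_not_on("octopus", ceph_versions_json())` returns an empty list once every running daemon reports Octopus; any remaining Nautilus string suggests some daemon is still running the old binaries, for example because it has not been restarted yet.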

vitor.prado · Nov 28, 2020

Hi,
I did a fresh installation of Proxmox 6.3 + Ceph 14.2.15 + manager (all new).
I had the same problem reported above.
I would like to keep Nautilus running on my cluster.
Any suggestions?

samson_99 · Nov 28, 2020

vitor.prado said:
Hi,
I did a fresh installation of Proxmox 6.3 + Ceph 14.2.15 + manager (all new).
I had the same problem reported above.
I would like to keep Nautilus running on my cluster.
Any suggestions?

Hmm, disable the dashboard?

Xislmo · Nov 30, 2020

Same problem here; I upgraded to 6.3 on Friday. I only have a production cluster at hand, so I'd be really thankful if someone from Proxmox could give me some advice on how to proceed. Is it considered "safe" to upgrade to Octopus? I have now disabled the dashboard to get rid of the monitoring warnings.

Romsch · Nov 30, 2020

Xislmo said:
Same problem here; I upgraded to 6.3 on Friday. I only have a production cluster at hand, so I'd be really thankful if someone from Proxmox could give me some advice on how to proceed. Is it considered "safe" to upgrade to Octopus? I have now disabled the dashboard to get rid of the monitoring warnings.

Once you have disabled the dashboard, Ceph is healthy again and you can proceed with the upgrade to Octopus as described here; it works without problems in our test environment:
https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus
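As a pre-upgrade gate, one could check that the broken dashboard module is the only thing keeping the cluster out of HEALTH_OK. This is a minimal sketch, assuming the JSON from `ceph health --format json` has a top-level "status" plus a "checks" object keyed by health-check code, and that a failed mgr module is reported under the code `MGR_MODULE_ERROR` with a summary message like the "Module 'dashboard' has failed: ..." line quoted earlier in this thread. `classify_health` is a hypothetical helper name.

```python
import json


def classify_health(health_json: str) -> str:
    """Classify `ceph health --format json` output for the upgrade decision.

    Returns "ok" for HEALTH_OK, "dashboard-only" if the sole failing check is
    the dashboard mgr module, and "other" for anything else.
    Assumed layout: {"status": ..., "checks": {CODE: {"summary": {"message": ...}}}}.
    """
    health = json.loads(health_json)
    if health.get("status") == "HEALTH_OK":
        return "ok"
    checks = health.get("checks", {})
    # Assumption: a failed mgr module is reported under MGR_MODULE_ERROR.
    if len(checks) == 1 and "MGR_MODULE_ERROR" in checks:
        message = checks["MGR_MODULE_ERROR"].get("summary", {}).get("message", "")
        if message.startswith("Module 'dashboard' has failed"):
            return "dashboard-only"
    return "other"
```

In this sketch, "dashboard-only" corresponds to the situation in this thread (disable the module first), "ok" means the cluster is healthy enough to follow the upgrade guide, and "other" means there is a separate problem to investigate before upgrading.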

Julian Lliteras · Nov 30, 2020

Romsch said:
Once you have disabled the dashboard, Ceph is healthy again and you can proceed with the upgrade to Octopus as described here; it works without problems in our test environment:
https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus

Confirmed.

In my case, I upgraded successfully to Octopus and everything is back to normal. I found no upgrade issues, and the new dashboard is online.

Xislmo · Nov 30, 2020

Thanks. I upgraded to Octopus and had to turn "pg_autoscale_mode" on. All three Ceph manager daemons crashed after doing this, but a simple "sudo systemctl stop ceph-mgr*; sudo systemctl start ceph-mgr*" fixed that. The "1 ceph pools have too many placement groups" health error has now changed to "1 pools have many more objects per pg than average", and the cluster is rebalancing (or whatever the correct wording is with Ceph) one PG after another.

Glaciar Errante · Dec 2, 2020

After analyzing various ceph-mgr-dashboard .deb packages, the most likely cause of this issue seems to me to be a broken ceph-mgr-dashboard_14.2.15-pve2_all.deb. This package contains files (like the /usr/share/ceph/mgr/dashboard/controllers/orchestrator.py mentioned above) that seem to belong to the Ceph Octopus branch.

If this is correct, it should be checked whether other packages are affected as well. The dashboard might not be that important (although it might be, if it pushes people to migrate), but if other Ceph packages are affected, it could interfere with more critical components.
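To test this theory on a node, one could scan the installed dashboard sources for syntax that the Python 2.7 interpreter used by this ceph-mgr (see the traceback) cannot parse. The sketch below, meant to be run with python3 (3.7 or newer), only looks for PEP 526 variable annotations like the `result: dict = {}` line from the log; it does not detect other Python-3-only syntax such as f-strings or function annotations. `annotated_assignments` and `scan_tree` are hypothetical helper names.

```python
import ast
import pathlib


def annotated_assignments(source: str, filename: str = "<string>") -> list:
    """Line numbers of PEP 526 variable annotations such as `result: dict = {}`.

    This syntax needs Python 3.6+, so Python 2.7 rejects it with
    "SyntaxError: invalid syntax".
    """
    tree = ast.parse(source, filename=filename)
    return sorted(node.lineno for node in ast.walk(tree) if isinstance(node, ast.AnnAssign))


def scan_tree(root: str) -> dict:
    """Map each .py file under `root` to its annotated-assignment line numbers."""
    hits = {}
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            lines = annotated_assignments(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, UnicodeDecodeError):
            # Python-2-only or non-UTF-8 sources are skipped: this check
            # only looks for Python-3-only syntax.
            continue
        if lines:
            hits[str(path)] = lines
    return hits
```

Usage: `scan_tree("/usr/share/ceph/mgr/dashboard")`. Going by the log in the first post, one would expect controllers/orchestrator.py (line 34) to show up on a node with 14.2.15-pve2 installed; a correct Nautilus dashboard package should ideally report nothing.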

inxamc · Dec 2, 2020

Thanks for sharing your experiences!
We run a 3-node, full-mesh, hyper-converged Ceph cluster and face the same issue after upgrading to 6.3.2.
Disabling the dashboard, as suggested, brought Ceph back to HEALTH_OK.
We will hold off on updating Ceph to Octopus until the situation is clarified, so please keep us informed...

Romsch · Dec 3, 2020

inxamc said:
Thanks for sharing your experiences!
We run a 3-node, full-mesh, hyper-converged Ceph cluster and face the same issue after upgrading to 6.3.2.
Disabling the dashboard, as suggested, brought Ceph back to HEALTH_OK.
We will hold off on updating Ceph to Octopus until the situation is clarified, so please keep us informed...

You can upgrade to Octopus without any issues! We have done this in our production environment.
Octopus and its new features work great.
