pve6to7 Fail on CEPH section

SoMoney

Member
Feb 29, 2020
Hey guys, I was looking to upgrade my Proxmox 6.4.1 server to 7 this weekend, but it looks like it's having difficulties with Ceph.

1. root@pve:~# pve6to7
Code:
= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages uptodate

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 6.4-1

Checking running kernel version..
PASS: expected running kernel '5.4.189-2-pve'.

= CHECKING CLUSTER HEALTH/SETTINGS =

SKIP: standalone node.

= CHECKING HYPER-CONVERGED CEPH STATUS =

INFO: hyper-converged ceph setup detected!
INFO: getting Ceph status/health information..
FAIL: failed to get 'noout' flag status - got timeout

FAIL: unable to determine Ceph status!
INFO: getting Ceph daemon versions..
FAIL: unable to determine Ceph daemon versions!
WARN: 'noout' flag not set - recommended to prevent rebalancing during cluster-wide upgrades.
INFO: checking Ceph config..

= CHECKING CONFIGURED STORAGES =

PASS: storage 'BIGNAS' enabled and active.
PASS: storage 'ISO' enabled and active.
PASS: storage 'NVME-STORE' enabled and active.
PASS: storage 'local' enabled and active.
PASS: storage 'local-zfs' enabled and active.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for running guests..
PASS: no running guest detected.
INFO: Checking if the local node's hostname 'pve' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '10.12.153.5' configured and active on single interface.
INFO: Checking backup retention settings..
PASS: no problems found.
INFO: checking CIFS credential location..
PASS: no CIFS credentials at outdated location found.
INFO: Checking custom roles for pool permissions..
INFO: Checking node and guest description/note length..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking storage content type configuration..
PASS: no problems found
INFO: Checking if the suite for the Debian security repository is correct..
INFO: Make sure to change the suite of the Debian security repository from 'buster/updates' to 'bullseye-security' - in /etc/apt/sources.list:6
WARN: Found at least one CT (103) which does not support running in a unified cgroup v2 layout.
    Either upgrade the Container distro or set systemd.unified_cgroup_hierarchy=0 in the Proxmox VE hosts' kernel cmdline! Skipping further CT compat checks.

= SUMMARY =

TOTAL:    25
PASSED:   19
SKIPPED:  1
WARNINGS: 2
FAILURES: 3

ATTENTION: Please check the output for detailed information!
Try to solve the problems one at a time and then run this checklist tool again.
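
For reference, this is roughly what pve6to7 is doing behind those FAIL/WARN lines, and what you'd run by hand to check it; these are standard Ceph/Proxmox commands, nothing specific to my setup, and on my box the status ones just hang the same way:
Code:
# Quick checks for the monitor the FAIL lines point at; 'timeout' just
# keeps the shell from hanging if the mon is unresponsive.
timeout 10 ceph -s
timeout 10 pveceph status

# Once the monitor answers again, the flag pve6to7 warns about can be
# set for the upgrade and cleared afterwards:
ceph osd set noout
ceph osd unset noout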

2. root@pve:~# cat /etc/pve/ceph.conf
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.12.153.5/21
         fsid = 1c0487fc-6076-4060-bf9a-e27d74c5bf41
         mon_allow_pool_delete = true
         mon_host = 10.12.153.5
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.12.153.5/21

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve]
         host = pve
         mds_standby_for_name = pve

[mon.pve]
         public_addr = 10.12.153.5
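
Given that config the monitor should be sitting on 10.12.153.5; generic checks (not from the output above) to see whether ceph-mon is actually up and listening would be:
Code:
systemctl status ceph-mon@pve     # state of the mon unit for this node
ss -tlnp | grep ceph-mon          # a healthy mon listens on 3300 (msgr2) and 6789 (msgr1)
journalctl -u ceph-mon@pve -n 50  # recent monitor log lines if it keeps dying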

3. I did something stupid and decided to run pveceph purge, then init, and try to recreate everything.

4. All the pveceph commands except init now just return a timeout (roughly what's sketched below):
TASK ERROR: got timeout
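
Roughly what steps 3 and 4 looked like (the network is taken from the ceph.conf above, and pveceph purge is destructive, so don't copy this blindly):
Code:
pveceph purge                             # wipes the node's Ceph config - the "something stupid"
pveceph init --network 10.12.153.5/21     # rewrites /etc/pve/ceph.conf, still works
pveceph mon create                        # like every other pveceph command now:
                                          # TASK ERROR: got timeout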

I'm at a loss and could use some next steps: should I bother getting Ceph working again with the old keyrings, or just give up on pve6to7...

Thanks!
!RESOLVED!

Since I wasn't using any Ceph pools and it was only the monitor that was broken (thank god), I decided to stop/disable it, move all the configs aside, and hope the Proxmox GUI would rebuild the monitor for me when it saw Ceph unconfigured...

Code:
systemctl status ceph-mon.target                        # see what state the monitor units are in
systemctl stop ceph-mon@pve                             # stop the broken monitor
systemctl disable ceph-mon@pve                          # and keep it from starting again
mv /etc/ceph /etc/ceph.bak                              # move the old config aside instead of deleting it
mkdir -p /etc/ceph && chown -R ceph:ceph /etc/ceph      # recreate an empty config dir owned by ceph
mv /var/lib/ceph/ /var/lib/ceph.bak                     # same for the monitor/OSD data directory
mkdir -p /var/lib/ceph/mon && chown ceph:ceph /var/lib/ceph/mon

When I clicked the Ceph icon, sure enough, Proxmox started to configure it again; I then went to the Monitor tab and created a monitor (the CLI equivalent is sketched below).
After a final reboot and another run of pve6to7 I now have 0 failures...
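
For anyone who'd rather do that from the shell than the GUI, I believe the equivalent would be (untested by me, just the standard pveceph commands):
Code:
pveceph init --network 10.12.153.5/21   # regenerate /etc/pve/ceph.conf
pveceph mon create                      # create a fresh monitor on this node
pve6to7                                 # re-run the checklist afterwards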

I was then able to successfully upgrade to 7.2-7 following this guide:
https://dannyda.com/2021/07/06/how-...6-4-11-to-7-0-8-latest-pve-7-release-version/
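
Part of that upgrade is the repository suite change pve6to7 flagged earlier; for the Debian security repo the line in /etc/apt/sources.list ends up looking roughly like this (exact mirror URL and components may differ on your install):
Code:
# before (Buster):
# deb http://security.debian.org/debian-security buster/updates main contrib
# after (Bullseye):
deb http://security.debian.org/debian-security bullseye-security main contrib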