pve6to7 Fail on CEPH section

SoMoney

Hey guys, I was looking to upgrade my Proxmox 6.4.1 server to 7 this weekend, but it looks like it's having difficulties with Ceph.

1. root@pve:~# pve6to7
Code:
= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages uptodate

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 6.4-1

Checking running kernel version..
PASS: expected running kernel '5.4.189-2-pve'.

= CHECKING CLUSTER HEALTH/SETTINGS =

SKIP: standalone node.

= CHECKING HYPER-CONVERGED CEPH STATUS =

INFO: hyper-converged ceph setup detected!
INFO: getting Ceph status/health information..
FAIL: failed to get 'noout' flag status - got timeout

FAIL: unable to determine Ceph status!
INFO: getting Ceph daemon versions..
FAIL: unable to determine Ceph daemon versions!
WARN: 'noout' flag not set - recommended to prevent rebalancing during cluster-wide upgrades.
INFO: checking Ceph config..

= CHECKING CONFIGURED STORAGES =

PASS: storage 'BIGNAS' enabled and active.
PASS: storage 'ISO' enabled and active.
PASS: storage 'NVME-STORE' enabled and active.
PASS: storage 'local' enabled and active.
PASS: storage 'local-zfs' enabled and active.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for running guests..
PASS: no running guest detected.
INFO: Checking if the local node's hostname 'pve' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '10.12.153.5' configured and active on single interface.
INFO: Checking backup retention settings..
PASS: no problems found.
INFO: checking CIFS credential location..
PASS: no CIFS credentials at outdated location found.
INFO: Checking custom roles for pool permissions..
INFO: Checking node and guest description/note length..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking storage content type configuration..
PASS: no problems found
INFO: Checking if the suite for the Debian security repository is correct..
INFO: Make sure to change the suite of the Debian security repository from 'buster/updates' to 'bullseye-security' - in /etc/apt/sources.list:6
WARN: Found at least one CT (103) which does not support running in a unified cgroup v2 layout.
    Either upgrade the Container distro or set systemd.unified_cgroup_hierarchy=0 in the Proxmox VE hosts' kernel cmdline! Skipping further CT compat checks.

= SUMMARY =

TOTAL:    25
PASSED:   19
SKIPPED:  1
WARNINGS: 2
FAILURES: 3

ATTENTION: Please check the output for detailed information!
Try to solve the problems one at a time and then run this checklist tool again.
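
A rough sketch of what those FAILed Ceph checks boil down to by hand, assuming a single monitor whose id matches the hostname 'pve' (stock systemd and ceph CLI commands):

Code:
# is the monitor daemon running at all?
systemctl status ceph-mon@pve
# cluster status; a hang here matches the 'got timeout' above
ceph --connect-timeout 10 -s
# the flag pve6to7 complains about - only settable once the mon answers
ceph osd set noout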

2. root@pve:~# cat /etc/pve/ceph.conf
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.12.153.5/21
         fsid = 1c0487fc-6076-4060-bf9a-e27d74c5bf41
         mon_allow_pool_delete = true
         mon_host = 10.12.153.5
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.12.153.5/21

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve]
         host = pve
         mds_standby_for_name = pve

[mon.pve]
         public_addr = 10.12.153.5
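
With a single monitor in mon_host, everything hinges on that one daemon answering on 10.12.153.5. A quick sanity check (ports are the standard mon msgr2/msgr1 ports; ss ships with iproute2):

Code:
# is anything listening on the ceph-mon ports? (3300 = msgr2, 6789 = msgr1)
ss -tlnp | grep -E ':3300|:6789'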

3. I did something stupid and decided to run pveceph purge, init, and recreate.

4. All the pveceph commands except init return a timeout:
TASK ERROR: got timeout
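
When every cluster-facing command times out like that, the monitor's local admin socket and its journal are about the only things that still talk. A sketch, assuming the default socket for a mon with id 'pve':

Code:
# query the mon directly over its admin socket, bypassing the network path
ceph daemon mon.pve mon_status
# and check the unit's log for the reason it stopped answering
journalctl -u ceph-mon@pve -n 50 --no-pager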

I'm at a loss and need some next steps: should I bother getting Ceph working again with the old keyrings, or just give up on pve6to7...

Thanks!
 
!RESOLVED!

Since I wasn't using any Ceph pools and it was only the monitor that was broken (thank god), I decided to stop, disable, and remove all the configs and hope the Proxmox GUI would rebuild the monitor for me once it saw it was unconfigured...

Code:
# check the monitor unit, then stop and disable the broken instance
systemctl status ceph-mon.target
systemctl stop ceph-mon@pve
systemctl disable ceph-mon@pve
# move the old config and data out of the way, then recreate empty directories with the right ownership
mv /etc/ceph /etc/ceph.bak
mkdir -p /etc/ceph && chown -R ceph:ceph /etc/ceph
mv /var/lib/ceph/ /var/lib/ceph.bak
mkdir -p /var/lib/ceph/mon && chown ceph:ceph /var/lib/ceph/mon

When I clicked the Ceph icon, sure enough, Proxmox started to configure it again; I then went to the Monitor section and created a new monitor (the CLI equivalent is sketched below).
After a final reboot and another run of pve6to7 I now have 0 failures...
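
For anyone who prefers the CLI over the GUI, the same rebuild should be possible with pveceph; I went through the web UI, so treat this as an untested sketch with my network values filled in:

Code:
pveceph init --network 10.12.153.5/21    # writes a fresh /etc/pve/ceph.conf
pveceph mon create                        # creates a new monitor on this node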

I was then able to successfully upgrade to 7.2-7 following this guide:
https://dannyda.com/2021/07/06/how-...6-4-11-to-7-0-8-latest-pve-7-release-version/
 
