[SOLVED] Ceph upgrade to 12.2.10 hang

udo

Distinguished Member
Apr 22, 2009
Ahrensburg, Germany
Hi,
just tried the upgrade on the first node and the process hung, with no activity.
Code:
root@pve01:~# apt dist-upgrade
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  ceph ceph-base ceph-common ceph-fuse ceph-mds ceph-mgr ceph-mon ceph-osd libcephfs2 librados2 libradosstriper1 librbd1 librgw2 python-ceph python-cephfs python-rados
  python-rbd python-rgw
18 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 51.6 MB of archives.
After this operation, 1,476 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph-mds amd64 12.2.10-pve1 [3,611 kB]
Get:2 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph-osd amd64 12.2.10-pve1 [14.2 MB]
Get:3 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph-mon amd64 12.2.10-pve1 [4,512 kB]
Get:4 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph-base amd64 12.2.10-pve1 [3,363 kB]
Get:5 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph-common amd64 12.2.10-pve1 [13.0 MB]
Get:6 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph amd64 12.2.10-pve1 [7,474 B]
Get:7 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph-mgr amd64 12.2.10-pve1 [3,535 kB]
Get:8 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 librgw2 amd64 12.2.10-pve1 [1,820 kB]
Get:9 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 libradosstriper1 amd64 12.2.10-pve1 [322 kB]
Get:10 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 librbd1 amd64 12.2.10-pve1 [999 kB]
Get:11 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 python-rgw amd64 12.2.10-pve1 [98.4 kB]
Get:12 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 python-rados amd64 12.2.10-pve1 [291 kB]
Get:13 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 python-rbd amd64 12.2.10-pve1 [155 kB]
Get:14 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 python-ceph amd64 12.2.10-pve1 [7,406 B]
Get:15 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 python-cephfs amd64 12.2.10-pve1 [95.3 kB]
Get:16 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 libcephfs2 amd64 12.2.10-pve1 [411 kB]
Get:17 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 librados2 amd64 12.2.10-pve1 [2,716 kB]
Get:18 http://download.proxmox.com/debian/ceph-luminous stretch/main amd64 ceph-fuse amd64 12.2.10-pve1 [2,463 kB]
Fetched 51.6 MB in 4s (12.2 MB/s)     
Reading changelogs... Done
(Reading database ... 117388 files and directories currently installed.)
Preparing to unpack .../00-ceph-mds_12.2.10-pve1_amd64.deb ...
Unpacking ceph-mds (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../01-ceph-osd_12.2.10-pve1_amd64.deb ...
Unpacking ceph-osd (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../02-ceph-mon_12.2.10-pve1_amd64.deb ...
Unpacking ceph-mon (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../03-ceph-base_12.2.10-pve1_amd64.deb ...
Unpacking ceph-base (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../04-ceph-common_12.2.10-pve1_amd64.deb ...
Unpacking ceph-common (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../05-ceph_12.2.10-pve1_amd64.deb ...
Unpacking ceph (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../06-ceph-mgr_12.2.10-pve1_amd64.deb ...
Unpacking ceph-mgr (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../07-librgw2_12.2.10-pve1_amd64.deb ...
Unpacking librgw2 (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../08-libradosstriper1_12.2.10-pve1_amd64.deb ...
Unpacking libradosstriper1 (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../09-librbd1_12.2.10-pve1_amd64.deb ...
Unpacking librbd1 (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../10-python-rgw_12.2.10-pve1_amd64.deb ...
Unpacking python-rgw (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../11-python-rados_12.2.10-pve1_amd64.deb ...
Unpacking python-rados (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../12-python-rbd_12.2.10-pve1_amd64.deb ...
Unpacking python-rbd (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../13-python-ceph_12.2.10-pve1_amd64.deb ...
Unpacking python-ceph (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../14-python-cephfs_12.2.10-pve1_amd64.deb ...
Unpacking python-cephfs (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../15-libcephfs2_12.2.10-pve1_amd64.deb ...
Unpacking libcephfs2 (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../16-librados2_12.2.10-pve1_amd64.deb ...
Unpacking librados2 (12.2.10-pve1) over (12.2.8-pve1) ...
Preparing to unpack .../17-ceph-fuse_12.2.10-pve1_amd64.deb ...
Unpacking ceph-fuse (12.2.10-pve1) over (12.2.8-pve1) ...
Setting up ceph-fuse (12.2.10-pve1) ...
Setting up librados2 (12.2.10-pve1) ...
Setting up libcephfs2 (12.2.10-pve1) ...
Processing triggers for libc-bin (2.24-11+deb9u3) ...
Processing triggers for systemd (232-25+deb9u6) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up python-rados (12.2.10-pve1) ...
Setting up python-cephfs (12.2.10-pve1) ...
Setting up libradosstriper1 (12.2.10-pve1) ...
Setting up librgw2 (12.2.10-pve1) ...
Setting up python-rgw (12.2.10-pve1) ...
Setting up librbd1 (12.2.10-pve1) ...
Setting up python-rbd (12.2.10-pve1) ...
Setting up ceph-common (12.2.10-pve1) ...
Setting system user ceph properties..usermod: no changes
..done
Fixing /var/run/ceph ownership....done


Progress: [ 82%] [############################################################################################################################...........................]
related processes:
Code:
root@pve01:~# ps aux | grep ceph | grep -v kvm
ceph        5201  0.0  0.1 441036 50368 ?        Ssl  Dec07  11:01 /usr/bin/ceph-mgr -f --cluster ceph --id pve01 --setuser ceph --setgroup ceph
ceph        5210  2.6  0.9 829184 298236 ?       Ssl  Dec07 406:38 /usr/bin/ceph-mon -f --cluster ceph --id pve01 --setuser ceph --setgroup ceph
ceph        5252  0.0  0.0 360376 27056 ?        Ssl  Dec07   7:44 /usr/bin/ceph-mds -f --cluster ceph --id pve01 --setuser ceph --setgroup ceph
ceph        5627  0.0  0.0 349104 22264 ?        Ssl  Dec07   7:14 /usr/bin/ceph-mds -i pve01 --pid-file /var/run/ceph/mds.pve01.pid -c /etc/ceph/ceph.conf --cluster ceph --setuser ceph --setgroup ceph
ceph        5969  1.1  6.4 3055796 2126252 ?     Ssl  Dec07 184:20 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root        6691  0.0  0.0      0     0 ?        I<   Dec07   0:00 [ceph-msgr]
root        6701  0.0  0.0      0     0 ?        I<   Dec07   0:00 [ceph-watch-noti]
ceph       10249  1.4  6.4 3008676 2117496 ?     Ssl  Dec07 231:04 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
root     2939812  0.0  0.0   4292  1504 pts/2    S+   10:51   0:00 /bin/sh /var/lib/dpkg/info/ceph-common.postinst configure 12.2.8-pve1
root     2939857  0.0  0.0  17772  4024 pts/2    S+   10:51   0:00 perl /usr/bin/deb-systemd-invoke start ceph.target rbdmap.service
root     2939861  0.0  0.0  39600  4184 pts/2    S+   10:51   0:00 /bin/systemctl start ceph.target
root     2950146  0.0  0.0  12788   936 pts/3    S+   11:00   0:00 grep ceph
ceph versions still shows the old 12.2.8.
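For anyone hitting the same hang, a quick way to see what systemd is actually stuck on (a sketch; these are generic systemd commands, nothing Ceph-specific assumed):
Code:
# a hung "start ceph.target" shows up as a queued job
systemctl list-jobs
# state of the target and the units it pulls in
systemctl status ceph.target
systemctl list-dependencies ceph.target
# recent journal entries for the target
journalctl -u ceph.target -e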

Udo
 
ceph versions still shows the old 12.2.8.
I just ran an upgrade on my test cluster from Ceph 12.2.8 -> 12.2.10 and didn't see this.

What does your 'pveversion -v' say? And if you kill and redo the upgrade, does it appear again?
 
Hi Alwin,
after killing the "/bin/systemctl start ceph.target" process, the dist-upgrade finished.
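For reference, the hung starter can be located and killed without touching the running daemons (a sketch; the match pattern is taken from the ps output above):
Code:
# find the hung systemctl call spawned by the postinst
pgrep -af 'systemctl start ceph.target'
# kill only that call - the mon/osd/mds daemons keep running
pkill -f 'systemctl start ceph.target'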
My versions (now):
Code:
pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
ceph: 12.2.10-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
On the other nodes the upgrade ran without interaction.
The only thing was that a ceph restart didn't work - I had to kill the mon, but after that systemctl restart ceph worked:
Code:
systemctl restart ceph
Job for ceph.service failed because the control process exited with error code.
See "systemctl status ceph.service" and "journalctl -xe" for details.
ps aux | grep ceph | grep -v kvm
ceph        4888  2.9  0.5 886368 322680 ?       Ssl  Dec05 543:38 /usr/bin/ceph-mon -f --cluster ceph --id pve03 --setuser ceph --setgroup ceph
ceph        5780  1.4  4.0 3148272 2329424 ?     Ssl  Dec05 277:32 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
ceph        6120  0.7  3.0 2521440 1750152 ?     Ssl  Dec05 132:58 /usr/bin/ceph-osd -f --cluster ceph --id 9 --setuser ceph --setgroup ceph
ceph        6406  2.4  4.1 3192832 2417776 ?     Ssl  Dec05 456:04 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
root        6752  0.0  0.0      0     0 ?        I<   Dec05   0:00 [ceph-msgr]
root        6772  0.0  0.0      0     0 ?        I<   Dec05   0:00 [ceph-watch-noti]
ceph     3501844  0.1  0.0 340020 12836 ?        Ssl  12:05   0:00 /usr/bin/ceph-mds -i pve03 --pid-file /var/run/ceph/mds.pve03.pid -c /etc/ceph/ceph.conf --cluster ceph --setuser ceph --setgroup ceph
ceph     3502313  1.0  0.0 348216 14320 ?        Ssl  12:05   0:00 /usr/bin/ceph-mds -f --cluster ceph --id pve03 --setuser ceph --setgroup ceph
root     3502474  0.0  0.0  12788   960 pts/5    S+   12:05   0:00 grep ceph

kill 4888
systemctl restart ceph
After restarting the OSDs too, the cluster is up to date:
Code:
ceph versions
{
    "mon": {
        "ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)": 10
    },
    "mds": {
        "ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)": 3
    },
    "overall": {
        "ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)": 19
    }
}
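For reference, the restart can also be done one daemon type at a time via the systemd targets the ceph packages ship (a sketch; the mon-first order follows the usual Luminous upgrade procedure):
Code:
# restart monitors first, then the rest, then verify
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target
systemctl restart ceph-mds.target
ceph versions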
I will mark this as solved.

Udo
 
after killing the "/bin/systemctl start ceph.target" process, the dist-upgrade finished.
Was the ceph.target started by the package upgrade, or was it a manual task afterwards?
 
Was the ceph.target started by the package upgrade, or was it a manual task afterwards?
Hi Alwin,
It was started by the update process;
all three processes had the same start time (10:51):
Code:
root     2939812  0.0  0.0   4292  1504 pts/2    S+   10:51   0:00 /bin/sh /var/lib/dpkg/info/ceph-common.postinst configure 12.2.8-pve1
root     2939857  0.0  0.0  17772  4024 pts/2    S+   10:51   0:00 perl /usr/bin/deb-systemd-invoke start ceph.target rbdmap.service
root     2939861  0.0  0.0  39600  4184 pts/2    S+   10:51   0:00 /bin/systemctl start ceph.target
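A process-tree view makes the parentage explicit (a sketch; the PID is the postinst process from the listing above):
Code:
# deb-systemd-invoke and systemctl should show up as children of the postinst
pstree -ap 2939812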
Udo
 
Was there a ceph service that wasn't running? As the 'ceph.target' hung, there might be something in the logs.
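A sketch of how to pull those logs (the per-daemon unit name is an assumption based on the standard ceph systemd instances, here the monitor on pve01):
Code:
# status and recent journal of the target that hung
systemctl status ceph.target
journalctl -u ceph.target -e
# and of an individual daemon instance
journalctl -u ceph-mon@pve01 -e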