Ceph issue

Alex_u-94

A few days ago, I had a VM hang on a cluster of 4 servers. Three of the servers have 2 SSDs each for Ceph, 6 SSDs in total. At the time of the problem, Proxmox VE 7.1 was installed and Ceph 16.2.7 was running.
Each node has 3 Ethernet interfaces: two onboard 1 Gb ports and one 10 Gb PCIe card. The 1 Gb interfaces are used for VM access, the Proxmox management interface, and cluster traffic; the 10 Gb interface was dedicated to Ceph. Separate IPv6 subnets were created for the cluster and for Ceph.

At the time of the problem, I found that one of the nodes (node B) was shown as down in the Ceph monitor. I rebooted that node, but after it came back up nothing changed. At this point I obviously made a fatal mistake: since all the VMs were down anyway, I rebooted all nodes that use Ceph as VM storage and updated the system to the current version 7.2-7.

After the update, I found that Ceph had stopped working completely and did not respond to ceph -s.
After a series of tests, I found that all nodes had lost the ability to transmit IPv6 packets through the 10 Gb interfaces, while IPv4 packets were transmitted without any restrictions. The other interfaces did not have this problem. I also tested IPv6 on these interfaces with a Linux bridge and an OVS bridge, with the same result: only IPv4 was working.
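
For clarity, the test was simply a matter of checking IPv6 vs IPv4 reachability between nodes over the same interface; the interface name and addresses below are placeholders, not the real configuration.

Code:
# placeholder interface name and addresses -- substitute the real ones
ping -6 -I enp5s0 fd00:10:10::12     # IPv6 over the 10G link: no replies
ping -4 -I enp5s0 10.10.10.12        # IPv4 over the same link: works fine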

Today I gave up hope of restoring the IPv6 network. I made a number of changes and attached the IPv6 subnets intended for Ceph to the 1 Gb interfaces. As a result, Ceph started working within a few minutes, but in the monitor I found that 3 of the 6 OSDs had not started, and all of the non-working OSDs showed an old version. I checked the SMART status of every SSD backing a non-working OSD: all of them had 3-4% wearout (a server-grade series) and SMART reported no errors. A few minutes later another OSD went into the DOWN state, leaving me with two nodes with non-working OSDs.
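
The SMART check itself was just smartctl against each SSD backing an OSD; the device names below match the ceph-volume output further down and may differ on other nodes.

Code:
smartctl -a /dev/sdb    # health status, wearout indicator, error log
smartctl -a /dev/sdc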

I tried to figure out what the problem was. Here is the output of the commands on node A (it is the same on the other nodes).

Code:
root@node-A:~# systemctl status ceph-osd@0.service
● ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: signal) since Mon 2022-09-05 15:55:38 EEST; 2h 31min ago
    Process: 5938 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 0 (code=exited, status=0/SUCCESS)
    Process: 5942 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 0 --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
   Main PID: 5942 (code=killed, signal=ABRT)
        CPU: 1min 5.361s

Sep 05 15:55:28 node-A systemd[1]: ceph-osd@0.service: Consumed 1min 5.361s CPU time.
Sep 05 15:55:38 node-A systemd[1]: ceph-osd@0.service: Scheduled restart job, restart counter is at 3.
Sep 05 15:55:38 node-A systemd[1]: Stopped Ceph object storage daemon osd.0.
Sep 05 15:55:38 node-A systemd[1]: ceph-osd@0.service: Consumed 1min 5.361s CPU time.
Sep 05 15:55:38 node-A systemd[1]: ceph-osd@0.service: Start request repeated too quickly.
Sep 05 15:55:38 node-A systemd[1]: ceph-osd@0.service: Failed with result 'signal'.
Sep 05 15:55:38 node-A systemd[1]: Failed to start Ceph object storage daemon osd.0.
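
The "Start request repeated too quickly" message only means that systemd stopped retrying; a fresh start attempt and the actual crash output can be obtained with something like the following (a diagnostic sketch, not a fix for the OSD itself):

Code:
systemctl reset-failed ceph-osd@0.service             # clear the restart counter
systemctl start ceph-osd@0.service                    # try starting the OSD once more
journalctl -u ceph-osd@0.service -n 200 --no-pager    # see why ceph-osd received SIGABRT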

Code:
root@node-A:~# ceph-volume lvm list


====== osd.0 =======

  [block]       /dev/ceph-1690a548-18f3-4ef6-936c-36dca7faa707/osd-block-2f745f4d-3155-4811-89fd-4fd56902fbae

      block device              /dev/ceph-1690a548-18f3-4ef6-936c-36dca7faa707/osd-block-2f745f4d-3155-4811-89fd-4fd56902fbae
      block uuid                9tbjoJ-uckJ-tOKo-1IeH-ELp1-41hm-RAUSwF
      cephx lockbox secret      
      cluster fsid              e3a3a2e8-7d85-432c-8ba3-f98f4e54c96e
      cluster name              ceph
      crush device class        ssd
      encrypted                 0
      osd fsid                  2f745f4d-3155-4811-89fd-4fd56902fbae
      osd id                    0
      osdspec affinity          
      type                      block
      vdo                       0
      devices                   /dev/sdb

====== osd.1 =======

  [block]       /dev/ceph-d73181a4-5030-46f8-abda-5f2e4bc3311a/osd-block-09c04b90-ba59-4bd0-b69f-113828c676cc

      block device              /dev/ceph-d73181a4-5030-46f8-abda-5f2e4bc3311a/osd-block-09c04b90-ba59-4bd0-b69f-113828c676cc
      block uuid                rrcKqT-2zDY-gjD6-Tw3f-YCx0-MNeg-XNIw7j
      cephx lockbox secret      
      cluster fsid              e3a3a2e8-7d85-432c-8ba3-f98f4e54c96e
      cluster name              ceph
      crush device class        ssd
      encrypted                 0
      osd fsid                  09c04b90-ba59-4bd0-b69f-113828c676cc
      osd id                    1
      osdspec affinity          
      type                      block
      vdo                       0
      devices                   /dev/sdc

Code:
root@node-A:/etc/sysctl.d# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   63G     0   63G   0% /dev
tmpfs                  13G  1.4M   13G   1% /run
/dev/mapper/pve-root   59G   11G   45G  19% /
tmpfs                  63G   66M   63G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/fuse             128M   52K  128M   1% /etc/pve
tmpfs                  63G   24K   63G   1% /var/lib/ceph/osd/ceph-1
tmpfs                  63G   24K   63G   1% /var/lib/ceph/osd/ceph-0
tmpfs                  13G     0   13G   0% /run/user/0

After some time, I found that all the OSDs had disappeared from the Proxmox web interface.

[Screenshot: OSD.png]

ceph-crash also does not work correctly, due to a missing keyring:

Code:
auth: unable to find a keyring on /etc/pve/priv/ceph.client.crash.keyring: (2) No such file or directory

Code:
root@node-A:/etc/pve/priv# ls -l
total 4
drwx------ 2 root www-data    0 Jan 18  2022 acme
-rw------- 1 root www-data 1679 Sep  5 09:37 authkey.key
-rw------- 1 root www-data 1573 Feb  9  2022 authorized_keys
drwx------ 2 root www-data    0 Jan 19  2022 ceph
-rw------- 1 root www-data  151 Jan 19  2022 ceph.client.admin.keyring
-rw------- 1 root www-data  228 Jan 19  2022 ceph.mon.keyring
-rw------- 1 root www-data 5074 Feb  9  2022 known_hosts
drwx------ 2 root www-data    0 Jan 18  2022 lock
-rw------- 1 root www-data 3243 Jan 18  2022 pve-root-ca.key
-rw------- 1 root www-data    3 Feb  9  2022 pve-root-ca.srl
drwx------ 2 root www-data    0 Feb 10  2022 storage

At the moment I need to make a backup of one of the VMs; all other services have been successfully restored and are working.

Current Ceph status:

Code:
root@node-A:/etc/pve/priv# ceph -s
  cluster:
    id:     e3a3a2e8-7d85-432c-8ba3-f98f4e54c96e
    health: HEALTH_WARN
            2 osds down
            2 hosts (4 osds) down
            Reduced data availability: 129 pgs inactive, 129 pgs stale
            Degraded data redundancy: 188104/282156 objects degraded (66.667%), 129 pgs degraded, 129 pgs undersized
            36 pgs not deep-scrubbed in time
            14 daemons have recently crashed
 
  services:
    mon: 4 daemons, quorum node-A,node-B,node-C,node-D (age 3h)
    mgr: node-D(active, since 4h), standbys: node-A
    osd: 6 osds: 2 up (since 3h), 4 in (since 3h)
 
  data:
    pools:   2 pools, 129 pgs
    objects: 94.05k objects, 357 GiB
    usage:   348 GiB used, 129 GiB / 477 GiB avail
    pgs:     100.000% pgs not active
             188104/282156 objects degraded (66.667%)
             129 stale+undersized+degraded+peered

I will be grateful for any help.
 
Hi,
did you take a look at the logs in /var/log/ceph? There is the "ceph.log" as well as logs for the OSDs. Maybe you will find a more specific error in there.
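
For example (osd.0 is just an example id; there is one log file per OSD):

Code:
less /var/log/ceph/ceph.log           # cluster-wide log
less /var/log/ceph/ceph-osd.0.log     # per-OSD log
journalctl -u ceph-osd@0 -b           # systemd journal for the same daemon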
 
Finally I managed to eliminate the original cause of the Ceph failure: it was a hardware problem with the network equipment (the switch plus some network cards). At the moment I have a problem with ceph-crash.
The ceph-crash logs contain a keyring error: auth: unable to find a keyring on /etc/pve/priv/app.client.crash.keyring
The file really does not exist on the nodes.

Please tell me the correct format of the command for generating new keys.

Code:
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash'
 
By adding a -o /etc/pve/priv/FOOBAR.keyring you can save the created keyring into a file.
 
By adding a -o /etc/pve/priv/FOOBAR.keyring you can save the created keyring into a file.

So, to restore the lost keyring, I need to run the following command on each node and restart ceph-crash?

Code:
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' -o /etc/pve/priv/ceph.client.crash.keyring
 
I added the keyring once (it propagated to all nodes):
Code:
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' -o /etc/pve/priv/ceph.client.crash.keyring

I checked the logs; an additional key was required for each node.
On each node I executed the command corresponding to that node's name:
Code:
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' > /etc/pve/priv/ceph.client.crash.node-A.keyring
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' > /etc/pve/priv/ceph.client.crash.node-B.keyring
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' > /etc/pve/priv/ceph.client.crash.node-C.keyring
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' > /etc/pve/priv/ceph.client.crash.node-D.keyring

Now I am getting new errors:

Code:
root@node-B:~# ceph-crash
INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s
WARNING:ceph-crash:post /var/lib/ceph/crash/2022-09-05T09:33:33.467417Z_d7ab3acb-42a0-4eb5-89ed-624c337b202a as client.crash.node-B failed: (None, b'2022-09-12T16:14:43.896+0300 7f4d35d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]\n2022-09-12T16:14:46.896+0300 7f4d35d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]\n[errno 1] RADOS permission error (error connecting to the cluster)\n')
WARNING:ceph-crash:post /var/lib/ceph/crash/2022-09-05T09:33:33.467417Z_d7ab3acb-42a0-4eb5-89ed-624c337b202a as client.crash failed: (None, b'')
WARNING:ceph-crash:post /var/lib/ceph/crash/2022-09-05T09:33:33.467417Z_d7ab3acb-42a0-4eb5-89ed-624c337b202a as client.admin failed: (None, b'')
WARNING:ceph-crash:post /var/lib/ceph/crash/2022-09-06T09:50:56.579668Z_3febaacd-e55f-474b-ba43-5c7a797266ff as client.crash.node-B failed: (None, b'2022-09-12T16:15:47.004+0300 7fda8ad9d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]\n2022-09-12T16:15:47.004+0300 7fda8a59c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]\n2022-09-12T16:15:47.004+0300 7fda89d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]\n[errno 13] RADOS permission denied (error connecting to the cluster)\n')

Every node shows the same thing; only the node name changes.
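
For reference, the key the cluster actually holds for a client can be compared with the one in the keyring file; a quick check, assuming the file names used above:

Code:
ceph auth ls | grep crash                            # which crash-related clients exist in the cluster
ceph auth get client.crash                           # key stored in the cluster for client.crash
cat /etc/pve/priv/ceph.client.crash.node-B.keyring   # key this node presents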

Please tell me what I did wrong.
 
