rbd error: rbd: listing images failed: (2) No such file or directory (500)

sai.dasari

Hi,

For the past two days I have been unable to delete any VMs in our Proxmox cluster. Whenever I try to view the RBD images or the disk list in the Proxmox UI, I get the error below.

[Screenshot: "rbd error: rbd: listing images failed: (2) No such file or directory (500)"]

But when I list the images through the CLI with rbd ls hdd, I am able to fetch the list.
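Roughly what I ran on one of the nodes (hdd is the pool name; the long listing is presumably closer to what the UI asks for):

Code:
# plain listing works fine
rbd ls hdd

# long listing (-l) also resolves image sizes and snapshots
rbd ls -l hdd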

I found this entry in the Ceph logs:

2021-06-30T10:02:05.825+0530 7fc35da43700 0 [pg_autoscaler ERROR root] pg_num adjustment on ssd-cache to 128 failed: (-1, '', 'splits in cache pools must be followed by scrubs and leave sufficient free space to avoid overfilling. use --yes-i-really-mean-it to force.')

I really don't know if this is the reason or why it is showing up that way. Can someone please help me out? The only workaround I have is to edit the VM's .conf file, comment out the disk line, and move the config file to a separate directory so that the VM entry disappears from the UI, but I don't think that is a valid solution. Migration from node to node is also not working; to migrate, I am manually moving the VM's .conf file into the target node's folder in Proxmox.
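Roughly what I am doing as a workaround right now (VMID 100 and the node names are just placeholders):

Code:
# comment out the rbd disk line in the VM config, e.g.
#   scsi0: hdd:vm-100-disk-0,size=32G
nano /etc/pve/qemu-server/100.conf

# "migrate" by moving the config into the target node's directory
mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/100.conf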

I found this thread https://forum.proxmox.com/threads/r...failed-2-no-such-file-or-directory-500.56577/

but I don't know if rebuilding the whole pool is a good idea, as we have production-level deployments running here.

Can anyone please help me? I really don't know what to do here.

Thanks,
Sai.
 
What is the current status of the pool?

Can you post the results of the following commands in [code][/code] blocks?
Code:
ceph -s
ceph df
ceph osd df tree
 
Hello Aaron,

Thank you for the quick response.

These are the outputs of the commands you sent.

ceph -s
Code:
cluster:
    id:     f1f583c0-f576-4b47-k908-110cc20055d7
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum dell-r730xd-1,dell-r730xd-2,dell-r730xd-3 (age 2w)
    mgr: dell-r730xd-1(active, since 2w)
    osd: 26 osds: 25 up (since 2w), 25 in (since 46h)

  data:
    pools:   3 pools, 161 pgs
    objects: 645.15k objects, 1.7 TiB
    usage:   5.0 TiB used, 8.0 TiB / 13 TiB avail
    pgs:     161 active+clean

  io:
    client:   15 KiB/s rd, 1.3 MiB/s wr, 0 op/s rd, 87 op/s wr

ceph df

Code:
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    8.7 TiB  6.5 TiB  2.2 TiB   2.3 TiB      25.92
ssd    4.2 TiB  1.5 TiB  2.7 TiB   2.7 TiB      64.67
TOTAL   13 TiB  8.0 TiB  4.9 TiB   5.0 TiB      38.49

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1    1   13 MiB       26   38 MiB      0    723 GiB
ssd-cache               2   32  964 GiB  339.10k  2.7 TiB  80.39    225 GiB
hdd                     3  128  765 GiB  306.03k  2.2 TiB  27.26    2.0 TiB

ceph osd df tree
Code:
ID   CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP      META      AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         13.47057         -   13 TiB  5.0 TiB  4.9 TiB   142 MiB    26 GiB  8.0 TiB  38.49  1.00    -          root default
 -3          4.67212         -  4.1 TiB  1.6 TiB  1.6 TiB    44 MiB   8.3 GiB  2.5 TiB  39.55  1.03    -              host dell-r730xd-1
  3    hdd   0.54579         0      0 B      0 B      0 B       0 B       0 B      0 B      0     0    0    down          osd.3
  4    hdd   0.54579   1.00000  559 GiB  150 GiB  149 GiB   642 KiB  1023 MiB  408 GiB  26.92  0.70   25      up          osd.4
  5    hdd   0.54579   1.00000  559 GiB  163 GiB  162 GiB   1.1 MiB  1023 MiB  396 GiB  29.23  0.76   27      up          osd.5
  6    hdd   0.54579   1.00000  559 GiB  151 GiB  150 GiB   2.3 MiB  1022 MiB  408 GiB  26.97  0.70   25      up          osd.6
  7    hdd   0.54579   1.00000  559 GiB  139 GiB  138 GiB   1.9 MiB  1022 MiB  420 GiB  24.87  0.65   23      up          osd.7
  8    hdd   0.54579   1.00000  559 GiB  169 GiB  168 GiB    16 MiB  1008 MiB  390 GiB  30.22  0.79   29      up          osd.8
  0    ssd   0.46579   1.00000  477 GiB  376 GiB  375 GiB   9.5 MiB   1.2 GiB  100 GiB  78.93  2.05   13      up          osd.0
  1    ssd   0.46579   1.00000  477 GiB  261 GiB  259 GiB   7.5 MiB   1.1 GiB  216 GiB  54.63  1.42    9      up          osd.1
  2    ssd   0.46579   1.00000  477 GiB  262 GiB  261 GiB   5.1 MiB   1.0 GiB  215 GiB  54.90  1.43    9      up          osd.2
 -7          4.67212         -  4.7 TiB  1.7 TiB  1.7 TiB    45 MiB   9.3 GiB  3.0 TiB  36.73  0.95    -              host dell-r730xd-2
 12    hdd   0.54579   1.00000  559 GiB  133 GiB  132 GiB   1.7 MiB  1022 MiB  426 GiB  23.72  0.62   22      up          osd.12
 13    hdd   0.54579   1.00000  559 GiB  145 GiB  144 GiB   431 KiB  1024 MiB  414 GiB  25.88  0.67   24      up          osd.13
 14    hdd   0.54579   1.00000  559 GiB  127 GiB  126 GiB   1.4 MiB  1023 MiB  432 GiB  22.65  0.59   21      up          osd.14
 15    hdd   0.54579   1.00000  559 GiB  127 GiB  126 GiB    13 MiB  1011 MiB  432 GiB  22.70  0.59   22      up          osd.15
 16    hdd   0.54579   1.00000  559 GiB  170 GiB  169 GiB   2.2 MiB  1022 MiB  389 GiB  30.41  0.79   28      up          osd.16
 17    hdd   0.54579   1.00000  559 GiB  133 GiB  132 GiB   1.7 MiB  1022 MiB  426 GiB  23.72  0.62   22      up          osd.17
  9    ssd   0.46579   1.00000  477 GiB  286 GiB  285 GiB   9.1 MiB   1.1 GiB  191 GiB  60.06  1.56   10      up          osd.9
 10    ssd   0.46579   1.00000  477 GiB  259 GiB  258 GiB   4.9 MiB   1.0 GiB  218 GiB  54.36  1.41    9      up          osd.10
 11    ssd   0.46579   1.00000  477 GiB  378 GiB  377 GiB    11 MiB   1.2 GiB   99 GiB  79.28  2.06   13      up          osd.11
-10          4.12633         -  4.1 TiB  1.6 TiB  1.6 TiB    52 MiB   8.5 GiB  2.5 TiB  39.41  1.02    -              host dell-r730xd-3
 21    hdd   0.54579   1.00000  559 GiB  133 GiB  132 GiB    13 MiB  1011 MiB  426 GiB  23.77  0.62   23      up          osd.21
 22    hdd   0.54579   1.00000  559 GiB  156 GiB  155 GiB   906 KiB  1023 MiB  403 GiB  27.94  0.73   26      up          osd.22
 23    hdd   0.54579   1.00000  559 GiB  152 GiB  151 GiB   2.4 MiB  1022 MiB  407 GiB  27.19  0.71   25      up          osd.23
 24    hdd   0.54579   1.00000  559 GiB  114 GiB  113 GiB  1003 KiB  1023 MiB  445 GiB  20.42  0.53   19      up          osd.24
 25    hdd   0.54579   1.00000  559 GiB  157 GiB  156 GiB   878 KiB  1023 MiB  402 GiB  28.04  0.73   26      up          osd.25
 18    ssd   0.46579   1.00000  477 GiB  288 GiB  287 GiB   9.3 MiB   1.1 GiB  189 GiB  60.36  1.57   10      up          osd.18
 19    ssd   0.46579   1.00000  477 GiB  315 GiB  314 GiB    13 MiB   1.2 GiB  162 GiB  66.09  1.72   11      up          osd.19
 20    ssd   0.46579   1.00000  477 GiB  350 GiB  349 GiB    12 MiB   1.2 GiB  127 GiB  73.43  1.91   12      up          osd.20
                         TOTAL   13 TiB  5.0 TiB  4.9 TiB   142 MiB    26 GiB  8.0 TiB  38.49
MIN/MAX VAR: 0.53/2.06  STDDEV: 19.66
 
Please put the output in [code][/code] blocks, as it is really hard, if not impossible, to read otherwise.

One more question: do you get that error on each of the nodes in the cluster, or just on one?
 
Thanks for editing the post :)
One thing I noticed is that the OSDs have rather few PGs. All of them are below 50 PGs, some way below. Ideally you would have around 100 PGs per OSD in the ceph osd df tree output, though anything over 50 is okay-ish.

Do you use the autoscaler? If so, you should set the target_ratio for each pool so the autoscaler knows roughly how much space each pool will consume. Otherwise it can only use the current usage for its estimate, and the autoscaler only becomes active once the ideal number of PGs for a pool is off from the current number by a factor of 3.
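You can check what the autoscaler currently sees, and whether it is enabled for each pool, with:

Code:
ceph osd pool autoscale-status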

You have 2 pools (we can ignore the device_health_metrics pool as it usually stays very small), and from what I can see, each of them has a custom CRUSH rule limiting it to either the SSD or the HDD OSDs.

That means each pool can take up all the space of its device class. You can therefore set the target_ratio for both pools to 1 and enable the autoscaler.
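Roughly, using the pool names from your ceph df output (at the Ceph level the setting is called target_size_ratio):

Code:
ceph osd pool set ssd-cache target_size_ratio 1
ceph osd pool set hdd target_size_ratio 1

ceph osd pool set ssd-cache pg_autoscale_mode on
ceph osd pool set hdd pg_autoscale_mode on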

You can calculate a good number of PGs for the pools with the PG Calculator (unfortunately the Ceph website got reworked and the calculator seems to be missing there, hence the link to the web archive version).

For the HDD pool with its 17 OSDs (also valid for 18), the best pg_num would be 512. For the SSD pool with its 9 OSDs, the pg_num should be 256.

If you do set the target_ratio and enable the autoscaler, it most likely will come to the same conclusion.
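If you would rather set pg_num by hand instead of letting the autoscaler do it, that would look roughly like this (note that the ssd-cache pool may refuse the split, as in your autoscaler log, unless you force it with --yes-i-really-mean-it):

Code:
ceph osd pool set hdd pg_num 512
ceph osd pool set ssd-cache pg_num 256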
 
Thank you for the response, Aaron. So what do I do here? Yesterday I deleted an OSD that was showing a down status. I also checked the hardware: two physical disks are down and not working at all. Could this cause the issue? How do I find the OSDs linked to these disks and remove them? Before I deleted the OSD we had 25 OSDs, and it still shows 25, but osd.3 is missing since I deleted it, and the issue still persists. Can you please advise what to do here?
 
Okay there is quite a bit there :)

So what do I do here? Yesterday I deleted an OSD that was showing a down status. I also checked the hardware: two physical disks are down and not working at all. Could this cause the issue?
What is the ceph status? ( ceph -s )

How do I find the OSDs linked to these disks and remove them?
ceph device ls should give you a list that helps you identify which OSD is using which physical disk on which host.
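You can also cross-check an individual OSD, for example (osd.4 is just an example ID):

Code:
ceph device ls

# device node and host for a specific OSD
ceph osd metadata 4 | grep -E '"devices"|"hostname"'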

It still shows 25, but osd.3 is missing since I deleted it, and the issue still persists. Can you please advise what to do here?
How did you delete OSD 3?
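For reference, a clean removal usually looks roughly like this (assuming osd.3 and that its data has already been rebalanced away):

Code:
ceph osd safe-to-destroy osd.3        # check first
ceph osd out 3
systemctl stop ceph-osd@3             # on the node that hosted it
pveceph osd destroy 3                 # or: ceph osd purge 3 --yes-i-really-mean-it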
 
Hello Aaron,

I found out from the logs that someone did a dist-upgrade on node 1 and node 3. What does a dist-upgrade do?
I did a bit of googling and found out that certain packages get upgraded, but I am not sure how that could have caused the issue I am currently facing.
Could you or anyone else help me out?
 
