CEPH pool shrank at same rate it was being filled

rilesdun · Apr 17, 2023

Last Friday I ran into a strange issue. I was cloning a 1.55 TB virtual machine on our NVMe Ceph pool, which had 3.5 TB of free space remaining.

Cloning proceeded normally at first, but at some point during the night the size of the NVMe pool itself began to shrink at the same rate it was being filled (refer to the picture below).

When I noticed this the following morning, all VMs using this pool were stuck and I couldn't remove the unfinished clone.
I took the following steps:
- Stopped the clone; it was still running but stuck because the pool was out of space
- Removed the VM lock from the CLI
- Attempted to remove the disk of the failed clone, but it could not be removed until Ceph had fully healed

Has anyone seen this before? As you can see, once Ceph healed and the VM disk was removed, the datastore grew back up to 3.5 TB.

Nuke Bloodaxe · Apr 17, 2023

Ah, I've seen this in ZFS. Your pool in the above picture is probably sharing its space with another storage location.
In my case, I had a ZFS thin pool and a directory storage on the same device. If I add files to the directory, the ZFS thin pool shrinks accordingly.

Do you have a backup job? Where does it store that data? The above info and questions are to give you some pointers; I don't have CEPH implemented on this end yet.

aaron · Apr 17, 2023

Can you post the output of ceph osd df tree? If the total size is going down, either there are other pools in the same cluster (and device class) taking up space, or one OSD is getting a lot fuller than the others. The fullest OSD limits the estimated free space, so if one OSD gets very full, you will see the estimated free space go down.
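
To make the "fullest OSD limits the estimate" point concrete, here is a rough sketch of the idea. It is a simplified model and not Ceph's actual code: it ignores full ratios and other details, the OSD names and sizes are made up, and the replica count of 3 is an assumption since it has not been mentioned in this thread.

```python
def estimated_pool_free(osds, replicas):
    """Simplified estimate of usable free space for a replicated pool.

    osds: list of (name, crush_weight, avail_gib) tuples (hypothetical layout).
    replicas: replica count of the pool (an assumption; not stated in this thread).
    """
    total_weight = sum(weight for _, weight, _ in osds)

    def projected_raw(osd):
        _, weight, avail = osd
        # New data lands on each OSD in proportion to its weight, so this OSD
        # fills up once (avail / its share) of raw data has been written.
        return avail / (weight / total_weight)

    limiting = min(osds, key=projected_raw)
    return limiting[0], projected_raw(limiting) / replicas


# Hypothetical numbers: four equally weighted OSDs, one much fuller than the rest.
osds = [("osd.a", 1.0, 800), ("osd.b", 1.0, 800), ("osd.c", 1.0, 800), ("osd.d", 1.0, 300)]
limiting_osd, free_gib = estimated_pool_free(osds, replicas=3)
print(limiting_osd, free_gib)  # osd.d 400.0

# If osd.d keeps filling faster than the others, the estimate keeps dropping.
osds[3] = ("osd.d", 1.0, 150)
print(estimated_pool_free(osds, replicas=3))  # ('osd.d', 200.0)
```

Simply adding up the free space in the first case would suggest (800 + 800 + 800 + 300) / 3 = 900 GiB, but the projection from the fullest OSD gives only 400 GiB. That gap is why the reported pool size can go down even though most OSDs still have plenty of room.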

rilesdun · Apr 18, 2023

Sorry for the delayed response.
Here's the output of ceph osd df tree:

Code:

ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL     %USE   VAR   PGS  STATUS  TYPE NAME       
 -1         119.04283         -  119 TiB   46 TiB   48 TiB  1.3 GiB   120 GiB    73 TiB  38.96  1.00    -          root default     
-16          20.04865         -   20 TiB  7.7 TiB  8.4 TiB  248 MiB    15 GiB    12 TiB  38.35  0.98    -              host R-29-12
 14    hdd    9.09569   1.00000  9.1 TiB  3.4 TiB  3.4 TiB  5.4 MiB   8.5 GiB   5.7 TiB  37.21  0.96  217      up          osd.14   
 15    hdd    5.45119   1.00000  5.5 TiB  2.4 TiB  3.1 TiB  232 MiB       0 B   3.1 TiB  43.75  1.12  142      up          osd.15   
 19   nvme    0.93149   1.00000  954 GiB  312 GiB  311 GiB  4.2 MiB   1.3 GiB   641 GiB  32.76  0.84    6      up          osd.19   
 39   nvme    0.93149   1.00000  954 GiB   58 GiB   57 GiB  1.9 MiB  1022 MiB   895 GiB   6.12  0.16    2      up          osd.39   
  1    ssd    1.81940   1.00000  1.8 TiB  693 GiB  691 GiB  618 KiB   1.8 GiB   1.1 TiB  37.20  0.95   20      up          osd.1   
  3    ssd    1.81940   1.00000  1.8 TiB  901 GiB  899 GiB  3.6 MiB   2.3 GiB   962 GiB  48.39  1.24   26      up          osd.3   
 -3          20.48145         -   20 TiB  7.9 TiB  9.0 TiB  521 MiB    17 GiB    13 TiB  38.58  0.99    -              host R1-12-15
  2    hdd    9.09569   1.00000  9.1 TiB  3.2 TiB  3.2 TiB  9.3 MiB   7.5 GiB   5.9 TiB  34.77  0.89  209      up          osd.2   
 11    hdd    5.45099   1.00000  5.5 TiB  2.2 TiB  3.3 TiB  486 MiB       0 B   3.3 TiB  39.80  1.02  129      up          osd.11   
 24   nvme    0.93149   1.00000  954 GiB  628 GiB  627 GiB  3.3 MiB   1.6 GiB   325 GiB  65.88  1.69   13      up          osd.24   
 41   nvme    0.90970   1.00000  932 GiB  157 GiB  156 GiB  3.6 MiB  1020 MiB   774 GiB  16.86  0.43    3      up          osd.41   
  5    ssd    1.81940   1.00000  1.8 TiB  730 GiB  727 GiB   11 MiB   3.3 GiB   1.1 TiB  39.19  1.01   21      up          osd.5   
 10    ssd    1.81940   1.00000  1.8 TiB  762 GiB  760 GiB  4.8 MiB   2.2 GiB   1.1 TiB  40.92  1.05   22      up          osd.10   
 44    ssd    0.45479   1.00000  466 GiB  354 GiB  353 GiB  3.2 MiB   1.2 GiB   112 GiB  76.03  1.95   11      up          osd.44   
 -7          20.03366         -   20 TiB  7.8 TiB  7.8 TiB  107 MiB    21 GiB    12 TiB  39.06  1.00    -              host R1-8-11
  6    hdd    9.09569   1.00000  9.1 TiB  3.6 TiB  3.6 TiB   82 MiB   7.9 GiB   5.5 TiB  39.43  1.01  229      up          osd.6   
  7    hdd    5.45799   1.00000  5.5 TiB  2.1 TiB  2.1 TiB  4.5 MiB   5.7 GiB   3.3 TiB  38.85  1.00  134      up          osd.7   
 40   nvme    0.90970   1.00000  932 GiB  339 GiB  338 GiB  7.6 MiB   1.0 GiB   593 GiB  36.36  0.93   11      up          osd.40   
 43   nvme    0.93149   1.00000  954 GiB  156 GiB  155 GiB  258 KiB   1.1 GiB   798 GiB  16.38  0.42    3      up          osd.43   
  8    ssd    1.81940   1.00000  1.8 TiB  767 GiB  765 GiB  4.7 MiB   2.0 GiB   1.1 TiB  41.19  1.06   23      up          osd.8   
  9    ssd    1.81940   1.00000  1.8 TiB  907 GiB  904 GiB  7.8 MiB   3.0 GiB   956 GiB  48.66  1.25   27      up          osd.9   
-19                 0         -      0 B      0 B      0 B      0 B       0 B       0 B      0     0    -              host R2-14-17
-13          20.05534         -   20 TiB  8.1 TiB  8.1 TiB  287 MiB    21 GiB    12 TiB  40.27  1.03    -              host R2-4-8 
  4    hdd    9.09569   1.00000  9.1 TiB  3.6 TiB  3.6 TiB  5.8 MiB   7.9 GiB   5.5 TiB  39.72  1.02  231      up          osd.4   
 21    hdd    5.45789   0.95001  5.5 TiB  2.1 TiB  2.1 TiB   35 MiB   5.0 GiB   3.3 TiB  39.07  1.00  137      up          osd.21   
 22   nvme    0.93149   1.00000  954 GiB  368 GiB  367 GiB  2.2 MiB   1.4 GiB   586 GiB  38.61  0.99    8      up          osd.22   
 42   nvme    0.93149   1.00000  954 GiB  362 GiB  361 GiB  2.7 MiB  1021 MiB   592 GiB  37.98  0.97    7      up          osd.42   
 12    ssd    1.81940   1.00000  1.8 TiB  782 GiB  779 GiB  237 MiB   3.4 GiB   1.1 TiB  41.98  1.08   27      up          osd.12   
 13    ssd    1.81940   1.00000  1.8 TiB  875 GiB  872 GiB  5.4 MiB   2.3 GiB   988 GiB  46.94  1.20   27      up          osd.13   
-10                 0         -      0 B      0 B      0 B      0 B       0 B       0 B      0     0    -              host R2-9-12
-29          19.12396         -   19 TiB  7.4 TiB  7.4 TiB   33 MiB    22 GiB    12 TiB  38.81  1.00    -              host R214-17
 16    hdd    9.09569   1.00000  9.1 TiB  3.3 TiB  3.3 TiB  5.7 MiB   8.2 GiB   5.8 TiB  36.45  0.94  206      up          osd.16   
 17    hdd    5.45799   1.00000  5.5 TiB  2.4 TiB  2.4 TiB  2.5 MiB   6.3 GiB   3.1 TiB  44.06  1.13  149      up          osd.17   
 20   nvme    0.93149   1.00000  954 GiB  369 GiB  368 GiB  2.3 MiB   1.4 GiB   585 GiB  38.72  0.99    8      up          osd.20   
  0    ssd    1.81940   1.00000  1.8 TiB  635 GiB  632 GiB   14 MiB   3.4 GiB   1.2 TiB  34.11  0.88   20      up          osd.0   
 18    ssd    1.81940   1.00000  1.8 TiB  737 GiB  734 GiB  9.0 MiB   3.1 GiB   1.1 TiB  39.54  1.02   22      up          osd.18   
-33           6.43326         -  6.4 TiB  2.2 TiB  2.2 TiB   31 MiB   7.8 GiB   4.2 TiB  34.71  0.89    -              host R3-12-15
 23   nvme    0.93149   1.00000  954 GiB  319 GiB  318 GiB  6.5 MiB  1018 MiB   635 GiB  33.44  0.86    7      up          osd.23   
 25   nvme    0.93149   1.00000  954 GiB  110 GiB  109 GiB   12 MiB  1012 MiB   844 GiB  11.53  0.30    3      up          osd.25   
 38   nvme    0.93149   1.00000  954 GiB  318 GiB  317 GiB  7.6 MiB   1.3 GiB   635 GiB  33.38  0.86    7      up          osd.38   
 26    ssd    1.81940   1.00000  1.8 TiB  697 GiB  695 GiB  4.3 MiB   2.2 GiB   1.1 TiB  37.44  0.96   21      up          osd.26   
 27    ssd    1.81940   1.00000  1.8 TiB  842 GiB  839 GiB  703 KiB   2.3 GiB  1021 GiB  45.18  1.16   26      up          osd.27   
-41           6.43326         -  6.4 TiB  2.8 TiB  2.8 TiB   35 MiB   8.2 GiB   3.6 TiB  44.14  1.13    -              host R3-16-19
 33   nvme    0.93149   1.00000  954 GiB  276 GiB  275 GiB   12 MiB   1.2 GiB   678 GiB  28.93  0.74    8      up          osd.33   
 34   nvme    0.93149   1.00000  954 GiB  323 GiB  322 GiB  7.1 MiB   1.1 GiB   631 GiB  33.82  0.87    8      up          osd.34   
 35   nvme    0.93149   1.00000  954 GiB  313 GiB  312 GiB  5.1 MiB   1.1 GiB   641 GiB  32.84  0.84    6      up          osd.35   
 36    ssd    1.81940   1.00000  1.8 TiB  907 GiB  904 GiB  5.6 MiB   2.2 GiB   956 GiB  48.66  1.25   27      up          osd.36   
 37    ssd    1.81940   1.00000  1.8 TiB  1.1 TiB  1.1 TiB  4.6 MiB   2.6 GiB   773 GiB  58.48  1.50   33      up          osd.37   
-37           6.43326         -  6.4 TiB  2.4 TiB  2.4 TiB  105 MiB   8.4 GiB   4.0 TiB  37.18  0.95    -              host R3-8-11
 28   nvme    0.93149   1.00000  954 GiB  223 GiB  222 GiB   85 MiB   1.2 GiB   731 GiB  23.42  0.60    8      up          osd.28   
 29   nvme    0.93149   1.00000  954 GiB  267 GiB  266 GiB  7.5 MiB   1.2 GiB   687 GiB  27.97  0.72    6      up          osd.29   
 30   nvme    0.93149   1.00000  954 GiB  208 GiB  207 GiB  3.2 MiB  1021 MiB   746 GiB  21.78  0.56    4      up          osd.30   
 31    ssd    1.81940   1.00000  1.8 TiB  810 GiB  808 GiB  3.4 MiB   2.1 GiB   1.0 TiB  43.50  1.12   25      up          osd.31   
 32    ssd    1.81940   1.00000  1.8 TiB  941 GiB  938 GiB  6.6 MiB   3.0 GiB   922 GiB  50.53  1.30   28      up          osd.32   
                          TOTAL  119 TiB   46 TiB   48 TiB  1.3 GiB   120 GiB    73 TiB  38.96                                     
MIN/MAX VAR: 0.16/1.95  STDDEV: 12.39
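
As a quick way to read the table with the "one OSD much fuller than the others" question in mind, the sketch below pulls out only the nvme-class rows. The names, %USE and PGS values are copied from the output above; the script itself is just an illustration. The output was presumably captured after the failed clone had been removed, so it shows the imbalance at that point, not during the incident.

```python
# (name, %USE, PGS) for the nvme-class OSDs, copied from the output above.
nvme_osds = [
    ("osd.19", 32.76, 6), ("osd.39", 6.12, 2), ("osd.24", 65.88, 13),
    ("osd.41", 16.86, 3), ("osd.40", 36.36, 11), ("osd.43", 16.38, 3),
    ("osd.22", 38.61, 8), ("osd.42", 37.98, 7), ("osd.20", 38.72, 8),
    ("osd.23", 33.44, 7), ("osd.25", 11.53, 3), ("osd.38", 33.38, 7),
    ("osd.33", 28.93, 8), ("osd.34", 33.82, 8), ("osd.35", 32.84, 6),
    ("osd.28", 23.42, 8), ("osd.29", 27.97, 6), ("osd.30", 21.78, 4),
]

# Find the most and least utilised NVMe OSDs by %USE.
fullest = max(nvme_osds, key=lambda osd: osd[1])
emptiest = min(nvme_osds, key=lambda osd: osd[1])
total_pgs = sum(pgs for _, _, pgs in nvme_osds)

print("fullest: ", fullest)    # ('osd.24', 65.88, 13)
print("emptiest:", emptiest)   # ('osd.39', 6.12, 2)
print("PG placements across", len(nvme_osds), "NVMe OSDs:", total_pgs)
```

The spread between osd.24 (65.88 %, 13 PGs) and osd.39 (6.12 %, 2 PGs) is large, and the per-OSD PG counts are low, so a single large image can plausibly fill one OSD well ahead of the rest.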
