Empty Ceph pool still uses storage

bacbat32

New Member
Oct 2, 2025
I have just finished migrating my VMs on the cluster from an HDD pool to an SSD pool, but now that there are no VM disks or other Proxmox-related items left on the pool, it is still using ~7.3 TiB of what I assume is orphaned data. This cluster is currently running PVE 8 with Ceph 18, but has been upgraded over the years from PVE 5 with Ceph 12. Would it be safe to remove the data and the pool entirely?

Bash:
# pvesm list ceph-hdd
Volid Format  Type      Size VMID

Bash:
# ceph df
--- RAW STORAGE ---
CLASS       SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd       49 TiB   42 TiB  7.3 TiB   7.3 TiB      14.75
hdd7200  176 TiB  113 TiB   64 TiB    64 TiB      36.05
ssd      196 TiB  120 TiB   76 TiB    76 TiB      38.90
TOTAL    422 TiB  275 TiB  147 TiB   147 TiB      34.89
 
--- POOLS ---
POOL         ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
ceph-ssd      3  2048   26 TiB    6.82M   76 TiB  45.37     31 TiB
ceph-hdd      5   128  1.8 TiB  914.93k  5.3 TiB  12.25     13 TiB
ceph-backup   8   256   21 TiB    5.47M   62 TiB  45.35     25 TiB
.mgr          9     1  1.0 GiB      253  3.1 GiB      0     31 TiB

Bash:
# rados -p ceph-hdd ls | cut -d. -f2 | sort | uniq -c
   1095 0756139d8b5a1d
     12 163391af52760b
     17 196cac6b8b4567
      1 2508fde9eb01b4
    404 261da9a6d5e4ac
 406792 314c206b8b4567
    112 324fc695e2826e
    857 392810f65ebff8
   4378 3c280d6b8b4567
      3 47a5d9d650eb2a
     11 4b6cf3d813b381
      8 4ce0f3c9987cd2
     38 64ccad53c1c3fa
     26 6580a1f3b9dfc7
     15 7358c6940cc263
    269 79455b8d0b1ae4
   1106 7acf583455b9d9
    682 815d8d6b8b4567
   2399 8cb29960f33726
   1065 8d3ebeb5c0beb6
    401 c92b578bd6ead0
   6848 cb7e866dfe4ee7
   1823 d00922b6797e2d
  10962 df1fffd4c1a80b
    813 e0cd326b8b4567
   2613 e618d9c943b4a2
   4416 eaacd16b8b4567
      1 rbd_children
      1 rbd_directory
      1 rbd_info
      1 rbd_trash
 
Hmm, interesting—how did you migrate all of those VMs, if I may ask? Because by default, we don't delete the old disks from the source storage.

If it's just VM disks on RBD that you've got lying around, you should be able to see them with the following command:

Bash:
rbd --conf /etc/pve/ceph.conf --cluster ceph --pool <POOL> ls
 
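Since your rados listing also contains an rbd_trash object, it might be worth checking the RBD trash as well: images that were moved to the trash no longer show up in a plain rbd ls. A rough, read-only sketch (same <POOL> placeholder as above):

Bash:
# List images sitting in the pool's RBD trash (not shown by a plain `rbd ls`)
rbd --conf /etc/pve/ceph.conf --cluster ceph --pool <POOL> trash ls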
When I migrated the VMs, it did indeed leave the VM disks on the old storage, but those have since been deleted.

Sadly, nothing gets listed with that command. I suspect these are old VMs that were incompletely deleted before the migration, but I'm not sure.
Bash:
root@proxmox01:~# rbd --conf /etc/pve/ceph.conf --cluster ceph --pool ceph-hdd ls
root@proxmox01:~#


I do get VM disks when the command is run on the SSD pool:
Bash:
root@proxmox01:~# rbd --conf /etc/pve/ceph.conf --cluster ceph --pool ceph-ssd ls
base-129-disk-0
base-199-disk-0
base-701-disk-0
base-702-disk-0
base-703-disk-0
base-9000-disk-0
base-9000-disk-1
base-9001-disk-0
base-9001-disk-1
vm-100-disk-0
vm-1000-disk-0
vm-1000-disk-1
vm-10001-disk-0
vm-1001-disk-0
vm-1001-disk-1
...
 
Do the HDDs have separate WAL/DB disks? I know there are inconsistencies in how those are counted, though I'm not sure it would show up like that. IIRC the allocated WAL/DB space is added to the OSD and also marked as used.

If nothing else, you could stop the OSDs before deleting them.
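To verify whether an OSD really has a dedicated DB/WAL device, dumping its metadata is a quick read-only check. A sketch, assuming osd.0 as an example ID (the exact field names vary a bit between Ceph releases):

Bash:
# Dump one OSD's metadata and look for dedicated DB/WAL entries
ceph osd metadata 0 | grep -E 'dedicated|bluefs_db|bluefs_wal'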
 
They do indeed have separate WAL/DB disks, so that is probably it. Would you expect the size to go down when stopping the OSDs?
 
Hello @bacbat32,

While @SteveITS is right that WAL/DB can contribute to the raw usage, the 1.8 TiB of STORED data seems to be caused by the ~900k objects that your initial rados -p ceph-hdd ls command showed. Since rbd ls is empty for that pool, these are likely orphaned objects from disks that were not completely deleted.

You could try to inspect the content of one of these objects to identify its origin. For example, pick a full object name from one of the larger groups you found (e.g., one ending in 314c206b8b4567) and run:

Bash:
rados -p ceph-hdd get THE_FULL_OBJECT_NAME - | strings | head
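Another read-only cross-check, if you like: the data objects of an RBD image are usually named rbd_data.<id>.<offset>, so the second field of your cut output should correspond to image IDs. The pool's rbd_directory object keeps the name-to-ID mapping of the images RBD still knows about; if its omap is empty while those data objects remain, that would further support the orphaned-object theory. A rough sketch:

Bash:
# Show the image name <-> ID mappings RBD still tracks in this pool (read-only)
rados -p ceph-hdd listomapvals rbd_directory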
 