Ceph cluster: 4 nodes, 24 OSDs (mixed SSD and HDD), Ceph Nautilus 14.2.1 (via Proxmox 6, 7 nodes).
PG autoscale is ON, 5 pools, 1 big pool with all the VMs at 512 PGs (all SSD). This PG count did not change when I turned on autoscale on the SSD pool; only the smaller HDD and test pools were adjusted.
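For reference, this is roughly what I used to turn it on and to check the status (the pool name is just a placeholder for my SSD pool):

    # check what the autoscaler reports per pool (size, rate, target PG count)
    ceph osd pool autoscale-status
    # enable the autoscaler for a specific pool
    ceph osd pool set ssd-pool pg_autoscale_mode on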
All OSDs were originally created under Luminous.
I took out and destroyed an NVMe P900 (device class set to ssd); it was GPT-based and named osd.22.
I created it again as a new OSD; it comes up as LVM and stores about 22% of 447 GB there, circa 100 GB.
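To be clear, by "took out and destroyed / created again" I mean roughly the standard destroy-and-recreate workflow below; the device path is a placeholder and the exact steps on my side may have differed slightly:

    # take the OSD out and stop it
    ceph osd out 22
    systemctl stop ceph-osd@22
    # destroy it so the id can be reused
    ceph osd destroy 22 --yes-i-really-mean-it
    # wipe the old GPT layout
    ceph-volume lvm zap /dev/nvme0n1 --destroy
    # recreate it; ceph-volume builds it on LVM, which is why it now shows up as LVM
    ceph-volume lvm create --data /dev/nvme0n1 --osd-id 22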
After some 5 minutes, the reported size of the RBD pool was down from 3.77 TB of data to 100 GB, and the PG count went from 512 to 4.
Then the whole Ceph cluster went unresponsive for about a minute; no VM responded.
Then I turned off autoscale for this pool and set the PG count manually to 512 again, and BAAM, the whole Proxmox cluster went down and restarted, which took 10 minutes. Probably HA went down because Ceph traffic was sky-high on the same network (which it shouldn't be).
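What I ran to turn autoscale off and force the PG count back was basically this (pool name is a placeholder):

    # stop the autoscaler from touching this pool again
    ceph osd pool set ssd-pool pg_autoscale_mode off
    # force the PG count back up; Nautilus raises pgp_num gradually in the background
    ceph osd pool set ssd-pool pg_num 512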
Now everything works again, but the SSD pool shows used: 0.52% of 95.61 GB. It should be 3.77 TB of 10 TB.
All files are there, nothing is missing.
So is this a bug in Ceph, that autoscale can go wrong when reinstalling a single disk?
Any hints on how to restore the reported size of the RBD pool?
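For reference, this is where I am reading the pool usage from, besides the Proxmox GUI (assuming ceph df and rbd du are the right places to look):

    # per-pool usage as Ceph reports it
    ceph df detail
    # per-image usage of the RBD images in the pool (pool name is a placeholder)
    rbd du -p ssd-pool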