I'm not fully up to speed on Ceph at the moment, but I wanted to share a few thoughts.
Firstly, I've used "move disk" (including delete original) a LOT, on v2, v3 and v4 clusters, and I've found it very reliable. So if something is causing corruption as you describe here, that's not what I would call typical behavior. A lot of the disks I've moved have run smaller databases, including mysql and mail databases. Not exactly massive examples though.
Secondly, my examples are with storage backed by ZFS served over NFS. Not hugely relevant here, but I wanted to be clear about it.
Thirdly, a thought comes to mind. What I've seen a few times in this thread is that the data showing corruption is old data, not recently written data. So perhaps this data was already corrupt before the move, and the move simply triggered mysql and the mail databases to actually read and verify it, at which point the corruption became visible. Silent corruption/bit rot can happen if your storage isn't set up to counter it (which is why I use ZFS). Now, I haven't read every detail in this thread, so you may have covered this already. If not, perhaps consider it?
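One way to rule the move itself in or out would be to checksum the image on both storages before deleting the original. Here's a minimal sketch of that idea in Python; the paths are hypothetical examples, and a byte-for-byte comparison like this is only meaningful if the VM is shut down and both copies are in the same format (e.g. raw on both sides):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a (possibly large) disk image in 1 MiB chunks to avoid loading it all into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical paths -- substitute your actual storage mount points and image names.
src = "/mnt/old-storage/images/100/vm-100-disk-0.raw"
dst = "/mnt/new-storage/images/100/vm-100-disk-0.raw"

if sha256_of(src) == sha256_of(dst):
    print("Images match -- the move copied the data faithfully.")
else:
    print("Checksum mismatch -- do NOT delete the original!")
```

If the checksums match and the guest still reports corruption afterwards, that would point at pre-existing bit rot rather than the move operation.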
Fourthly, perhaps some more info about the storage is in order? Also, how big are the VMs whose disks are failing to move successfully?