"Move Disk" data corruption on 4.3

May I join your conversation.....

We have a PVE cluster with 3 nodes (all of them 4.3-3) and use an NFS share based on FreeNAS.
Not long ago we discovered "move disk".......It was very cool....it gives us the opportunity to migrate a VM from one storage to another without "backup/restore".
As a test we moved about 10 VMs.....some of them Windows, some Linux based......but!...after 10-12 hours....2 of them crashed.....they were: our wiki....some pages were lost.....and freeradius-dhcp.......
On both VMs the trouble was with MySQL.
On both, cache = Write through.
Disk type - virtio.

Is this reproducible?

Did you use the "Delete Source" option?

How many VM's did you move at a time?

Does your PVE cluster have ECC?

Do both your source and target fileservers have ECC?

Are there any network errors on the interface of any involved server or switch port?

Were you able to get "before" and "after" views of a corrupted file to see exactly what the corruption looks like? (I.e. is it all zeroes, random noise, data from some other file or image, etc.) This is one of the big things I wish I had -- besides the ability to reproduce the problem on non-production data -- that I don't yet have.
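
If anyone does still have both a backup copy and the corrupted file, a simple byte-level comparison would answer that question. A rough sketch (file names here are only placeholders):

Code:
# compare a known-good copy from backup against the corrupted file
cmp -l wiki_page.frm.bak wiki_page.frm | head -20    # offset plus the differing byte values (octal)
xxd wiki_page.frm.bak > good.hex
xxd wiki_page.frm     > bad.hex
diff good.hex bad.hex | less                         # shows whether the damage is zeroes, noise, or foreign data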
 
Is this reproducible?
In progress.....we don't want to run tests on the "hot" cluster, so we are installing another cluster for testing.
Did you use the "Delete Source" option?
Yes
How many VM's did you move at a time?
One by one......no parallel moves.
Does your PVE cluster have ECC?
Do both your source and target fileservers have ECC?
Yes to all.....there are also hardware controllers with BBU and cache.
Are there any network errors on the interface of any involved server or switch port?
No errors on network.....

It's strange that all the other VMs without MySQL moved fine.....even Win7 with D-View Cam.....and online streaming.
 
It is definitely strange that this (so far) affects only servers running MySQL.

But in our case, in addition to files that happen not to have been written, even some .frm files got corrupted and those are never written unless the database schema is manually changed, which definitely wasn't the case for us and probably wasn't the case for you.

None of my tests with moving busy MySQL servers have resulted in corruption. However, the MySQL servers in our case that got corruption were relatively idle, and the files that got corrupted (both .frm and .ibd) weren't being used at the time, so maybe heavy workload is not the right direction.

Could somebody maybe explain (or point to a reference for) how "move disk" actually works?
 
Please let go of the idea that MySQL is somehow writing to files that aren't being written to, and that this is somehow causing corruption. Your understanding of what, when and how MySQL writes is not correct.

The binlog is a special-purpose feature used for replication and has nothing to do with this.
 
I'm sorry you feel that way.

The frm files that are being corrupted are structural files that have a special purpose and are not written during ordinary operation. They do not contain table data. They do not contain indexes. They are not part of the binlog. They are not part of the innodb transaction log. They are very small files that contain only static information about the table definitions.

None of that is opinion or cleverness, it is just fact. And if the facts do not match your theory, it is the theory that has to be revised, not the facts.
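
You can verify this on any running MySQL server just by looking at timestamps: the .frm files keep the mtime of the last schema change, while the InnoDB data and log files are touched constantly. A quick sketch, assuming a default /var/lib/mysql layout:

Code:
# .frm files are only rewritten on CREATE/ALTER TABLE, so their mtimes stay old
ls -l --time-style=long-iso /var/lib/mysql/*/*.frm | sort -k6,7 | tail
# compare with the constantly written InnoDB data and log files
ls -l --time-style=long-iso /var/lib/mysql/ib_logfile* /var/lib/mysql/*/*.ibd | sort -k6,7 | tail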
 
On my side, my migrations from NFS to Ceph for VMs with MySQL were with xfs as the filesystem, cache=none, and I don't use the delete source option.
I haven't had any errors (with around 100 VMs with MySQL).
This is with Proxmox 4.3 and the latest jewel librbd, on a Ceph jewel cluster with the latest tunables.
 
Right now, I am a little suspicious of the "delete source" option, though I do not have a strong basis for that. It's mainly just that we immediately stopped using it as a safety precaution, and suddenly we can no longer reproduce the issue. Also Black Knight MHT used it and had the problem, and you did not use it and did not have the problem. But that's correlation, not causation.

The other theory I'm working on is that it may not be the disk or the write cache getting corrupted, it may be the VM's buffer cache. There is no evidence for or against that either, but it would explain how read-only files got "corrupted." Also it would be easy to test if we could just reproduce the stupid problem. :-( Hopefully we will be able to try migrating some read-intensive workloads today and see what happens.
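
One way to tell those two apart, if we ever manage to reproduce it, would be something like the following inside the guest (paths are just examples): checksum the suspect files, drop the page cache so the next read has to come from the virtual disk, and checksum again. Bad then good means it was the guest's buffer cache; bad both times means the corruption really is on disk.

Code:
md5sum /var/lib/mysql/wiki/*.frm      # read via whatever is currently cached
sync
echo 3 > /proc/sys/vm/drop_caches     # drop the guest's page cache
md5sum /var/lib/mysql/wiki/*.frm      # forces a fresh read from the virtual disk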
 
OK, people. Let's live in peace. Nobody here is "cleverer than anyone else in the universe"....we simply have some trouble with the move disk function.
And I'm trying to understand whether it is safe to use.
Does anybody know how it works with the various caches (RAID, disk, host system, VM)? And why does it happen only with MySQL? I checked my cameras....a disk with 140 GB of encrypted video.....I moved it 2 times......everything was alright.... and the VMs with MySQL crashed after 10-12 hours of running....
 
BHM, what error message did MySQL give you when it crashed, and which specific files were corrupted in your case?
 
Sorry....but I had no time to check and dig deeper into the MySQL problem....I just restored from backup ((((((((
I'll try to run more tests tomorrow on the test cluster.
 
Yes, my situation was much the same. :( Being focused on recovery, I did not gather nearly the amount of information in retrospect it would be good to have now.
 
Right now, I am a little suspicious of the "delete source" option, though I do not have a strong basis for that. It's mainly just that we immediately stopped using it as a safety precaution, and suddenly we can no longer reproduce the issue. Also Black Knight MHT used it and had the problem, and you did not use it and did not have the problem. But that's correlation, not causation.

I can't tell if it could be related, but I never use it, for safety.
The original disk is deleted when the qemu drive-mirror block job is finished, so pending writes are normally flushed.
(But maybe the zero init could be related, I'm not sure here.)
I'll try to build a firefly cluster for testing and reproduce your problem.

(BTW, if you can test with the zero init commented out in the rbd plugin, it could help to debug, if you can reproduce the problem easily.)
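
For reference, the same operation is available on the command line, and the steps are roughly these (VM id, disk and storage names are only examples; check "qm help move_disk" for the exact option names on your version):

Code:
qm move_disk 104 virtio0 ceph-pool             # copy, keep the old volume as "unused"
qm move_disk 104 virtio0 ceph-pool --delete    # copy, then delete the source volume
# Under the hood qemu runs a drive-mirror block job: the whole volume is copied while
# new guest writes are mirrored to both sides; when the job reports "ready" it is
# completed so the VM switches to the new volume, and only then can the old one be deleted.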
 
I have finished building my jewel and firefly clusters; I'm beginning to debug.

one on the default 0.80.7. We promptly got a "function not implemented" error trying to create an image on the Firefly cluster, or on trying to migrate an image from a Jewel cluster to a Firefly cluster.
Ok, I'm able to reproduce this.

The rbd command from ceph-jewel tries to create new volumes with all the new features enabled.
You can add, in /etc/ceph/ceph.conf on your Proxmox host:
"rbd default features = 1"

Then you should have the same behaviour as the rbd command from firefly.
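
If it helps anyone else who hits the "function not implemented" error, this is roughly how to apply and verify it (pool and image names are just examples):

Code:
# /etc/ceph/ceph.conf on the proxmox host, in the [global] or [client] section:
#     rbd default features = 1      # layering only, like firefly-created images
# then check what a newly created image actually gets:
rbd create testpool/feature-check --size 1024
rbd info testpool/feature-check     # the "features:" line should now show only layering
rbd rm testpool/feature-check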

And live migrating any VMs from the Proxmox-jewel node to the Proxmox-firefly node fails:

Can't reproduce, works fine for me.
I can migrate a VM from a source host with librbd jewel to a destination host with librbd firefly.
I can migrate a VM from a source host with librbd firefly to a destination host with librbd jewel.

Both with the Ceph storage on firefly, or on jewel with firefly tunables.



I have done a lot of disk moves tonight, firefly to jewel and jewel to firefly, with both librbd jewel and firefly, running a MySQL benchmark during the migration, and I can't reproduce the error.

librbd firefly was 0.80.11 from the ceph.com firefly repository.

Not sure it's related, but an OpenStack user reported an fs error today with librbd 0.80.7:
http://www.spinics.net/lists/ceph-users/msg32017.html
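
(If anyone wants to generate a comparable load for their own tests, a basic sysbench OLTP run against a throwaway database during the move is enough; database name, credentials and table size below are made up:)

Code:
sysbench --test=oltp --mysql-db=sbtest --mysql-user=sbtest --mysql-password=secret \
         --oltp-table-size=1000000 prepare
sysbench --test=oltp --mysql-db=sbtest --mysql-user=sbtest --mysql-password=secret \
         --oltp-table-size=1000000 --num-threads=8 --max-time=3600 --max-requests=0 run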
 
So far I also have not been able to reproduce the problem, although I haven't had as much time for testing as I would want. Still haven't conducted the read-workload tests I hope to try. As many of the tests involve reinstalling Proxmox over remote IPMI at a glacial pace, it's a very slow process. :-(
 
Another server that had its disks moved around the same time popped up with serious filesystem corruption today. This was an email server, not MySQL, and again the files that got corrupted were largely static system configuration files that had not been updated in months -- years in a couple of cases -- rather than the (busy) mail spool. The problem was thus not noticed until we attempted to apply a security update to the base system and got a system crash from the corruption it ran into.

It is reasonably likely that this happened when the disks were moved, but that's not a certainty. It is, to my knowledge, the only time in recent history when any disk write activity even peripherally related to those config files occurred.

Still no luck doing it on purpose. :-(
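
For anyone wanting to check their own guests for this kind of silent damage, verifying installed files against the package checksums is a quick first pass. On a Debian-based guest something like this works (debsums has to be installed; locally edited conffiles will also be flagged, so review the output):

Code:
apt-get install debsums
debsums -sa      # -s: only report failures, -a: include configuration files
# on RPM-based guests, "rpm -Va" gives a similar verification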
 
Just to follow up on this, after updating our Ceph clusters and Proxmox nodes to Jewel 10.2.3, we have moved over 100 disk images without incident or (detected) corruption, including dozens of MySQL servers.

The only other change we have made is that we also changed policy to forbid use of "Delete Source" when moving disks; all moves must now wait 24 hours before deleting disks.

It's hard not to conclude that the problem is related to one of two things:

- A bug in librbd on Ceph 0.80.7 that gets installed on Proxmox 4.1, which must be fixed in the 0.80.8 that comes with Proxmox 3.x.

- A bug in "Delete Source."

Black Knight MHT reported a similar issue using NFS, and stated that they also used Delete Source when it happened. If the causes are the same (not proven), then that would tend to implicate Delete Source rather than Ceph.
 
