"Move Disk" data corruption on 4.3

jdw

Renowned Member
Sep 22, 2011
117
4
83
Data corruption appears to have occurred while using "Move Disk" on a KVM VM running Ubuntu Xenial that is a database server. The MySQL server crashed during the migration and refused to start, citing InnoDB checksum errors in several tables, many of which had not been written to in months.

The move was from an old Ceph Firefly cluster to a new Ceph Jewel cluster (both on separate/dedicated hardware).

The data was rolled back.

But we have a lot of data to move, so this is a serious concern. "Move Disk" doesn't have the best track record in this area. Is it perhaps acting up again?

# pveversion -v
proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
pve-manager: 4.3-1 (running version: 4.3-1/e7cdc165)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-88
pve-firmware: 1.1-9
libpve-common-perl: 4.0-73
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-61
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-6
pve-container: 1.0-75
pve-firewall: 2.0-29
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
openvswitch-switch: 2.5.0-1
 
Yes, it's large, so I put it here: http://pastebin.com/bn4VnTf8

Moving disk images is very common for us; we have done a lot of RBD-to-RBD moves since the last instance of this issue a couple of years ago, which is why an issue like this sucks the oxygen right out of the room.
 
This is strange; I don't see any error in the task log. (If there were read/write block errors on the source or destination, the job would fail automatically.)

Did your MySQL crash occur during the migration, or at the end? (Maybe storage overload?)
 
Data corruption appears to have occurred while using "Move Disk" on a KVM VM running Ubuntu Xenial that is a database server. The MySQL server crashed during the migration and refused to start, citing InnoDB checksum errors in several tables, many of which had not been written to in months.

The move was from an old Ceph Firefly cluster to a new Ceph Jewel cluster (both on separate/dedicated hardware).

The data was rolled back.

But we have a lot of data to move, so this is a serious concern. "Move Disk" doesn't have the best track record in this area. Is it perhaps acting up again?

Isn't the simplest explanation that the corruption was caused by the crash of the MySQL server? Disk migration can cause a lot of I/O contention, which can lead to system instability.
 
Yes, it's large, so I put it here: http://pastebin.com/bn4VnTf8

Moving disk images is very common for us; we have done a lot of RBD-to-RBD moves since the last instance of this issue a couple of years ago, which is why an issue like this sucks the oxygen right out of the room.
Is qemu-guest-agent active on the VMs?
If so, then the fsfreeze command from move disk should effectively have stopped all writes to disk at the end of the move.
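For reference, a quick way to check both sides (just a sketch; 101 is a placeholder VMID):
Code:
# grep '^agent' /etc/pve/qemu-server/101.conf    # host side: "agent: 1" means the option is enabled
# systemctl status qemu-guest-agent              # inside the guest: is the agent service actually running?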
 
Is qemu-guest-agent active on the VMs?
If so, then the fsfreeze command from move disk should effectively have stopped all writes to disk at the end of the move.

fsfreeze is not used in drive mirror (only for snapshots currently), but a sync is done internally by qemu when the disks are swapped, so it shouldn't be a problem even without the guest agent.
I have done this on a lot of databases (MySQL, PostgreSQL, SQL Server) without any corruption.
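If you want to watch the mirror job yourself, it is visible from the monitor (a sketch; 101 is a placeholder VMID and the exact output depends on the qemu version):
Code:
# qm monitor 101
qm> info block-jobs    # shows the running mirror job and how much has been copied
qm> quit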
 
Yes, the old disk was removed due to "delete source."

No, the corruption was not caused by MySQL crashing, as the corrupt tables had not been written to in many weeks.

We have now found a second case on another MySQL VM where the process didn't crash until half an hour after the migration (presumably when it tried to access the corrupted table).

I'm not sure what "storage overload" refers to, but both Ceph clusters are lightly loaded and made up entirely of Intel DC S37XX SSDs, so they definitely weren't overloaded by a simple disk move.

And if disk migration can cause system instability, then the feature should be fixed or removed.
 
Also worth noting: we have two Proxmox clusters, one running 3.4 and one running 4.3. The 3.4 cluster did a lot more migrations yesterday, including up to 5 at once, and thus far they are all fine. The 4.3 cluster was doing them later, one at a time, and we have found multiple problems with corrupted data. It really seems like an issue with 4.3.
 
What are the cache settings for the Ceph pool itself?
What filesystem is used inside the VM, and what mount options are used?

Another thing: do you use asynchronous writes for the database? If so, updates could be lost.
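For example, the usual MySQL durability settings can be checked like this (a sketch; a value of 1 for both means every commit is flushed to disk):
Code:
# mysql -e "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'"
# mysql -e "SHOW VARIABLES LIKE 'sync_binlog'"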
 
As referenced in the thread linked above, rbd_cache is set based on the Qemu cache setting. If you are referring to some other cache setting, please be more specific as this is the only relevant setting I am aware of.
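For what it's worth, the per-disk cache mode that librbd keys off is just what is on the disk line in the VM config (a sketch; 101 is a placeholder VMID, and no explicit cache= option means the Proxmox default of no cache):
Code:
# grep -E '^(virtio|scsi|ide|sata)[0-9]+:' /etc/pve/qemu-server/101.conf    # look for a cache=... option on each disk line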

The filesystem is ext4 (rw,relatime,data=ordered).

And to reiterate, the database did not write to the suddenly-corrupted tables for weeks prior to the corruption. The MySQL process crashed because of the corruption; the corruption did not occur because the MySQL process crashed.

As there were no writes, pending or otherwise, this issue is unlikely to be cache-related.
 
Unless you are claiming MySQL backdates file timestamps, the corrupted files were not written to for weeks before they became corrupted. The modification times and sizes were identical to the backups; only the contents differed. Please stop trying to blame this on MySQL.
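For anyone who wants to reproduce that comparison, it boils down to something like this (a sketch; the paths are placeholders, and innochecksum needs mysqld stopped):
Code:
# stat -c '%y %s %n' /var/lib/mysql/mydb/mytable.ibd /backup/mysql/mydb/mytable.ibd    # mtime and size match
# md5sum /var/lib/mysql/mydb/mytable.ibd /backup/mysql/mydb/mytable.ibd                # contents differ
# innochecksum /var/lib/mysql/mydb/mytable.ibd                                         # flags the pages with bad checksums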

Spirit:

# dpkg -l | egrep -i '(ceph|rbd|rados)'
ii ceph-common 0.80.7-2+deb8u1 amd64 common utilities to mount and interact with a ceph storage cluster
ii libcephfs1 0.80.7-2+deb8u1 amd64 Ceph distributed file system client library
ii librados2 0.80.7-2+deb8u1 amd64 RADOS distributed object store client library
ii librados2-perl 1.0-3 amd64 Perl bindings for librados
ii librbd1 0.80.7-2+deb8u1 amd64 RADOS block device client library
ii python-ceph 0.80.7-2+deb8u1 amd64 Python libraries for the Ceph distributed filesystem
 
Not sure it's related, but you could try to edit
/usr/share/perl5/PVE/Storage/RBDPlugin.pm

and remove
Code:
 sparseinit => { base => 1, current => 1},
from
sub volume_has_feature

and restart the pvedaemon service.
This is a new feature in Proxmox 4 that tries to produce a sparse drive on the target storage after a move disk.
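Something along those lines, in other words (a sketch; keep a copy of the original file, since the next package update will overwrite the change):
Code:
# cp /usr/share/perl5/PVE/Storage/RBDPlugin.pm /root/RBDPlugin.pm.bak
# nano /usr/share/perl5/PVE/Storage/RBDPlugin.pm    # delete the sparseinit line inside sub volume_has_feature
# systemctl restart pvedaemon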
 
# dpkg -l | egrep -i '(ceph|rbd|rados)'
ii ceph-common 0.80.7-2+deb8u1 amd64 common utilities to mount and interact with a ceph storage cluster
ii libcephfs1 0.80.7-2+deb8u1 amd64 Ceph distributed file system client library
ii librados2 0.80.7-2+deb8u1 amd64 RADOS distributed object store client library
ii librados2-perl 1.0-3 amd64 Perl bindings for librados
ii librbd1 0.80.7-2+deb8u1 amd64 RADOS block device client library
ii python-ceph 0.80.7-2+deb8u1 amd64 Python libraries for the Ceph distributed filesystem


You haven't updated librbd to Jewel (10.2.0)? (I mean on your Proxmox 4.3 node.)
 
Unless you are claiming MySQL backdates file timestamps, the corrupted files were not written to for weeks before they became corrupted. The modification times and sizes were identical to the backups; only the contents differed. Please stop trying to blame this on MySQL.
Count to 10 and then try reading once more what I wrote.
 