"Move Disk" data corruption on 4.3

There is definitely some versioning weirdness.

On the 3.4 servers, where this problem did not occur, ceph 0.80.8 is installed from the enterprise.proxmox.com repository.

On the 4.3 servers, where this problem did occur, ceph 0.80.7 is installed from the Debian repository; nothing shows up from the enterprise.proxmox.com repository.

Ceph 0.80 is Firefly, which is ancient and exactly what we're trying to get rid of, so that's fundamentally scary. But why does 4.x have an older version than 3.x? Are the packages missing from the PVE enterprise repository? (For some reason I thought Proxmox was based on at least the Ceph Hammer client libraries.)

Is it even remotely safe to change the ceph client libraries out from under Proxmox? apt-cache shows a lot of version-specific dependencies.
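For reference, here's the kind of thing I mean (using pve-qemu-kvm as one example consumer; adjust for your install):

Code:
# list everything that depends on librbd1
apt-cache rdepends librbd1
# show the versioned dependencies qemu carries
apt-cache show pve-qemu-kvm | grep -i '^Depends'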
 
From my latest 4.3 using the enterprise repo:
Code:
dpkg -l | egrep -i '(ceph|rbd|rados)'
ii  ceph-common                      0.80.8-1~bpo70+1               amd64        common utilities to mount and interact with a ceph storage cluster
ii  libcephfs1                       0.80.7-2+deb8u1                amd64        Ceph distributed file system client library
ii  librados2                        0.80.8-1~bpo70+1               amd64        RADOS distributed object store client library
ii  librados2-perl                   1.0-3                          amd64        Perl bindings for librados
ii  librbd1                          0.80.8-1~bpo70+1               amd64        RADOS block device client library
ii  python-ceph                      0.80.8-1~bpo70+1               amd64        Python libraries for the Ceph distributed filesystem
Remember, 3.4 is based on wheezy, which is now under LTS support (no longer under regular Debian maintenance). Maybe this repo has been updated with newer backported versions?
 
apt-cache policy shows the PVE enterprise repo is simply not offering any ceph packages on 4.3.
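One way to confirm that wholesale is to grep the repo's package index directly (the list filename pattern is an assumption about how apt names its cached indexes):

Code:
grep -Ei '^Package:.*(ceph|rbd|rados)' \
    /var/lib/apt/lists/enterprise.proxmox.com_*_Packages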

For 3.4:

Code:
# apt-cache policy librbd1
librbd1:
Installed: 0.80.8-1~bpo70+1
Candidate: 0.80.8-1~bpo70+1
Version table:
*** 0.80.8-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
100 /var/lib/dpkg/status
0.80.6-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.80.5-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67.9-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67.7-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67.5-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67.4-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67.3-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67.2-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67.1-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages
0.67-1~bpo70+1 0
500 https://enterprise.proxmox.com/debian/ wheezy/pve-enterprise amd64 Packages

For 4.3:

Code:
# apt-cache policy librbd1
librbd1:
Installed: 0.80.7-2+deb8u1
Candidate: 0.80.7-2+deb8u1
Version table:
*** 0.80.7-2+deb8u1 0
500 http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
100 /var/lib/dpkg/status

spirit, can you run this on yours to see if perhaps you are picking up something from the nosub repository, or at least see where exactly it is coming from?
 
From the changelog:
Code:
ceph (0.80.8-1~bpo70+1) wheezy; urgency=low

*

-- Jenkins Build <gary.lowell@inktank.com> Wed, 14 Jan 2015 10:19:38 -0800

ceph (0.80.8-1) stable; urgency=low

* New upstream release

librbd1 on 3.4 is packaged by Proxmox, while librbd1 on 4.3 comes directly from the Debian repository, so I think the different version numbers are of no concern.
 
If you use Ceph Jewel, you need to update librbd on your host node yourself!

Add the Debian Jewel ceph repo in /etc/apt/sources.list.d/ceph.list:

Code:
deb http://download.ceph.com/debian-jewel/ jessie main
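A minimal sketch of the whole sequence that implies (the key URL is the standard Ceph release key; the package selection is an assumption, upgrade whichever ceph client libraries you actually have installed):

Code:
# fetch and trust the Ceph release key
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
# point apt at the Jewel repo for jessie
echo "deb http://download.ceph.com/debian-jewel/ jessie main" > /etc/apt/sources.list.d/ceph.list
# upgrade only the client libraries, leaving the rest of PVE alone
apt-get update
apt-get install --only-upgrade librbd1 librados2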


I'm surprised that you are able to connect to your Jewel cluster at all.
Using an old librbd with a newer ceph cluster version is not supported or tested by the Ceph team.

Using a new librbd with an old ceph cluster is OK and tested.
 
Whoops, I thought spirit posted the 0.80.8 versions.

Mir, where did *you* get 0.80.8 on 4.3?

Also, I read the release notes for 0.80.8 vs. 0.80.7, and indeed none of the client-side fixes (and there weren't many) sound relevant, so I agree 0.80.7 vs. 0.80.8 is not likely to be a big deal.
 
I'm surprised that you are able to connect to your Jewel cluster at all.
Using an old librbd with a newer ceph cluster version is not supported or tested by the Ceph team.

Using a new librbd with an old ceph cluster is OK and tested.

What about using new librbd with proxmox in violation of version-specific dependencies? Is that tested?

That is not an area where I want to blaze experimental new trails.
 
What about using new librbd with proxmox in violation of version-specific dependencies? Is that tested?
The packages from Inktank are built on and for Debian jessie, the same way packages are made for jessie-backports, so this is a "supported" install.
 
Whether or not the ceph versioning is a problem, it is a separate problem and unlikely to be the cause of this issue. This is not a new situation. These two SSD-based ceph clusters are our smallest ones. Our largest, 55TB storage cluster is also running Jewel and has been under heavy load for some time (including mass migrations to and from it during the upgrade, albeit driven from 3.4) without incident. Before that, the Jewel clusters ran Hammer for a long time.

And, to reiterate, 3.4 can run 5 migrations at once with no issue. 4.3 corrupts with one. All on the same clusters.

All of the available evidence points at Proxmox 4.3 being the only factor present in all of the corruption cases.

The new-for-4.x sparse thing seems worth testing, though thus far I have not found a safe way to test it.
 
Per the Ceph developers, mismatched client/server versions should not cause any issues without supplemental stupidity (such as using features or tunables not supported by the older client), and they work very hard to preserve compatibility.

And even in the case of supplemental stupidity, errors would be the expected result, not silent data corruption.

(They did add the caveat that Firefly is so old that they no longer test it.)

By contrast, we did set up a test with two Proxmox 4.3 nodes, one running Jewel and one on the default 0.80.7. We promptly got a "function not implemented" error trying to create an image on the Firefly cluster, or trying to migrate an image from a Jewel cluster to a Firefly cluster. And live migrating any VM from the Proxmox-Jewel node to the Proxmox-Firefly node fails: "Error: start failed."

So the conservative approach appears to be the right one in situations with multiple ceph clusters running different versions, as is inherently the case in a mid-upgrade situation like ours.

The 0.80.8 thing is still bothering me as well, as the Debian jessie repositories are all showing me 0.80.7, including on a freshly-installed Debian jessie VM I spun up just now. Nothing I can find indicates that 0.80.8 can be obtained from the Debian jessie main repository.
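For reference, a quick way to double-check exactly which versions each configured repo offers:

Code:
# list every librbd1 version visible to apt, per repository
apt-cache madison librbd1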

So, right now, there are two main possible causes of the corruption:

1) A problem in Ceph 0.80.7 that does not exist in 0.80.8.
2) The sparse thing suggested by spirit.

Currently my efforts are focused on finding a repro that does not involve corrupting live production data. So far, no luck.
 
A "repro" is a way to reliably reproduce the problem so it can be further examined and resolved. (And if the repro is any good, other people will be able to follow it and reproduce the problem as well.)

As it stands, nothing I do outside of production has been able to reproduce the problem, up to and including sparing out the machine it actually happened on and setting up a VM that just saturates its I/O doing InnoDB inserts while the disk migrates back and forth between the same two ceph clusters.
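For the curious, that repro attempt was shaped roughly like this (the VMID, the storage names, and the mysqlslap knobs here are placeholders, not our exact setup):

Code:
# on the host: bounce the disk between the two ceph storages in a loop
while true; do
    qm move_disk 101 virtio0 ceph-ssd-a -delete 1
    qm move_disk 101 virtio0 ceph-ssd-b -delete 1
done

# inside the VM: keep InnoDB writing as hard as it can
mysqlslap --concurrency=16 --iterations=1000 \
    --auto-generate-sql --auto-generate-sql-load-type=write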

Whatever the problem is, it must depend on an external factor not yet identified.
 
Per the Ceph developers, mismatched client/server versions should not cause any issues without supplemental stupidity (such as using features or tunables not supported by the older client), and they work very hard to preserve compatibility.
Have you configured the tunables on your Jewel cluster to be compatible with the Firefly librbd?

http://docs.ceph.com/docs/jewel/rados/operations/crush-map/

This matters because Hammer has the CRUSH_V4 tunable and Jewel has CRUSH_V5, and these are not compatible with the Firefly librbd.
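To illustrate (a sketch only; changing tunables on a live cluster triggers data movement, so don't run this casually):

Code:
# show which tunable profile/features the cluster currently requires
ceph osd crush show-tunables
# relax the profile to a level a firefly client understands
ceph osd crush tunables firefly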

(Just to be sure: you don't use krbd, right?)
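(krbd is a per-storage flag in /etc/pve/storage.cfg, so it's quick to check; this assumes your rbd storage entries start with "rbd:" as usual:)

Code:
grep -A 6 '^rbd:' /etc/pve/storage.cfg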

And even in the case of supplemental stupidity, errors would be the expected result, not silent data corruption.
Again, qemu drive-mirror aborts if any I/O error (read or write) occurs during the migration (on the source or the destination).

By contrast, we did set up a test with two Proxmox 4.3 nodes, one running Jewel and one on the default 0.80.7. We promptly got a "function not implemented" error trying to create an image on the Firefly cluster, or trying to migrate an image from a Jewel cluster to a Firefly cluster. And live migrating any VM from the Proxmox-Jewel node to the Proxmox-Firefly node fails: "Error: start failed."
This is strange. For image creation, the command executed by Proxmox is the same.
The live migration failure seems strange too, since it's only a new qemu process being started with its own librbd version.
(But of course, you can't migrate from a new qemu version to an old qemu version.)
 
May I join your conversation...

We have a PVE cluster with 3 nodes (all of them 4.3-3) and use an NFS share based on FreeNAS.
Not long ago we discovered "Move disk" for ourselves. It was very cool and gave us the ability to migrate VMs from one storage to another without backup/restore.
As a test we moved about 10 VMs, some Windows, some Linux based... but! After 10-12 hours, 2 of them crashed: our wiki (some pages were lost) and our freeradius-dhcp server.
On both VMs there were troubles with MySQL.
On both, the cache mode was Write through.
Disk type: virtio.
 
