Ceph problem, or am I doing something wrong?

Cayuga

May 3, 2011
We have over 60 VMs running on our Ceph cluster, and most run fine. However, I have a Solaris 10 VM that runs fine on local storage but, once its disk is on Ceph, gets disk errors that prevent it from even booting: "disk read error, sector xxxxxxxxx" (where xxxxxxxxx is usually a large value, e.g. 31780350), followed by "Short read. 0xffffffff chars read".

Here is the VM config from a working copy on local storage:

acpi: 1
boot: cd
bootdisk: ide0
cores: 1
cpuunits: 1000
freeze: 0
ide0: local:157/vm-157-disk-1.raw,cache=writeback,size=20G
ide2: none,media=cdrom
kvm: 0
memory: 512
name: truffle
net0: rtl8139=92:C8:C2:14:5F:5F,bridge=vmbr0
onboot: 1
ostype: other
sockets: 1

If I run "rbd import /var/lib/vz/images/157/vm-157-disk-1.raw vm-157-disk-1" and change the ide0 line to:

ide0: cephcluster:vm-157-disk-1,cache=writeback,size=20492M

With that change, the VM doesn't boot and gets the errors described above.
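The two ide0 lines above carry different size= values (20G versus 20492M). Whether that matters here is not established in this thread, but it is cheap to check what size the raw file really has before comparing it with the Ceph-backed line. A minimal sketch, where size_mib is a hypothetical helper name and the path is the one from the post:

```shell
#!/bin/sh
# size_mib FILE: print FILE's size in whole MiB, rounded up.
# (Hypothetical helper, not part of Proxmox or Ceph.)
size_mib() {
    bytes=$(wc -c < "$1") || return 1
    echo $(( (bytes + 1048575) / 1048576 ))
}

# Example, using the image path from the post above:
#   size_mib /var/lib/vz/images/157/vm-157-disk-1.raw
```

If the number printed differs from the size= value on the ide0 line, that is at least worth mentioning alongside the error reports.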

I've used the same process for lots of Windows, Linux and BSD VMs and it works fine. What's going on with Solaris?

Thanks!

Jeff
 
I just tried those. I still get disk read errors and short reads, but the reported "bad sectors" move around.
 
Thanks for the suggestion. I did verify via md5sum that the imported and exported images were identical.
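For anyone wanting to repeat that check, here is one way it could be done, as a sketch rather than the exact commands used above. It assumes the image lives in the default pool under the name from the first post; md5_matches_stdin is a hypothetical helper name.

```shell
#!/bin/sh
# md5_matches_stdin FILE: compare the MD5 of FILE with the MD5 of whatever
# arrives on stdin. Prints "match" (exit 0) or "MISMATCH" (exit 1).
# (Hypothetical helper, not part of Ceph.)
md5_matches_stdin() {
    want=$(md5sum < "$1" | cut -d' ' -f1)   # checksum of the local raw file
    got=$(md5sum | cut -d' ' -f1)           # checksum of the piped-in export
    if [ "$want" = "$got" ]; then
        echo match
    else
        echo MISMATCH
        return 1
    fi
}

# Usage ("-" asks rbd export to write the image to stdout):
#   rbd export vm-157-disk-1 - | \
#       md5_matches_stdin /var/lib/vz/images/157/vm-157-disk-1.raw
```

A match only shows that the data stored in the cluster is intact when read in one pass; it says nothing about how the guest's own read pattern is being served.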
 
That probably means it's a Ceph bug. Could you generate a log of this happening with cache=none and upload it to a new bug in tracker.ceph.com? You can do that by putting this in /etc/ceph/ceph.conf on the node that runs the VM:

[client]
debug ms = 1
debug rbd = 20
log file = /tmp/rbd.$pid.log
log to stderr = false

If you also copy the I/O error reports you get in the guest, we can see what they correspond to in the Ceph log.
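To help line up a guest-reported sector with read offsets in the log, the arithmetic can be sketched as below. This rests on assumptions not confirmed in the thread: 512-byte sectors, 4 MiB RBD objects, and a sector number counted from the start of the disk rather than from a Solaris slice. sector_to_object is a hypothetical helper name.

```shell
#!/bin/sh
# sector_to_object SECTOR [SECTOR_SIZE] [OBJECT_SIZE]
# Convert a guest sector number to a byte offset in the image, then to the
# index of the RBD object holding that offset and the offset inside it.
# Defaults are assumptions: 512-byte sectors, 4 MiB (4194304-byte) objects.
sector_to_object() {
    sector=$1
    sector_size=${2:-512}
    object_size=${3:-4194304}
    offset=$(( sector * sector_size ))
    index=$(( offset / object_size ))
    printf 'offset=%d object=%d (0x%x) offset_in_object=%d\n' \
        "$offset" "$index" "$index" $(( offset % object_size ))
}

# The example sector from the first post:
sector_to_object 31780350
```

If the image was created with a different object size, or the guest counts sectors relative to a partition, the second and third arguments (or the sector itself) need adjusting before the numbers mean anything.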