Ceph problem, or am I doing something wrong?

Cayuga · Mar 11, 2013

We have over 60 VMs running on our Ceph cluster, and most are running fine. However, I have a Solaris 10 VM that runs fine on local storage but, on Ceph, gets disk errors that prevent it from even booting: "disk read error, sector xxxxxxxxx", where xxxxxxxxx is usually a large value (e.g. 31780350), followed by "Short read. 0xffffffff chars read".

Here is the "vm config" from a working copy:

acpi: 1
boot: cd
bootdisk: ide0
cores: 1
cpuunits: 1000
freeze: 0
ide0: local:157/vm-157-disk-1.raw,cache=writeback,size=20G
ide2: none,media=cdrom
kvm: 0
memory: 512
name: truffle
net0: rtl8139=92:C8:C2:14:5F:5F,bridge=vmbr0
onboot: 1
ostype: other
sockets: 1

If I run "rbd import /var/lib/vz/images/157/vm-157-disk-1.raw vm-157-disk-1" and change the ide0 line to:

ide0: cephcluster:vm-157-disk-1,cache=writeback,size=20492M

the VM doesn't boot and gets the errors described above.

I've used the same process for lots of Windows, Linux and BSD VMs and it works fine. What's going on with Solaris?

Thanks!

Jeff

spirit · Mar 12, 2013

Have you tried changing cache=writeback to cache=none or cache=directsync?

Cayuga · Mar 12, 2013

I just tried those. I still get disk read errors and short reads, but the reported "bad sectors" move around.

jdurgin · Mar 13, 2013

Could you verify that the disk imported correctly, i.e. compare the md5sum of the file before import with the md5sum of the file created by 'rbd export'? You may be running into http://tracker.ceph.com/issues/4388.
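A minimal sketch of that check, assuming the image has already been written back out to a local file with 'rbd export'. The function names and the command-line usage are illustrative only and not part of any Ceph tooling; running md5sum on both files by hand gives the same answer.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=4 * 1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so a multi-gigabyte disk image never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(original_path, exported_path):
    """True if both files have the same MD5 digest."""
    return md5_of_file(original_path) == md5_of_file(exported_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare.py <original.raw> <exported.raw>
    same = images_match(sys.argv[1], sys.argv[2])
    print("identical" if same else "DIFFERENT")
```

A matching digest only shows that import followed by export round-trips the data; it says nothing about how the guest's own reads are served at run time.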

Cayuga · Mar 14, 2013

Thanks for the suggestion. I did verify via md5sum that the imported and exported images were identical.

jdurgin · Mar 14, 2013

That probably means it's a Ceph bug. Could you generate a log of this happening with cache=none and attach it to a new bug on tracker.ceph.com? You can enable logging by, for example, putting this in /etc/ceph/ceph.conf on the node that runs the VM:

[client]
debug ms = 1
debug rbd = 20
log file = /tmp/rbd.$pid.log
log to stderr = false

If you copy the I/O error reports you get in the guest too, we can see what they correspond to in the ceph log.
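To line up a guest error with the Ceph log, it can help to turn the reported sector number into a byte offset within the image. A rough sketch, assuming 512-byte sectors, the default RBD object size of 4 MiB, and a sector number counted from the start of the disk. None of these is confirmed in this thread (the object size can be checked with 'rbd info'), and the helper names are made up for illustration.

```python
SECTOR_SIZE = 512              # assumption: guest reports 512-byte sectors
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default RBD object size (4 MiB)


def sector_to_offset(sector, sector_size=SECTOR_SIZE):
    """Byte offset of a sector from the start of the disk image."""
    return sector * sector_size


def locate_in_image(sector, sector_size=SECTOR_SIZE, object_size=OBJECT_SIZE):
    """Return (object_index, offset_within_object) for a sector,
    given that the image is striped into fixed-size objects."""
    offset = sector_to_offset(sector, sector_size)
    return divmod(offset, object_size)


if __name__ == "__main__":
    sector = 31780350  # the example value from the first post
    print(sector_to_offset(sector))  # 16271539200
    print(locate_in_image(sector))   # (3879, 1833984)
```

Under those assumptions, sector 31780350 is about 15.2 GiB into the 20G image, so the failing read is well inside the disk rather than past its end.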

Cayuga · Mar 15, 2013

Thanks, I've reported the problem and will report back when there is news.

http://tracker.ceph.com/issues/4446
