rbd error: can't make new virtual machines

tchmnkyz

New Member
Mar 19, 2013
7
0
1
Hey Guys,

So I have my Ceph cluster running nicely and everything has been great until I did a recent update. Now, after the upgrade (apt-get dist-upgrade), I get the following error when trying to create VMs:

TASK ERROR: create failed - rbd create vm-101-disk-1' error: rbd: create error: (22) Invalid argument

I am not sure what other info would help get this resolved. Please see the details below:

From the Ceph Cluster
# ceph -v
ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)

From the first node in my cluster

root@node01:~ # pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-18-pve
proxmox-ve-2.6.32: 2.3-88
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-18-pve: 2.6.32-88
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-18
pve-firmware: 1.0-21
libpve-common-perl: 1.0-48
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-8
ksm-control-daemon: 1.1-1

root@node01:~ # ceph -v
ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)

Any help anyone can give would be greatly appreciated!
 
Hi, what is your ceph cluster version? 0.56 is the minimum now.
 
All of the nodes in the cluster are on the same Ceph version:

ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)

I took that from each of the servers and made sure they all match.
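For the record, a quick way to collect that from every node at once (the hostnames here are hypothetical):

```shell
# Print the installed Ceph version on each cluster node over SSH
for n in node01 node02 node03; do
    echo "== $n =="
    ssh "$n" ceph -v
done
```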
 
Oh, I also forgot to say: the rbd storage configuration has some small changes in Proxmox 2.3:

http://pve.proxmox.com/wiki/Storage:_Ceph



rbd: mycephcluster
monhost 192.168.0.1:6789;192.168.0.2:6789;192.168.0.3:6789
pool rbd (optional, default = rbd)
username admin (optional, default = admin)
content images
 
rbd: Ceph01
monhost 10.15.8.50:6789;10.15.8.51:6789;10.15.8.52:6789
pool vms
content images
username admin

Mine is set up in a similar fashion.
 
OK, I think I may have found more insight into it. When I try to manually create test images, it seems that with my version of rbd, "--format 2" fails to create the image, but when it is set to 1 the image is created just fine. So my question is: if I change "/usr/share/perl5/PVE/Storage/RBDPlugin.pm" to only use format 1, will that temporarily resolve my issue? The chunk in question is:

<pre>
sub alloc_image {
    my ($class, $storeid, $scfg, $vmid, $fmt, $name, $size) = @_;

    die "illegal name '$name' - should be 'vm-$vmid-*'\n"
        if $name && $name !~ m/^vm-$vmid-/;

    $name = &$find_free_diskname($storeid, $scfg, $vmid) if !$name;

    # '--format' is hard-coded to 2 here (the newer RBD image format)
    my $cmd = &$rbd_cmd($scfg, $storeid, 'create', '--format', 2,
                        '--size', ($size/1024), $name);
    run_command($cmd, errmsg => "rbd create $name' error", errfunc => sub {});

    return $name;
}
</pre>

So my thought was to change this to a 1 and give that a try. Please let me know what you think!
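In case it helps, the manual test I ran looks roughly like this (pool name taken from my storage config above, sizes are arbitrary):

```shell
# format 1 (old) image - this succeeds on my cluster
rbd -p vms create --format 1 --size 1024 test-f1

# format 2 (new) image - this is the one that fails:
#   rbd: create error: (22) Invalid argument
rbd -p vms create --format 2 --size 1024 test-f2

# clean up the test image
rbd -p vms rm test-f1
```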
 
I guess we don't really want to use that old format. Maybe you need to create a new pool to allow format 2 - please can you test that?
 
Just created a brand new pool and then tried again and it still errors out when trying to use format 2.
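For reference, the test went roughly like this (the pool name and pg count are just what I picked):

```shell
# create a fresh pool and try a format 2 image in it
ceph osd pool create testpool 128
rbd -p testpool create --format 2 --size 1024 probe   # still fails with (22) Invalid argument
```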
 
I found the issue! It turns out that when Ceph updates its deb packages, it does not restart the daemons. As such, the version running in memory was actually 0.48.3. I restarted all of the nodes in the cluster and the problem was fixed!
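For anyone who hits this later, the fix amounts to something like the following on each node (the init script is the stock one from the Ceph packages; the admin-socket path below is an example, substitute your daemon's name):

```shell
# restart all Ceph daemons on this node so the new binaries get loaded
/etc/init.d/ceph restart

# check the version the daemon is actually running,
# not just the installed package version
ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok version
```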