Hi, I'm new to the forum
First of all, I want to congratulate the whole PVE team for all the work and for the nice software that Proxmox is.
I decided to write this post because I'm facing a weird problem with Ceph.
I'm running a test installation on 3 servers that form a cluster, with NFS shared storage and some other things I'm testing. I installed them on top of Debian Wheezy.
All of that works as expected; I have a couple of VMs running with no problems.
I then decided to install Ceph and followed the instructions, with the only difference that I'm short on free disks: just a couple of them for OSDs, but they should suffice for the test.
I've created 3 mons (one on each server) and 2 OSDs, each located on a different server.
Then I've created a 2/1 pool (size 2, min_size 1) so the 2 OSDs can accommodate it.
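For reference, I created the pool with something like the following (from memory, so the exact pg_num and flag spelling may differ from what I actually typed):

```shell
# create a replicated pool with size 2 / min_size 1 so it fits on 2 OSDs
pveceph createpool rbd -size 2 -min_size 1 -pg_num 64
```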
All looks well except for the Ceph pool status, which shows the PGs as stuck unclean but active for the pool.
The first thing to notice is that I'm unable to bring the status to HEALTH_OK. I actually had it in that state for a little while after deleting the 3/1 default pools (data, metadata, rbd), but after another deletion and recreation of the 2/1 pool the "pgs stuck unclean" warning came back.
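In case it's relevant context: since I only have 2 OSDs on 2 hosts, my understanding is that the replication defaults matter here. This is the ceph.conf fragment I believe controls them (my assumption of the relevant options; I have not verified all of these on this cluster):

```
[global]
    # defaults applied to newly created pools
    osd pool default size = 2
    osd pool default min size = 1
    # with only 2 hosts, CRUSH must place replicas across hosts (type 1 = host)
    osd crush chooseleaf type = 1
```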
But that is not the worst part.
If I create an RBD storage on that pool, all the storages go to "active: no" state (even the local one), and no VM can be created (even non-HA ones), as the storage combo box is greyed out.
In the syslog, lines like this start to appear:
pveproxy[879914]: WARNING: proxy detected vanished client connection
and the web GUI becomes unstable, coming and going with timeouts and lost connections.
If I remove the RBD storage from the cluster, everything returns to normal operation.
I've been struggling with this situation (at first I thought it was a cluster problem, so I dropped and reconfigured the whole cluster), run several pveceph commands, created and deleted pools, etc.
Do you have a clue about what may be happening? I've read there were problems with NFS storage in the past; could this be a similar situation?
Any help would be appreciated.
Thanks in advance; tell me if you need more info or have any suggestions.
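These are the commands I've been using to inspect the stuck PGs, in case the output of any of them would help (run on one of the nodes):

```shell
# lists which PGs are stuck and why
ceph health detail
# dumps the stuck-unclean PGs with their acting OSD sets
ceph pg dump_stuck unclean
# shows the size/min_size actually set on the pool
ceph osd dump | grep rbd
```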
pveversion -v output:
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.2-30 (running version: 3.2-30/1d095287)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-31-pve: 2.6.32-132
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-14
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-22
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
pveceph lspools
Name size pg_num used
rbd 2 64 0
ceph osd tree
# id weight type name up/down reweight
-1 3.62 root default
-2 1.81 host vhostHp0
1 1.81 osd.1 up 1
-3 1.81 host pr01
0 1.81 osd.0 up 1
pveceph status
{
"monmap" : {
"mons" : [
{
"name" : "2",
"addr" : "192.168.200.244:6789/0",
"rank" : 0
},
{
"name" : "1",
"addr" : "192.168.200.245:6789/0",
"rank" : 1
},
{
"name" : "0",
"addr" : "192.168.200.246:6789/0",
"rank" : 2
}
],
"created" : "2014-09-08 09:45:00.331663",
"epoch" : 3,
"modified" : "2014-09-08 09:45:29.943950",
"fsid" : "70459b79-6354-4bfc-b9bc-6254125dfe47"
},
"election_epoch" : 12,
"health" : {
"detail" : [],
"overall_status" : "HEALTH_WARN",
"summary" : [
{
"summary" : "64 pgs stuck unclean",
"severity" : "HEALTH_WARN"
}
],
"timechecks" : {
"mons" : [
{
"name" : "2",
"latency" : "0.000000",
"skew" : "0.000000",
"health" : "HEALTH_OK"
},
{
"name" : "1",
"latency" : "0.001316",
"skew" : "-0.002077",
"health" : "HEALTH_OK"
},
{
"name" : "0",
"latency" : "0.001328",
"skew" : "-0.001080",
"health" : "HEALTH_OK"
}
],
"epoch" : 12,
"round_status" : "finished",
"round" : 188
},
"health" : {
"health_services" : [
{
"mons" : [
{
"kb_used" : 1921548,
"last_updated" : "2014-09-08 23:18:34.290705",
"name" : "2",
"health" : "HEALTH_OK",
"kb_total" : 28814332,
"kb_avail" : 25429080,
"store_stats" : {
"bytes_total" : 7786201,
"last_updated" : "0.000000",
"bytes_misc" : 65552,
"bytes_sst" : 446153,
"bytes_log" : 7274496
},
"avail_percent" : 88
},
{
"kb_used" : 2780324,
"last_updated" : "2014-09-08 23:18:57.999205",
"name" : "1",
"health" : "HEALTH_OK",
"kb_total" : 38413808,
"kb_avail" : 33682152,
"store_stats" : {
"bytes_total" : 11107263,
"last_updated" : "0.000000",
"bytes_misc" : 65552,
"bytes_sst" : 621487,
"bytes_log" : 10420224
},
"avail_percent" : 87
},
{
"kb_used" : 2463452,
"last_updated" : "2014-09-08 23:19:01.016973",
"name" : "0",
"health" : "HEALTH_OK",
"kb_total" : 30737344,
"kb_avail" : 26712500,
"store_stats" : {
"bytes_total" : 14680714,
"last_updated" : "0.000000",
"bytes_misc" : 65552,
"bytes_sst" : 634,
"bytes_log" : 14614528
},
"avail_percent" : 86
}
]
}
]
}
},
"osdmap" : {
"osdmap" : {
"num_in_osds" : 2,
"epoch" : 87,
"nearfull" : false,
"num_up_osds" : 2,
"full" : false,
"num_osds" : 2
}
},
"mdsmap" : {
"epoch" : 1,
"by_rank" : [],
"in" : 0,
"max" : 1,
"up" : 0
},
"pgmap" : {
"bytes_total" : 3990058311680,
"pgs_by_state" : [
{
"count" : 64,
"state_name" : "active"
}
],
"data_bytes" : 0,
"num_pgs" : 64,
"version" : 144,
"bytes_avail" : 3908428120064,
"bytes_used" : 2572288
},
"quorum" : [
0,
1,
2
],
"quorum_names" : [
"2",
"1",
"0"
],
"fsid" : "70459b79-6354-4bfc-b9bc-6254125dfe47"
}