Proxmox node is down, VM is stuck and it is not possible to restart the VM

TMTMTM

New Member
May 18, 2017
Hello everyone,


I have a problem with my Proxmox Ceph Cluster:


There are 4 machines in a Proxmox 4.4-87 cluster. Each of these machines has 2 Ceph OSDs, so in total there are 8 OSDs.

The Ceph pool config looks like this:

ceph osd dump | grep -i rbd

pool 5 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 59469 flags hashpspool stripe_width 0
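
The same values can also be queried per key, which may be easier to read:

ceph osd pool get rbd size
ceph osd pool get rbd min_size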

If one of the Proxmox nodes goes down, all VMs get stuck. For example: I rebooted stor04 and all VMs froze. I stopped a VM on stor01 and tried to start it again, but nothing happened until stor04 was back again...


Do you have any suggestions for me? Normally it should not be a problem if one node is down, especially since I have replicated size 3. So all VM data should be present on 3 OSDs, which means on at least 2 physical machines. But the cluster is not usable if one Proxmox node is down. That's not normal and I think there must be an error!
 
also, running with min_size 1 is asking for trouble IMHO..
 
also, running with min_size 1 is asking for trouble IMHO..

Why is this asking for trouble? Sure, it could be very slow. But normally it should be okay...?


I used min_size 1 because I have replica size 3. With this configuration it is possible that the data of VM 100 ends up on both OSDs of proxmox4 and on one OSD of proxmox3. With min_size 2 the VM would then not be usable if proxmox4 is down; with min_size 1 it should still be usable.


Or do you have a better idea for me / for my configuration?



Physically I have:


4 nodes, each with 2 SATA disks and 2 SSDs. What I did is:

1 SSD used for system
1 SSD used for the journals of both OSDs
Both SATAs as OSD.0 and OSD.1 (and so on)
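
For reference, on PVE 4.x an OSD with its journal on a separate SSD would be created with something like the following (/dev/sdb and /dev/sdd are only placeholder names for one SATA disk and the journal SSD):

pveceph createosd /dev/sdb -journal_dev /dev/sdd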
 
If you run 4 monitors, you still lose quorum as soon as two nodes fail, exactly as with 3 monitors, so the fourth monitor gains you nothing. You should only run 3 monitors.
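
To see which monitors are currently forming the quorum, something like this can be used:

ceph mon stat
ceph quorum_status --format json-pretty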
 
Hmm... Okay. So the rule of thumb is "Number_of_Nodes - 1" = Number_of_Monitors?
What about the min_size that Fabian mentioned?
 
Found an interesting problem:

rbd: ceph-storage
        monhost 10.2.19.11;10.2.19.12;10.2.19.13
        content images
        krbd 0
        pool rbd

This is the Ceph storage section of my storage.cfg. But I configured all 4 nodes (10.2.19.14 too) as monitors in Ceph.
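
Presumably the alternative would be to list all four monitors in storage.cfg instead, i.e.:

        monhost 10.2.19.11;10.2.19.12;10.2.19.13;10.2.19.14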

I will try to disable the monitor service on node .14 (this is the machine that was down when nothing was working).
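
Removing that monitor cleanly should work with something like this (assuming the mon ID on .14 is "3"):

pveceph destroymon 3
# or, directly via Ceph:
ceph mon remove 3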
 
What about the min_size that Fabian mentioned?

The problem with running with min_size 1 is that Ceph will accept writes even when only a single copy of the data is available, and if that single copy then fails, your data is gone. min_size is an additional safeguard; the general recommendation is to run with min_size 2 in production, and only downgrade to 1 temporarily, after careful consideration of the implications, during emergency maintenance or disaster recovery, and only if required.

See this longer thread for some discussion about size and min_size: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html , and especially this mail for a summary of what running with min_size 1 might entail: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014892.html
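
If you want to raise it, min_size can be changed on the live pool with something like:

ceph osd pool set rbd min_size 2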
 
Right now one machine is down and ALL VMs are frozen... I have no idea. How can I figure out where the problem is?


Here is the output of pveceph status


{
   "health" : {
      "overall_status" : "HEALTH_WARN",
      "summary" : [
         {
            "severity" : "HEALTH_WARN",
            "summary" : "190 pgs stuck unclean"
         },
         {
            "severity" : "HEALTH_WARN",
            "summary" : "7 requests are blocked > 32 sec"
         },
         {
            "summary" : "recovery 68380/502920 objects degraded (13.597%)",
            "severity" : "HEALTH_WARN"
         },
         {
            "severity" : "HEALTH_WARN",
            "summary" : "recovery 124465/502920 objects misplaced (24.748%)"
         },
         {
            "severity" : "HEALTH_WARN",
            "summary" : "2/8 in osds are down"
         },
         {
            "summary" : "noout flag(s) set",
            "severity" : "HEALTH_WARN"
         }
      ],
      "health" : {
         "health_services" : [
            {
               "mons" : [
                  {
                     "store_stats" : {
                        "last_updated" : "0.000000",
                        "bytes_misc" : 103528931,
                        "bytes_total" : 107870981,
                        "bytes_log" : 4342050,
                        "bytes_sst" : 0
                     },
                     "kb_avail" : 18129344,
                     "last_updated" : "2017-05-27 23:14:26.065363",
                     "health" : "HEALTH_OK",
                     "kb_total" : 26704124,
                     "avail_percent" : 67,
                     "kb_used" : 7342064,
                     "name" : "0"
                  },
                  {
                     "avail_percent" : 60,
                     "kb_used" : 3500644,
                     "name" : "1",
                     "store_stats" : {
                        "bytes_total" : 106411715,
                        "bytes_log" : 2146219,
                        "bytes_sst" : 0,
                        "last_updated" : "0.000000",
                        "bytes_misc" : 104265496
                     },
                     "health" : "HEALTH_OK",
                     "kb_total" : 10190136,
                     "last_updated" : "2017-05-27 23:15:11.261335",
                     "kb_avail" : 6148820
                  },
                  {
                     "avail_percent" : 57,
                     "kb_used" : 3796068,
                     "name" : "2",
                     "store_stats" : {
                        "bytes_sst" : 0,
                        "bytes_log" : 449948,
                        "bytes_total" : 94489705,
                        "bytes_misc" : 94039757,
                        "last_updated" : "0.000000"
                     },
                     "last_updated" : "2017-05-27 23:14:38.311855",
                     "kb_total" : 10190136,
                     "health" : "HEALTH_OK",
                     "kb_avail" : 5853396
                  }
               ]
            }
         ]
      },
      "timechecks" : {
         "mons" : [
            {
               "skew" : 0,
               "health" : "HEALTH_OK",
               "latency" : 0,
               "name" : "0"
            },
            {
               "latency" : 0.002798,
               "health" : "HEALTH_OK",
               "skew" : 0,
               "name" : "1"
            },
            {
               "name" : "2",
               "health" : "HEALTH_OK",
               "skew" : -0.000384,
               "latency" : 0.013545
            }
         ],
         "epoch" : 156,
         "round" : 250,
         "round_status" : "finished"
      },
      "detail" : []
   },
   "mdsmap" : {
      "epoch" : 1,
      "up" : 0,
      "max" : 0,
      "by_rank" : [],
      "in" : 0
   },
   "quorum" : [
      0,
      1,
      2
   ],
   "election_epoch" : 156,
   "quorum_names" : [
      "0",
      "1",
      "2"
   ],
   "osdmap" : {
      "osdmap" : {
         "full" : false,
         "num_remapped_pgs" : 190,
         "nearfull" : false,
         "num_in_osds" : 8,
         "num_osds" : 8,
         "epoch" : 79804,
         "num_up_osds" : 6
      }
   },
   "fsid" : "78667f72-e04d-416a-b37d-e86590ae0422",
   "monmap" : {
      "epoch" : 5,
      "modified" : "2017-05-21 12:22:34.679432",
      "created" : "2016-11-13 18:18:12.273042",
      "mons" : [
         {
            "rank" : 0,
            "name" : "0",
            "addr" : "10.2.19.11:6789/0"
         },
         {
            "name" : "1",
            "addr" : "10.2.19.12:6789/0",
            "rank" : 1
         },
         {
            "rank" : 2,
            "addr" : "10.2.19.13:6789/0",
            "name" : "2"
         }
      ],
      "fsid" : "78667f72-e04d-416a-b37d-e86590ae0422"
   },
   "pgmap" : {
      "num_pgs" : 256,
      "misplaced_objects" : 124465,
      "bytes_used" : 2103791534080,
      "degraded_ratio" : 0.135966,
      "bytes_total" : 15995368865792,
      "degraded_total" : 502920,
      "data_bytes" : 696680397124,
      "degraded_objects" : 68380,
      "misplaced_ratio" : 0.247485,
      "write_bytes_sec" : 189638,
      "misplaced_total" : 502920,
      "bytes_avail" : 13891577331712,
      "version" : 5669919,
      "pgs_by_state" : [
         {
            "count" : 190,
            "state_name" : "active+remapped"
         },
         {
            "count" : 66,
            "state_name" : "active+clean"
         }
      ],
      "op_per_sec" : 19
   }
}
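
To drill further into a state like this (which PGs are stuck, which OSDs are down, why requests are blocked), the usual commands would be something like:

ceph health detail
ceph osd tree
ceph pg dump_stuck unclean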
 
