Hello,
We have a 3-node Proxmox/Ceph cluster. This morning all of the OSDs on one node were marked down/out. Right now all the VMs are up, but their IO is being starved by the OSD recovery and they are basically not responding.
I have added the following in /etc/pve/ceph.conf
Code:

osd max backfills = 1
osd recovery max active = 1

How do I make this active to lessen the IO of the rebuild?
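For context, a minimal sketch of one commonly used way to apply such values without restarting anything: ceph.conf is only read when a daemon starts, whereas `ceph tell osd.* injectargs` pushes settings into the running OSDs. The helper name `build_injectargs_cmd` is hypothetical and only assembles the command; it assumes the two settings above and a node with a working admin keyring, and it prints the command rather than running it.

```python
import shlex


def build_injectargs_cmd(settings, target="osd.*"):
    """Build a `ceph tell <target> injectargs` command as an argv list.

    Hypothetical helper for illustration only; it does not contact the cluster.
    """
    # Options written with spaces in ceph.conf ("osd max backfills")
    # are passed to injectargs as dashed flags ("--osd-max-backfills").
    flags = " ".join(
        f"--{name.strip().replace(' ', '-')} {value}"
        for name, value in settings.items()
    )
    return ["ceph", "tell", target, "injectargs", flags]


# The two values from the ceph.conf snippet above.
settings = {"osd max backfills": 1, "osd recovery max active": 1}
cmd = build_injectargs_cmd(settings)

# Print a shell-safe version of the command for review before running it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Injected values last only until the OSD restarts, so keeping the same lines in ceph.conf still makes sense if the lower limits should persist.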
We already had one OSD out, but why would all of the OSDs on one server get marked down/out at once? Here is all I see in the logs.
		Code:
	
	2015-09-21 7:00:54.877290 mon.0 10.0.3.11:6789/0 9323720 : [INF] pgmap v12130886: 1536 pgs: 1431 active+clean, 77 active+degraded, 28 active+remapped; 3394 GB data, 9949 GB used, 53353 GB / 63302 GB avail; 1635 kB/s rd, 222 kB/s wr, 78 op/s; 63933/2613351 objects degraded (2.446%)
2015-09-21 07:03:54.142139 mon.1 10.0.3.12:6789/0 3973 : [INF] pgmap v12130959: 1536 pgs: 1518 active+degraded, 18 active+remapped; 3394 GB data, 9949 GB used, 53353 GB / 63302 GB avail; 16683 kB/s rd, 1012 kB/s wr, 868 op/s; 870311/2613351 objects degraded (33.302%)
2015-09-21 07:03:55.670321 mon.1 10.0.3.12:6789/0 3974 : [INF] mon.1 calling new monitor election
2015-09-21 07:03:55.673390 mon.0 10.0.3.11:6789/0 9323722 : [INF] mon.0 calling new monitor election
2015-09-21 07:03:55.675321 mon.0 10.0.3.11:6789/0 9323723 : [INF] mon.0 calling new monitor election
2015-09-21 07:03:55.893161 mon.0 10.0.3.11:6789/0 9323724 : [INF] mon.0@0 won leader election with quorum 0,1,2
2015-09-21 07:03:56.231606 mon.0 10.0.3.11:6789/0 9323725 : [INF] monmap e3: 3 mons at {0=10.0.3.11:6789/0,1=10.0.3.12:6789/0,2=10.0.3.13:6789/0}
2015-09-21 07:03:56.231748 mon.0 10.0.3.11:6789/0 9323726 : [INF] pgmap v12130959: 1536 pgs: 1518 active+degraded, 18 active+remapped; 3394 GB data, 9949 GB used, 53353 GB / 63302 GB avail; 870311/2613351 objects degraded (33.302%)
2015-09-21 07:03:56.232198 mon.0 10.0.3.11:6789/0 9323727 : [INF] mdsmap e1: 0/0/1 up
2015-09-21 07:03:56.251357 mon.0 10.0.3.11:6789/0 9323728 : [INF] osdmap e1155: 18 osds: 12 up, 17 in

My first priority is how to lessen the recovery load so the VMs become responsive again. Thanks in advance for any advice or help!
Best regards,
Eric