After running Ceph for a year, I decided to replace a node because the performance was not what I expected. (Two days later: insert multiple expletives, one fewer keyboard, and a few grey hairs.)
First thing - double-check you have put in the right network card.
-> those 2x 10Gb cards look very similar to those 2x 1Gb cards -> lesson learnt
--> just removing the OSDs from node 3 (the one with the wrong card) increased Ceph performance 3x
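For anyone in the same spot: on Linux/Proxmox the kernel reports each NIC's negotiated link speed under /sys/class/net, so a quick loop (a minimal sketch, nothing Ceph-specific) would have caught the 1Gb card immediately:

```shell
# Print the negotiated link speed of every NIC: a 10GbE port reports
# 10000, a 1GbE port reports 1000, and an unplugged port reads as down.
for nic in /sys/class/net/*; do
    name=$(basename "$nic")
    [ "$name" = "lo" ] && continue          # skip loopback
    speed=$(cat "$nic/speed" 2>/dev/null || echo "down")
    echo "$name: $speed Mb/s"
done
```

`ethtool <nic>` gives the same information plus the advertised/supported modes, if you have it installed.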
Second - check the ping -> if it is above 1 ms then something is wrong.
-> turns out one of the 10Gb ports on the switch was dead. Not happy - it took many tests to find it: the link was running at about 100Mb but still showing green (10Gb).
--> lesson learnt
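A scripted version of that ping check (a sketch - the peer address is a placeholder) saves staring at raw output; anything above about 1 ms average on a local 10GbE segment deserves investigation:

```shell
#!/bin/sh
# Ping a Ceph peer and flag average RTT above 1 ms.
# 10.0.0.2 is a placeholder - pass the real peer address as $1.
TARGET="${1:-10.0.0.2}"
# The summary line looks like:
#   rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms
# so with '/' as the separator the average is field 5.
avg=$(ping -c 5 -q "$TARGET" | awk -F'/' '/^rtt|^round-trip/ { print $5 }')
echo "average RTT to $TARGET: $avg ms"
if awk -v a="$avg" 'BEGIN { exit !(a > 1.0) }'; then
    echo "WARN: above 1 ms - check switch port, cable, and NIC"
else
    echo "OK: sub-millisecond latency"
fi
```

Running it against every other node in the cluster after any hardware change would have caught the dead switch port much sooner.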
SO NOW EVERYTHING IS FIXED
This isn't a home network, this is SMB (Small to Medium Business), so flame away.....
Simple questions - first, the setup:
4-node Proxmox system
2 to 4 OSDs on each node (currently 2x consumer SSDs; planning to add 2x ??? (see below))
20-40 Windows VMs in total across all the nodes
The nodes' CPUs are Threadripper, Threadripper, i7, and dual E5 (a mixture of AMD and Intel), with lots of RAM
Running all the VMs -> this is the memory usage
Each node will have 2x 10Gb + multiple 1Gb NICs (once I fix node 3)
Simple question #1
Should I move the Ceph CLUSTER NETWORK to the slower 1Gb network (all the pings work now) rather than using the 10Gb network for both? Apparently that is better.
OR
Given the system isn't very big, should I just leave it? NOTE: the NICs are already there (they came with the motherboards) and are doing nothing at the moment.
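For context, not a recommendation either way: the split is just two settings in ceph.conf, and the cluster network carries OSD replication and heartbeat traffic, which for writes is usually the heavier of the two flows. A sketch with made-up subnets:

```
# /etc/pve/ceph.conf - illustrative subnets, adjust to your own
[global]
    public_network  = 10.10.10.0/24   # client/VM traffic
    cluster_network = 10.10.20.0/24   # OSD replication + heartbeats
```

On Proxmox, /etc/ceph/ceph.conf is a symlink to the copy in /etc/pve, and the change takes effect as the OSDs are restarted.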
Simple Question #2
Preamble: with Ceph I understand that enterprise SSDs are the best, and I will only replace the current ones with enterprise SSDs.
-> I had a consumer SSD fail recently and replaced it today - latency was in the 100s and VMs were blue-screening.
What are the recommendations for adding extra drives (please assume enterprise or NAS quality)?
2x 256 GB SSD
or
2x 1 TB HDD
Just to clarify -> in other words, is the network going to throttle the choice of drive regardless?
Will a 10Gb network perform the same with an HDD as with an SSD?
I am running a small test now on the node I replaced and it looks to make a huge difference; HOWEVER, this is while it is rebalancing.
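One way to compare candidate drives before they become OSDs (and without waiting for a rebalance to finish) is fio's single-depth sync-write test, which mimics the pattern the BlueStore WAL sees and is where enterprise SSDs with power-loss protection pull far ahead of both consumer SSDs and HDDs. A sketch of the job file - note it writes to the raw device, so only run it on a blank disk:

```
; fio job file (hypothetical device name - double-check before running!)
[sync-write-latency]
filename=/dev/sdX   ; DESTRUCTIVE: must be an empty, unmounted disk
rw=write
bs=4k
iodepth=1
numjobs=1
direct=1
sync=1
runtime=60
time_based=1
```

Run it with `fio jobfile.fio` on each candidate drive and compare the completion-latency numbers, not just the throughput.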
Simple Question #3
Some of the nodes have 1 or 2 unused 1Gb NICs (maybe more, depending on the answer to #1).
Any point in using them as well?
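If Ceph stays entirely on the 10Gb links, one common use for spare 1Gb ports is a dedicated (optionally bonded) network for Proxmox cluster/corosync traffic, which cares about latency jitter rather than bandwidth. A sketch in Proxmox's /etc/network/interfaces syntax, with made-up interface names and addresses:

```
# /etc/network/interfaces fragment - eno3/eno4 and the subnet are examples
auto bond1
iface bond1 inet static
    address 10.10.30.11/24
    bond-slaves eno3 eno4
    bond-mode active-backup   # needs no switch config, unlike LACP
    bond-miimon 100
```

A separate corosync link on that subnet keeps cluster heartbeats off the storage network entirely.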
Any hints as to what to do?
thanks
again
Damon