Sounds like a great idea to me. That way we can see how others are configuring their systems and what works well.
As far as benchmarking Ceph goes, you want to use rados bench, the RADOS benchmarking tool. It is included in ceph-common and thus already installed on Proxmox.
I recommend first creating a test...
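A minimal sketch of that workflow, assuming a throwaway pool named "test" (the pool name and PG count here are just my examples):

    # create a disposable pool for benchmarking
    ceph osd pool create test 128 128

    # 60-second write benchmark; keep the objects so we can read them back
    rados bench -p test 60 write --no-cleanup

    # sequential read benchmark against the objects written above
    rados bench -p test 60 seq

    # remove the benchmark objects and the pool when finished
    rados -p test cleanup
    ceph osd pool delete test test --yes-i-really-really-mean-it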
Running from two nodes at once netted a 10-20% increase in most tests. Part of that could be because I started them about a second apart. I assume the reduced load on the NICs/cables, and thus less packet loss, helps as well.
Yes, 10G ethernet. I've got 3 nodes, 12 disks per node. Each node also has 2 SSDs being used for OS/mon.
You might be right that my 10G interface on Proxmox could be a bottleneck. After that point, the IO Aggregator (connects all the blades together) has a 40G trunk to the primary switch, which...
Assuming you have multiple mons (or your single mon doesn't go down), as soon as the node goes down its OSDs will be marked "down" and that IO will be redirected. By default, there is a 5-minute timeout before the OSDs are marked "out" and the data starts to migrate. If shortly after the 5 minute mark...
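For reference, the timeout I'm describing is mon osd down out interval, which defaults to 300 seconds. A sketch of how you could check or change it (the 600 below is just an example value):

    # permanent: set in /etc/ceph/ceph.conf under [global]
    #   mon osd down out interval = 600

    # or inject it at runtime on all mons
    ceph tell mon.* injectargs '--mon-osd-down-out-interval 600'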
Don't forget replication. By default you have 2x replication, so the data gets broken up into placement groups and evenly distributed; then, based on the default CRUSH map, each OSD replicates its data to an OSD in a different chassis.
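If you want to verify or change this on your own cluster, something like the following should work (the pool name "rbd" is just an example):

    # show and raise the replica count for a pool
    ceph osd pool get rbd size
    ceph osd pool set rbd size 3

    # dump and decompile the CRUSH map to see the placement rules
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt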
Also, on the switches, it was my understanding that you needed to be all on...
How about this: "For a production system you need 3 servers minimum. For testing you can get by with less, although you may be unable to properly test all the features of the cluster."
This informs users that a reliable system will need at least 3 servers, but if you are just testing you can...
You are right, I wasn't completely correct in that section. There is actually a new way to prepare/zap/activate all in one step. I updated that and also added some customization tips relevant to using Ceph with Proxmox.
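For anyone curious, this is roughly the one-step form I was referring to (device names are examples; double-check against the wiki before running anything):

    # zap + prepare in one step; udev then activates the OSD automatically
    ceph-disk prepare --zap-disk /dev/sdb

    # or, on Proxmox, via the pveceph wrapper
    pveceph createosd /dev/sdb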
Guys,
Just majorly overhauled http://pve.proxmox.com/wiki/Storage:_Ceph. The information was severely outdated. I believe everything to be correct now but would appreciate a second look and/or feedback.
I have Proxmox 3 running on (3) M620 blades. For storage we use Ceph running on (3) R720xd servers. Each R720xd has the PERC H710 with (12) SAS 7200RPM 3TB disks for data and (2) SSD in RAID-1 for the OS. Each data disk is in its own RAID-0 array (since there is no JBOD support). I am using 10Gb...
Take a look here: http://forum.proxmox.com/threads/3271-balance-rr-bond-unbalanced?p=18965#post18965
I experienced this same thing. Lots of packet loss once data starts flowing. I now use 802.3ad because it seemed to be more reliable.
Seems complex to *properly* configure balance-rr even...
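For comparison, here is roughly what my 802.3ad setup looks like in /etc/network/interfaces (interface names and addresses are examples, and the switch ports must be configured as an LACP group):

    auto bond0
    iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0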
I assume that you would want to update Grub. If you select "NO" it warns that the existing Grub installation may be unable to load new modules. That doesn't sound good. I would like some clarification on the correct way to go about this as well.
I am configuring Proxmox on M600 blades in a Dell M1000E blade chassis. All of the blades pass through an "IO Aggregator" switch which I cannot get to pass multicast traffic.
For now I have followed your guide to enable the cluster to operate over unicast...
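For anyone searching later: as I understand the guide, the key change is adding transport="udpu" to the cman tag in /etc/pve/cluster.conf (edit a copy, bump config_version, then activate it), something like:

    <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>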
I realize we are on an old version and should upgrade. I am planning on upgrading soon, but thought I would contribute my information in case it helps someone else. I also just recently started having this problem, even though my backups have been reliable since I upgraded to 2.3 at the beginning...
Martin,
This is great! I would love to attend but unfortunately a trip from the US to Germany is not in the budget for me. Hopefully you can eventually partner with someone such as Global Knowledge to offer training in the US as well. You have a great product with a great potential to grow in...
Are both fully up to date? If not, run your updates then restart both again. If they are both up to date, just try another restart on both. I may have rebooted a couple times before it started working again.
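By "run your updates" I just mean the usual upgrade-and-reboot cycle on each node, e.g.:

    apt-get update && apt-get dist-upgrade
    reboot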
I have the same problem after upgrading from 2.0 to 2.1
no connection : Connection timed out
TASK ERROR: command '/bin/nc -l -p 5900 -w 10 -c '/usr/sbin/qm vncproxy 107 2>/dev/null'' failed: exit code 1
I get this for VMs on the master and on a node.
eth0 is your primary adapter.
lo is your loopback adapter.
tap108i0 I'm not sure about...
venet0 - http://kb.parallels.com/en/647
vmbr0 is your bridge adapter.
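If it helps, you can see which of those interfaces are attached to the bridge with:

    # list bridges and their member interfaces (taps, bonds, NICs)
    brctl show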
I apologize, I was just making a suggestion for willy. If it helps, I use Windows 7 Professional, Firefox 8.0.1, and Java 6.0.290 Update 29. I believe this was an issue with Beta 1, I don't recall whether I've seen the issue with Beta 2 or 3 since most of the time I use SSH or Remote Desktop anyway.