Volunteers Wanted to test Virtual Machine Cloud Platform!

LVM is not an option for Windows PCs - what's wrong with sparse qcow2?

 
CLUSTER CRASH UPDATE
==================
The Proxmox and CEPH cluster is finally up and running again. Although there were mass casualties among the virtual machines, the critical data of the cluster remained intact. As part of the beta testing plan, none of the beta testers' VMs were backed up, so all of them have been lost. We are in the process of re-cloning the VMs and reassigning them to the beta users' accounts.

The Cause of Crash
==============
It was entirely human error that brought down the CEPH cluster, and the human was none other than myself. I was trying to change the CEPH cluster subnet by simply changing the IP address of every CEPH node. This completely broke the CEPH MON quorum and the OSDs became inaccessible. When I realized what I had done, I changed all the IP addresses back to the way they were, but the MONs could never re-establish their connection with the OSDs. After several days of fighting in vain, I sought help on the CEPH IRC channel. Several people tried to help, including Sage W., one of the founders of CEPH. But the damage was too far gone to resolve.
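For illustration only, this is roughly what the ill-fated shortcut looked like on each node: editing the static address in /etc/network/interfaces to move the whole cluster to a new subnet in one go (the interface name and addresses below are made up, not the real ones).

    # /etc/network/interfaces (per CEPH node) - the shortcut that broke MON quorum
    auto eth1
    iface eth1 inet static
        address 10.10.20.11      # was 10.10.10.11 - changed in place on every node at once
        netmask 255.255.255.0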

Symptom of CEPH Crash
==================
Surprisingly, all VMs kept working even though the MONs could not talk to the OSDs. But the moment I rebooted a VM, it would not start again. All VMs were stored on CEPH RBD. Some of the OSDs started flapping, meaning they went online and offline at random. The MONs started marking OSDs Down and Out one by one, sometimes just Down but In. Eventually all OSDs were marked Down and Out permanently.
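If you ever want to watch for the same symptoms, the usual way is with the standard status commands (output shortened here; exact counts will differ):

    ceph -s                 # overall cluster state, shows quorum and "X osds down"
    ceph health detail      # per-OSD detail on which OSDs are down/out
    ceph osd tree           # shows each OSD as up/down and in/out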

Crash Resolved
===========
The only way to resolve the issue was to recreate the CEPH cluster from the ground up. I reinstalled a fresh OS and recreated all of the CEPH cluster nodes. Prior to taking this extreme measure, I tried recreating the cluster keys, made sure all nodes could see and talk to each other, confirmed quorum was established, added new OSDs, and restarted and verified that all daemons were running. But nothing would make the MONs acknowledge that the existing OSDs were still there, untouched.
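Roughly, the kind of checks that were run before giving up looked like this (a sketch, not the exact session; sysvinit-era service syntax assumed):

    ceph quorum_status                # confirm the MONs had formed quorum again
    ceph auth list                    # verify the MON/OSD keys are actually present
    service ceph -a restart           # restart all CEPH daemons on all nodes
    ceph osd stat                     # ...and still: every OSD reported down/out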

Lesson Learned
===========
Going through the CEPH documentation, I later found out that changing IP addresses by simply modifying /etc/network/interfaces on the CEPH nodes is a BIG NO NO. The proper way to do this is to add new MONs with the new IP addresses one by one and then delete the old ones. During my time learning CEPH I either missed this critical part or just did not care. Thankfully I had an up-to-date backup of all critical VMs on separate NFS storage, so no real data was lost. A word of wisdom for everybody: keep your backups up to date and try to do things by the book!
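For reference, the by-the-book procedure from the CEPH documentation boils down to bringing up a monitor at the new address, letting it join quorum, and only then removing the one at the old address (the mon names and paths below are just placeholders):

    # On the node that will host the monitor at the new IP:
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i mon-new --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    service ceph start mon.mon-new    # wait until it shows up in 'ceph quorum_status'
    # Only after the new monitor is in quorum, remove the one at the old IP:
    ceph mon remove mon-old
    # Repeat one monitor at a time until all MONs live on the new subnet,
    # then update mon_host / mon_initial_members in ceph.conf to match.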
 
To All Beta Testing Participants
========================
To test CEPH performance with SSDs as OSDs, we are adding a bunch of SSDs to the CEPH cluster as a separate pool. We will be moving the VMs with the following IDs to test the SSD platform. If your VM ID is listed below, please give us feedback on your I/O performance. The changes will take place on Feb 15th, 2014. The VM IDs have been picked randomly. A rough sketch of how the SSD pool is being carved out follows the ID list.


VM IDs : 503, 509, 508, 511, 514, 516, 520, 521, 526, 542, 556, 558, 562, 588, 591.
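For the curious, separating SSD-backed OSDs into their own pool is done through the CRUSH map. A minimal sketch, assuming the SSD OSDs get their own CRUSH root (the bucket, rule, pool names, OSD IDs and weights here are placeholders, not our actual layout):

    ceph osd crush add-bucket ssd root                                 # new CRUSH root for SSD OSDs
    ceph osd crush create-or-move osd.20 1.0 root=ssd host=node1-ssd   # repeat per SSD-backed OSD
    ceph osd crush rule create-simple ssd-rule ssd host                # rule that only places data under the ssd root
    ceph osd pool create ssd-pool 128 128                              # pool for the test VMs
    ceph osd pool set ssd-pool crush_ruleset 1                         # rule ID from 'ceph osd crush rule dump' (newer releases call this crush_rule)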


Thanks for all the feedback you have been providing thus far!
 
TO PROXMOX COMMUNITY
===================
It has been amazing to see the Proxmox community step up to the challenge and participate in the Cloud beta testing program. The amount of data you have provided is priceless. The power of the hypervisor that we have seen through the beta testing has only renewed our commitment to Proxmox. We are nothing short of proud to be part of the Proxmox community.

As with all good things, the beta testing is soon coming to an end. A week from today, on Feb 12th, 2014, we will discontinue the beta testing and all VMs will be destroyed.

Please note that the VM IDs (503, 509, 508, 511, 514, 516, 520, 521, 526, 542, 556, 558, 562, 588, 591) mentioned in the last post, which have been chosen to take part in the SSD CEPH cluster testing, will NOT be destroyed. That phase of the beta testing will still continue.

We greatly appreciate you showing us your support! Proxmox has a great community!
 
