Backup storage options when solely using Ceph with Proxmox

wahmed

Hello wonderful Proxmox Community,
Could anybody tell me what would be the best backup option in a Ceph RBD shared-storage-only environment? Currently all my VMs are on RBD and I do a full backup once a week to OmniOS NFS shared storage. If I want to go with an RBD-only environment, how would I go about backups? I did some snapshots of VMs. Can these be used as a substitute for full backups, since Ceph itself is resilient enough to withstand any sort of HDD or whole-host outage? I can still stick with the RBD+NFS environment, just wondering what other options are there.
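For context, the snapshots I mentioned are plain RBD snapshots of the VM disk images, roughly like this (the pool, image, and snapshot names here are only examples, not my actual setup):

# take a point-in-time snapshot of a VM disk image on the RBD pool
rbd snap create rbd/vm-101-disk-1@weekly
# list the snapshots of that image
rbd snap ls rbd/vm-101-disk-1
# roll the image back to a snapshot if something goes wrong
rbd snap rollback rbd/vm-101-disk-1@weekly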

Thanks!
 
Backup update.............

Looks like my idea of going with a full Ceph-only setup is not that viable. I was finally able to create a Ceph file system (CephFS) in the cluster. CephFS is much like NFS: you can copy and delete files, create and browse folders, etc. CephFS can be mounted on a Proxmox host in a folder, and then you create storage from the GUI using the DIR type. You can use the CephFS-mounted folder to hold Proxmox VM images, ISOs, and backups. But my backup test showed very bad write performance. It was averaging around 12-18 MiB/s, sometimes less. Total backup time was about 11 hrs. The same amount of data takes 5 hrs to back up from Ceph to the NFS share. So for the time being I am keeping the OmniOS NFS storage host for nothing but backup usage. At least my eggs will not all be in the same basket. Maybe through some tweaking of CephFS the speed can be increased, but my knowledge is not there yet. But CephFS works.
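In case it is useful to anyone, this is roughly how I mounted CephFS and turned it into a Proxmox storage (the monitor address, secret file, and storage name are just examples; adjust them for your own cluster):

# mount CephFS on the Proxmox host with the kernel client
mount -t ceph 192.168.10.51:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# expose the mount point to Proxmox as a directory (DIR) storage
pvesm add dir cephfs-backup --path /mnt/cephfs --content backup,iso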
 
My personal experience with the backup features of Proxmox (including the snapshotting) is that they are a poor substitute for a real granular backup.

I have relied on them at times and had success, but the overall performance of that backup method and the storage space used have consistently left something to be desired.

I ended up using Idera enterprise backup. So far I've really liked it.

Because it uses portable storage, you could set up a VM to be the backup host server but point the storage anywhere you can mount (sounds like it would solve your issue).

It supports MySQL, MSSQL, and Exchange backups out of the box, and has a semi-friendly web interface.

Also, if you're using semi-private storage or sending traffic over unsecured networks, it offers AES encryption for the on-disk storage, so it should be pretty solid there as well.

The other thing of note: it is a low-network-traffic, pretty low-write method of backup (lots of reads though), so it may be OK even with your poor write performance.
 
My personal experience with the backup features of Proxmox (including the snapshotting) is that they are a poor substitute for a real granular backup.

I have relied on them at times and had success, but the overall performance of that backup method and the storage space used have consistently left something to be desired.

Totally agree. To me performance is not as big an issue as the storage requirement is. Since Proxmox does a full backup every time, a large amount of storage space is needed, especially if I want to keep a backup history of, say, 6 months or a year. But Proxmox is a hypervisor after all, which is where it shines like the brightest star in the sky. For granular file backup I personally use BackupPC as a VM appliance. It allows me to pull data from just about anywhere, and it also has some file-level deduplication so it does not copy the exact same file all over again. It also does incremental backups. I set it up a year ago and it has been running flawlessly. It does require some major configuration and learning to set up, but once that is done it does not need any attention. BackupPC is for file backup only; it cannot read emails in a mail server or a database, though.
If the Proxmox team ever comes up with a granular backup option, that will be "the" major breakthrough for Proxmox. Who knows, maybe they are already working on it. :)
 
Backup update.............

Looks like my idea of going with a full Ceph-only setup is not that viable. I was finally able to create a Ceph file system (CephFS) in the cluster. CephFS is much like NFS: you can copy and delete files, create and browse folders, etc. CephFS can be mounted on a Proxmox host in a folder, and then you create storage from the GUI using the DIR type. You can use the CephFS-mounted folder to hold Proxmox VM images, ISOs, and backups. But my backup test showed very bad write performance. It was averaging around 12-18 MiB/s, sometimes less. Total backup time was about 11 hrs. The same amount of data takes 5 hrs to back up from Ceph to the NFS share. So for the time being I am keeping the OmniOS NFS storage host for nothing but backup usage. At least my eggs will not all be in the same basket. Maybe through some tweaking of CephFS the speed can be increased, but my knowledge is not there yet. But CephFS works.

I can give you 3 suggestions:

1- Maybe your best option is to add a local disk (or several) to your PVE node(s) and make local backups to these new HDDs; this way your PVE nodes will not use your network for writes (useful until Ceph improves its performance).
- This is the cheapest option, and if a PVE node breaks, you can always remove these HDDs.

2- If you run "cat /etc/vzdump.conf", you will see an option that says "#bwlimit: KBPS", but I don't know if this option will be useful (see the sketch after this list).

3- For better performance you will need to add 10 Gb/s NICs to your PVE and Ceph nodes. Or, to save money, do network bonding. Then read this link about Ceph and benchmark speeds (and always ask yourself: "What if a switch breaks?"):
http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
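A sketch of what I mean with option 2 (the number below is only an example; the value is in KB/s, so tune it to your network):

# /etc/vzdump.conf -- global defaults for vzdump backups
# limit backup I/O bandwidth to roughly 100 MB/s
bwlimit: 102400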

@Dietmar:
About the bwlimit option in the vzdump.conf file:
1- Is it useful?
2- If the answer is positive, what should I put there if we have a 1 Gb/s NIC?

Best regards
Cesar
 
I can give you 3 suggestions:

1- Maybe your best option is to add a local disk (or several) to your PVE node(s) and make local backups to these new HDDs; this way your PVE nodes will not use your network for writes (useful until Ceph improves its performance).
- This is the cheapest option, and if a PVE node breaks, you can always remove these HDDs.

I considered that option and tried with a single local HDD for backup. The speed was super impressive. But the problem is that the Proxmox hosts are 1U rackmounts with extremely limited storage options. I can only attach 1 SSD or 1 HDD locally, and the Proxmox OS is installed on the SSD in all 4 nodes.

2- If you run "cat /etc/vzdump.conf", you will see an option that says "#bwlimit: KBPS", but I don't know if this option will be useful

I did try that option from the Proxmox wiki, but it did not help much. The #bwlimit is actually set to the maximum a gigabit network can do. At one point I even tested with twice that limit.

3- For better performance you will need to add 10 Gb/s NICs to your PVE and Ceph nodes. Or, to save money, do network bonding. Then read this link about Ceph and benchmark speeds (and always ask yourself: "What if a switch breaks?"):
http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
I never considered this option. But my current network is gigabit, and backups to Ceph run at half or less of that bandwidth. Is it likely that a 10Gb network will increase the backup speed?

@cesarpk: After reading the article you provided, it seems there are a few things I can try. Right now everything is on the same switch: Proxmox nodes, storage cluster nodes, admin, monitors, all of it. I can separate them onto 2 switches on 2 different subnets, then separate the journal from the OSD disk. I know I should have separated them from the beginning. :) But hey, I am learning. And last but not least, I can drop in a RAID controller so all HDDs can perform at 6 Gb/s; currently some drives are on 3 Gb/s and some on 6 Gb/s. I will try it over the weekend and post some results here. Thanks for the link, cesarpk. Tons of new information to digest. Very helpful!
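For anyone following along, the network separation I am planning is the usual public/cluster split in ceph.conf; the subnets below are only placeholders for my two switches:

[global]
    # client and Proxmox traffic stays on the first switch
    public network = 192.168.10.0/24
    # OSD replication and recovery traffic moves to the second switch
    cluster network = 192.168.20.0/24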
 
Update
====

I have made some major changes in the Proxmox and Ceph environment. Hopefully this info will be helpful to somebody.

Hardware changes:
1. I went ahead and invested in two 12-bay hot-swap chassis, each with a Xeon E3 CPU and motherboard with 8GB ECC RAM. I also added an LSI RAID card. Each chassis has four 2 TB SATA HDDs.
2. Separated the Ceph storage network and the Proxmox node network using 2 Netgear switches. Also divided the 2 networks into 2 subnets.

Software changes:
Both Ceph nodes are running Ubuntu 12.04 LTS.

Outcome:
I started noticing major performance improvements in both read and write. Proxmox VM backups to the OmniOS NFS shared storage are now happening at an average of 90 to 100 MB/s. Finally a speed I can live with happily ever after. :) I am finding Ceph better and better every day. No issues interacting with Proxmox, and all the VMs just work. If anybody is planning to build a Ceph storage node, I would recommend at minimum a quad-core CPU. I tried with a dual core; during any sort of Ceph cluster recovery the CPU takes a hit and gets used 100% and then some. I have a total of 16TB of cluster-wide storage space, and memory consumption is hovering around 1.2 GB out of 8GB. In a NAS such as FreeNAS/OmniOS/Nexenta, which use ZFS as the file system, the more memory the better, but in the Ceph world it seems to be the opposite: the better the CPU, the better the chance of a fast cluster recovery during an HDD or entire-node failure.
 
UPDATE
======
Just wanted to share a quick update on the Proxmox+Ceph project.
I have switched my NFS shared storage for backups from OmniOS to FreeNAS. I was a long-term FreeNAS user and like the simplicity of its GUI and all the other features it offers, so I wanted to see how it performs in a Proxmox+Ceph environment.
I have been doing backups for the last few days and the speed is amazing. It is backing up about 242GB of data in an average of 3 hr 52 min, consistently.
The Ceph cluster itself is functioning very nicely, even in artificial cluster-recovery mode. I do not think CephFS is completely up to the challenge yet to use as shared storage, due to its poor performance, but at least it works and is stable. I am still doing backups to the CephFS shared storage in the cluster, but only because I want to fill up the entire 16TB of cluster space with data to see how it handles a large amount of data. FreeNAS shared storage is my primary backup storage at this moment. I am going to set up another FreeNAS node so both FreeNAS boxes can replicate each other.

I deeply thank the Proxmox team for the Ceph support they have provided in Proxmox VE.
 
Update
====

I have made some major changes in the Proxmox and Ceph environment. Hopefully this info will be helpful to somebody.

It is indeed!

Hardware changes:
1. I went ahead and invested in two 12-bay hot-swap chassis, each with a Xeon E3 CPU and motherboard with 8GB ECC RAM. I also added an LSI RAID card. Each chassis has four 2 TB SATA HDDs.

Would you tell us how you configured your RAID? A single array on each node? Which level?

I have only done some introductory reading about Ceph, but as I understood it, one of its strengths seems to be providing the redundancy itself instead of relying on dedicated HW controllers (though it doesn't replace a battery/flash-backed cache!)... unless, à la DRBD, RAID handles disk-level redundancy and Ceph handles the host/network-level one? Would greatly appreciate insight on this point.

2. Separated the Ceph storage network and the Proxmox node network using 2 Netgear switches. Also divided the 2 networks into 2 subnets.

Did you add bonding too?

Outcome:
I started noticing major performance improvements in both read and write. Proxmox VM backups to the OmniOS NFS shared storage are now happening at an average of 90 to 100 MB/s. Finally a speed I can live with happily ever after. :) I am finding Ceph better and better every day. No issues interacting with Proxmox, and all the VMs just work. If anybody is planning to build a Ceph storage node, I would recommend at minimum a quad-core CPU. I tried with a dual core; during any sort of Ceph cluster recovery the CPU takes a hit and gets used 100% and then some. I have a total of 16TB of cluster-wide storage space, and memory consumption is hovering around 1.2 GB out of 8GB. In a NAS such as FreeNAS/OmniOS/Nexenta, which use ZFS as the file system, the more memory the better, but in the Ceph world it seems to be the opposite: the better the CPU, the better the chance of a fast cluster recovery during an HDD or entire-node failure.

Thank you very much for this feedback
Bests
 
Would you tell us how you configured your RAID? A single array on each node? Which level?
I only used the RAID card to give me the ability to connect the 12-bay hot-swap backplane. The chassis I used has a backplane with 3 Mini-SAS connectors, so I needed a RAID card to connect all of them. I have not done any setup whatsoever on the RAID side. I simply dropped the drives into the slots, installed Ubuntu 12.04 and Ceph, and that's it.
From what I have seen so far of how Ceph operates, I do not see any need to configure hardware RAID at all. I have 2 nodes with 4 HDDs each, which makes a combined clustered storage space of 16TB. Even if I "accidentally" unplug one node, my cluster keeps running; of course, the total storage space drops to 8TB. In a way Ceph provides RAID, but at the host level.
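For anyone wondering what that host-level "RAID" looks like in practice, it is just the per-pool replication setting; the pool name and values below are only an example, not necessarily what I have set:

# show how many copies of each object the pool keeps
ceph osd pool get rbd size
# keep two copies, and keep serving I/O with only one copy left
ceph osd pool set rbd size 2
ceph osd pool set rbd min_size 1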


Did you add bonding too?
I have not done bonding yet. That's the next step now that I see everything working. :) I got all the necessary hardware for it; I just need to configure it. I have not done any bonding before, so this is going to be another learning opportunity. During regular operation the cluster does not use a lot of network bandwidth, but when it goes into recovery mode because an HDD or host died, that's when I have seen most of the bandwidth getting used up. I am hoping that with the higher bonded bandwidth, recovery will happen faster.
 
If you want to play with bonding you should definitely ensure you use managed switches which support LACP (802.3ad).

Both of my switches are Netgear Smart Switch GS724T. I found the LACP option but am not sure about 802.3ad. Can bonding be done with these switches? This is a new area for me. Any pointers you can give would be much appreciated.
 
Got all the info I needed. Now I just have to dive in.

If you can put the switches in "stack" mode, you will have high availability at the switch level, with each PVE node having two NICs, one connected to each switch, in LACP mode. LACP is also known as "link aggregation" or "port channel".

If the setup above is for the server switches, then with this strategy I also add the workstation switches in LACP mode, connected to both server switches; the advantage of this setup is more total bandwidth between the server switches and the workstation switches, and everything keeps working in case a server switch dies. Typically you can connect up to 8 ports in LACP mode.

Please see your switch manual to learn what you can do with these switches (not all switches can work in "stack" mode).
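In case it helps, the bond on each PVE node would look roughly like this in /etc/network/interfaces (interface names and addresses are only placeholders, and the switch ports must also be configured for LACP):

auto bond0
iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-miimon 100
        bond-mode 802.3ad

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        gateway 192.168.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0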
 
This is an old thread I started way back when I first stumbled upon Ceph. Felt a bit nostalgic reading it. :)
The cluster I started back then has grown into an oak tree. Proxmox did not have any Ceph support in the GUI at the time, my initial cluster was on Ubuntu, and I was trying to find a better option for backups.

Our cluster today has passed 1 petabyte and is growing. 40Gbps InfiniBand is used as the storage backend for 150 OSDs. It's not a huge cluster, but it's definitely much bigger than what I started testing with.

For backup, we are leveraging Ceph RBD, eve4pve-autosnap, and backy2. Of course, this bypasses the Proxmox built-in backup, but for obvious reasons Proxmox's vzdump just cannot keep up with our backup needs. Backup is still not quite where we really want it; for now, a combination of these tools is working. The next phase would be to replicate the main Ceph cluster to a backup Ceph cluster.
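Stripped of the tooling, the mechanism is snapshot-based differential export; a rough sketch of one cycle looks like this (the pool, image, and snapshot names are placeholders, and backy2/eve4pve-autosnap of course add cataloguing and retention on top):

# take a new snapshot of the VM disk image
rbd snap create rbd/vm-101-disk-1@backup-new
# export only the blocks that changed since the previous snapshot
rbd export-diff --from-snap backup-prev rbd/vm-101-disk-1@backup-new /backup/vm-101-disk-1.diff
# remove the old snapshot once the diff is stored safely
rbd snap rm rbd/vm-101-disk-1@backup-prev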

This thread also made me look back at how far Proxmox has come in the last few years. I am thankful to the Proxmox devs for what they have created. Every new release has made me fall in love with Proxmox all over again. And also thanks to the wonderful Proxmox community, who never let me down when I needed an answer!
 
Hi, symmcom!
Thanks for sharing your experience. We are using eve4pve-barc for Ceph backup purposes!
Can you share your experience with backy2? Does it install on the host OS?
 
