Searching for a better KVM backup solution

kev974

Jul 12, 2016
Hello,

First of all, sorry if I make some English mistakes (I'm French and not very good at English).
We are currently looking for a better backup solution for our cluster.
We have a five-node Proxmox cluster running about 25 QEMU VMs. We currently use the integrated vzdump backup for all VMs at night, but it takes too much time for VMs with large disks (about 15 hours for the whole backup run). Some VMs with disks of about 500 GB can take 5 hours to back up.

To be more precise: our VM disks are on an EqualLogic SAN with 3 Gbit Ethernet links, and we use LVM over the iSCSI connection. The backups are written to NFS storage on a Synology. The total size of all VM disks is about 3 TB.

Looking at the backup logs, I can see for example that a 500 GB VM is backed up in 4 hours at 50 MB/s, and the archive is 177 GB.

We also need to copy those backups to another site, where we have another Proxmox cluster as a recovery site. With full backups we have about 1.5 TB to transfer over our 100 Mbit/s fiber connection, which would take about 41 hours (at 10 MB/s). That is not feasible.

So we are looking for a solution that can do the backups and replicate everything to the off-site location in 8 hours or less (ZFS with pve-zsync? Incremental backups like Veeam does for VMware would be better, but I haven't found anything equivalent for Proxmox).
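The arithmetic behind these numbers, as a small Python sketch (decimal units; the ~10 MB/s usable rate is the figure measured above, and any overhead beyond it is ignored). It also shows what an 8-hour window would demand: roughly 52 MB/s (about 420 Mbit/s) for a full 1.5 TB copy, while the existing link moves only about 288 GB in that time, which is why only incremental or changed-data transfer can realistically fit.

```python
def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 3600


def required_rate(size_bytes: float, window_hours: float) -> float:
    """Sustained bytes per second needed to move size_bytes within window_hours."""
    return size_bytes / (window_hours * 3600)


# Decimal units, as used in the post.
TB, GB, MB = 10**12, 10**9, 10**6

full_backups = 1.5 * TB   # total full backup size quoted above
link_rate = 10 * MB       # usable rate on the 100 Mbit/s fiber link
window = 8                # target replication window in hours

print(f"full copy over the link: {transfer_hours(full_backups, link_rate):.1f} h")       # 41.7 h
print(f"rate needed for {window} h: {required_rate(full_backups, window) / MB:.0f} MB/s")  # 52 MB/s
print(f"data that fits in {window} h: {link_rate * window * 3600 / GB:.0f} GB")           # 288 GB
```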

If anyone has a solution, it would be greatly appreciated.

Thanks all.
 
I understand your points. We solved the "big VM" issue internally by giving each VM two disks: one with the OS, which is included in the regular backup, and one data disk on ZFS, which is excluded from it, so that we can do incremental backups ourselves (plus all the other ZFS features such as compression, snapshots, etc.). This solved the time issue for the big VMs, and we can still do consistent backups.
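To illustrate why a separate data disk makes incremental backups cheap, here is a conceptual Python sketch that compares two versions of a disk image block by block and lists only the blocks that changed. It is an illustration under simplifying assumptions, not how ZFS works internally: ZFS records when each block was written, so an incremental `zfs send -i` can find changed blocks without reading and hashing the whole disk the way this sketch does. The 4 KiB block size is arbitrary.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # arbitrary illustrative block size


def block_hashes(image: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 digest of every fixed-size block in a disk image."""
    return [
        hashlib.sha256(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Indices of blocks in `new` that differ from, or do not exist in, `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


# Yesterday's image: 100 zeroed blocks. Today the guest wrote into block 5 only.
old_image = bytes(BLOCK_SIZE * 100)
new_image = bytearray(old_image)
new_image[5 * BLOCK_SIZE] = 0xFF

delta = changed_blocks(old_image, bytes(new_image))
print(delta)  # [5]
print(f"incremental transfer: {len(delta) * BLOCK_SIZE} of {len(new_image)} bytes")
```

Only one 4 KiB block would have to cross the WAN instead of the whole image, which is the effect the two-disk layout is after.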

The other solution is to throw more hardware at it :-D
 
Hello,

Thanks for your answer. The problem is that many of those VMs have about 150 GB for the OS and 250 GB for databases or other data.
If I understand correctly, you do ZFS backups (snapshots) for the other disks? If so, do you do incremental backups of your VMs with pve-zsync? Do you have databases on those disks?

I've read that ZFS does a lot of read and write activity, which shortens disk lifetime. Isn't that a problem for you?

I've done some tests moving VM disks to the host's local drive, and the backups are really fast! A backup that took 3h45 completes in about 15 minutes! So I think the bottleneck is read performance over iSCSI.
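One way to confirm the suspected iSCSI read bottleneck is to time a plain sequential read of comparable data on both storage types. The sketch below is a hypothetical helper, not a Proxmox tool; point it at, for example, an LVM volume on the SAN and at a file on local disk (reading a block device needs root). Repeated reads may be served from the page cache, so use data that has not been read recently, or the figure will be inflated.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path: str, chunk_size: int = 4 * 1024 * 1024) -> float:
    """Read `path` sequentially once and return the average rate in MB/s (10**6 bytes/s)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: no extra Python-level buffer
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


# Self-check on a 16 MiB scratch file. It was just written, so it is almost
# certainly in the page cache and the number says nothing about the storage.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))
    tmp.flush()
    print(f"scratch file: {read_throughput_mb_s(tmp.name):.0f} MB/s")
```

For a real comparison, call it as `read_throughput_mb_s("/dev/<vg>/<lv>")` on an iSCSI-backed volume and on a local one (the `<vg>/<lv>` path is a placeholder). A backup dropping from 3h45 to about 15 minutes is a 15x difference, so a raw read gap of that order would point at the storage path rather than at vzdump or compression.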

What do you mean by "throw more hardware at it"?
 
I think the main problem is your hardware. We also have a lot of VMs and a lot of data, but we copy over a normal Gigabit network at about 110 MB/s, while you copy at only 50 MB/s. So there is a hardware, protocol, or configuration issue!
In our setup we use Dell/HP servers with hardware RAID or ZFS, and also QNAP, as backup storage, and we always back up over NFS. So check this first.

Second, you can increase your network speed with a cheap (if slightly unusual) custom configuration: add an extra network card for every important big VM, so that each VM gets full Gigabit speed for its backup.

What I prefer is 10 Gbit. The interfaces are not that expensive (350 euros per interface), and a managed 10 Gbit switch from Netgear costs about 700 euros. That is really cheap for 10 Gbit. If you want enterprise hardware, go to HP, but... yes, it costs more than 6000 euros. For backup, the Netgear switch should be enough.

But note: reaching 10 Gbit speed really depends on your CPU. For example, a six-year-old Xeon, such as in an HP ML350 G5/G6, can't reach this speed because the CPU is not fast enough: about 300 MB/s on a G5 and 600-700 MB/s on a G6.

I hope this information helps a little.
 
Hi fireon,

Thanks for your advice. An SCP copy to our NFS storage runs at 113 MB/s, so we effectively have a 1 Gbit/s link (even though we use 802.3ad bonding), but during backups we get only 50-70 MB/s (maybe because of compression). Do you reach 110 MB/s during backups? If so, do you use bonding?
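The 113 MB/s SCP result despite bonding is expected: 802.3ad (LACP) balances traffic per flow, not per packet, so a single TCP connection between one host and one NAS always travels over one member link. With the Linux bonding default `xmit_hash_policy=layer2` (MAC-based), all traffic between the same two machines even shares one link. The simplified Python model below only illustrates this idea; it is loosely inspired by the `layer3+4` policy but is not the kernel's exact hash, and the addresses and ports are hypothetical.

```python
import ipaddress


def flow_to_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_links: int) -> int:
    """Pick a bond member for one flow with a simplified layer3+4-style hash.

    Loosely modelled on Linux bonding's xmit_hash_policy=layer3+4; the kernel's
    real hash differs in detail, but it shares the key property shown here:
    the same flow always maps to the same member link.
    """
    ip_mix = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h = (src_port ^ dst_port) ^ (ip_mix & 0xFFFF)
    return h % n_links


host, nas = "192.168.1.21", "192.168.1.50"  # hypothetical addresses
NFS_PORT = 2049

# One NFS connection: every packet hashes to the same member link.
single_flow = {flow_to_link(host, nas, 50000, NFS_PORT, 2) for _ in range(1000)}
print(single_flow)  # {0}

# Ten connections with different source ports can spread over both links.
many_flows = {flow_to_link(host, nas, p, NFS_PORT, 2) for p in range(50000, 50010)}
print(many_flows)  # {0, 1}
```

So bonding raises aggregate throughput for many parallel transfers, but a single backup stream stays capped at one link's speed.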

Regarding your advice about one more interface for those specific VMs: if I understand correctly, backups don't depend on the VM network but on the host network. So should I add another network interface to my Dell R610 on a separate network (for example 192.168.2.0/24, with one of my NAS interfaces in 192.168.2.0/24 too) and use an NFS export on this network for backups?
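A dedicated subnet works because the host sends traffic for an address in a directly connected subnet out of the interface attached to that subnet. The Python sketch below models that choice as a longest-prefix match over connected networks; it ignores metrics and policy routing, and the interface names and the 192.168.1.0/24 LAN are hypothetical (only 192.168.2.0/24 comes from the question). The practical consequence is that the NFS backup storage must point at the NAS's 192.168.2.x address, otherwise backup traffic keeps using the old path.

```python
import ipaddress

# Hypothetical connected networks on the Proxmox host: the existing LAN and a
# new NIC dedicated to backups (only 192.168.2.0/24 comes from the question).
CONNECTED = {
    "vmbr0": ipaddress.ip_network("192.168.1.0/24"),
    "eth3": ipaddress.ip_network("192.168.2.0/24"),
}


def egress_interface(dst: str, connected=CONNECTED, default: str = "vmbr0") -> str:
    """Interface whose connected subnet contains dst with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, name) for name, net in connected.items() if addr in net]
    return max(matches)[1] if matches else default


print(egress_interface("192.168.2.10"))  # eth3: NAS interface on the backup network
print(egress_interface("192.168.1.50"))  # vmbr0: a NAS address on the LAN
print(egress_interface("10.0.0.1"))      # vmbr0: falls back to the default route
```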

One question: is your backup storage on the same network as production or iSCSI, or do you have a separate network only for backups?

Thanks in advance.
 
I would use NetApp as storage, do backups with snapshots, and use SnapMirror or SVM DR to replicate to the backup site. This would be the fastest solution. SnapMirror uses network compression.
 
No databases, only files. But I'd back up a database with the consistent backup mechanism the database itself provides, e.g. RMAN on Oracle. That's much faster and less error-prone.
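RMAN is Oracle-specific, but the principle (let the database produce its own consistent copy instead of copying its files underneath it) can be shown with Python's built-in `sqlite3` module, whose `Connection.backup()` method wraps SQLite's online backup API. This is an analogy under that substitution, not an Oracle procedure; the table and rows are made up.

```python
import sqlite3

# A small in-memory database standing in for a live application database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
src.executemany("INSERT INTO orders (item) VALUES (?)", [("disk",), ("nic",)])
src.commit()

# Let the database engine produce the copy through its online backup API,
# instead of copying the underlying files while they may be mid-write.
dst = sqlite3.connect(":memory:")
src.backup(dst)

print(dst.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2
```

In practice `dst` would be a file on the backup disk, e.g. `sqlite3.connect("backup.db")`, and that file, rather than the live database files, is what the VM-level or ZFS-level backup then picks up.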

We have our own snapshotting and transfer tool for that, but this environment is very, very simple to set up.
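The core decision such a snapshot-and-transfer tool has to make is which snapshot to use as the base of the next incremental send: the newest one that exists on both sides. The Python sketch below shows only that step; it is a hypothetical helper, not the poster's tool or pve-zsync, and the snapshot names are invented. The chosen base would then feed something like `zfs send -i <base> <newer snapshot>` piped into `zfs receive` on the target.

```python
from typing import List, Optional, Tuple


def incremental_plan(source: List[str], target: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (base, to_send) for replicating `source` snapshots to `target`.

    Both lists are in chronological order (oldest first). `base` is the newest
    snapshot present on both sides; `to_send` are the newer source snapshots,
    each of which would be sent as an increment from its predecessor.
    If nothing is shared, base is None and only the newest snapshot is listed,
    meaning a full send is required.
    """
    on_target = set(target)
    for i in range(len(source) - 1, -1, -1):
        if source[i] in on_target:
            return source[i], source[i + 1:]
    return None, source[-1:]


src = ["data@2016-07-10", "data@2016-07-11", "data@2016-07-12"]
dst = ["data@2016-07-10", "data@2016-07-11"]
print(incremental_plan(src, dst))  # ('data@2016-07-11', ['data@2016-07-12'])
```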

I've never read that ZFS shortens disk lifetime, but it could be true. ZFS is superior to any other filesystem you know; it brings the "I" in RAID back to really "inexpensive" disks, so let them fail.

As for throwing more hardware at it: what @fireon already said. Buy better network cards (e.g. with iSCSI offloading, more speed) and better CPUs. I also have a big cluster with "only" 1 Gbit, and I'm backing up at full 1 Gbit speed.
 
I still do not understand why storage-only replication is still so hyped. It's so inconsistent... With Proxmox/KVM and the QEMU guest agent you can (and should) put your applications into "backup mode".
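The pattern described here is: put the guest into a consistent state, take the snapshot, then release the guest again, and make sure the release happens even if the snapshot fails. A minimal Python sketch of that ordering, assuming the two callables wrap whatever actually does the work (for instance the QEMU guest agent's `guest-fsfreeze-freeze` and `guest-fsfreeze-thaw` commands); in this sketch the callables only record events.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def quiesced(freeze: Callable[[], None], thaw: Callable[[], None]) -> Iterator[None]:
    """Freeze the guest, run the body (the snapshot), and always thaw afterwards.

    If freeze() itself raises, the body and thaw() are skipped, because nothing
    was frozen.
    """
    freeze()
    try:
        yield
    finally:
        thaw()


events: List[str] = []
with quiesced(lambda: events.append("freeze"), lambda: events.append("thaw")):
    events.append("snapshot")
print(events)  # ['freeze', 'snapshot', 'thaw']
```

The `finally` clause is the important design choice: a guest left frozen after a failed snapshot is an outage, which is worse than a missed backup.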

Have you ever heard of a recovery scenario that actually worked after a switch? I've only heard bad examples.
 
NetApp has its own products for creating consistent snapshots. They are available for Exchange, SQL Server (SMSQL), SharePoint and Oracle. There is also software for VMware, and as far as I know they support RHEV too. RHEV is KVM-based, so it should be possible to do this with Proxmox as well. Storage solutions work very well, and you get many more features than simple backup, such as verification and cloning in a few seconds. I can also restore single mails in Exchange, which is not possible with a simple dump.
 
Most of these "solutions" are not really applicable in most situations. I don't know about the MS products, but for Oracle this only works in a very simple case. If you have an Oracle Real Application Cluster (RAC), you cannot use cloning because you have to use ASM. A simple clone of a DB on the same server is therefore not possible, and neither is a backup. It does work for single-instance DBs on a filesystem, however.
 
Hi hec,

I was looking at NetApp, and if I understand correctly, this solution involves buying new storage hardware (their FAS or E-Series)? If so, I don't think we have the budget for it. The only way for us to get budget for this project is to improve our RTO, e.g. with synchronous replication or a 15-minute RTO. But we only have a 100 Mbit/s (about 10 MB/s) fiber link.
Do you use NetApp? What network link do you have to your off-site location?
 
If you want to use these features, you need to buy a FAS. The E-Series is only very fast block storage.

We use a NetApp MetroCluster, so data is written synchronously to both datacenters. We back up the data from the MetroCluster to the NearStore.

We can switch between the datacenters without any service outage. We have 20 Gbit/s Ethernet and FC between the DCs.

For simple SnapMirror replication, 20 Mbit/s should be enough, but this depends on the change rate of the data.
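Whether a given link is "enough" for asynchronous replication can be estimated from the daily change rate. The Python sketch below converts changed data per day into an average required link speed, with an optional network-compression factor; it assumes changes are spread evenly and ignores protocol overhead and peaks, so real links need headroom. For reference, a fully used 20 Mbit/s link moves about 216 GB per day. The 50 GB/day figure is hypothetical; 500 GB in 8 hours matches the nightly volume mentioned later in the thread.

```python
def required_mbit_s(daily_change_gb: float, compression_ratio: float = 1.0,
                    window_hours: float = 24.0) -> float:
    """Average link speed in Mbit/s to ship one day's changed data within the window.

    compression_ratio > 1 models network compression (2.0 halves the bytes sent).
    Assumes changes are spread evenly; ignores protocol overhead and bursts.
    """
    bits = daily_change_gb * 1e9 * 8 / compression_ratio
    return bits / (window_hours * 3600) / 1e6


print(f"{required_mbit_s(216):.1f} Mbit/s")                        # 20.0
print(f"{required_mbit_s(50):.1f} Mbit/s")                         # 4.6
print(f"{required_mbit_s(50, compression_ratio=2.0):.1f} Mbit/s")  # 2.3
print(f"{required_mbit_s(500, window_hours=8):.1f} Mbit/s")        # 138.9
```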

best regards
Gregor
 
It seems interesting, but we would need to buy two FAS systems, and they don't publish their prices. I may contact them on Monday to get prices and more information. Do you have an idea of the price of their cheapest FAS?
I think synchronous replication is difficult over a 20 Mbit/s (about 2 MB/s) link, but I'll ask them.

Have you ever done a complete disaster recovery, or tested one end to end?
 
I think Proxmox backups are good, but for off-site backups it's not really possible to get a good RPO. I calculated the size of my backups, and it is more than 500 GB every night, so with my 10 MB/s link it will take more than 12 hours to transfer.
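A simplified way to reason about the off-site RPO: if backups start every `interval` hours and each copy needs `backup` hours to create plus `transfer` hours to ship, then just before a new off-site copy lands, the newest copy at the remote site is a full interval plus that pipeline time old (counting from when its backup started). The Python sketch below encodes this under the assumption that backup and transfer run back to back and must finish before the next cycle; with the figures from this thread (a 15-hour backup run and about 13.9 hours to ship 500 GB at 10 MB/s) the nightly cycle does not fit at all.

```python
def offsite_rpo_hours(interval_h: float, backup_h: float, transfer_h: float) -> float:
    """Worst-case age of the newest off-site copy, in hours (simplified model).

    Assumes each cycle runs backup then transfer back to back, and that both
    finish before the next cycle starts; otherwise the copies fall behind.
    """
    if backup_h + transfer_h > interval_h:
        raise ValueError("backup + transfer take longer than the interval")
    return interval_h + backup_h + transfer_h


transfer_h = 500e9 / 10e6 / 3600  # about 13.9 h for 500 GB at 10 MB/s

try:
    offsite_rpo_hours(24, 15, transfer_h)  # nightly cycle with a 15 h backup run
except ValueError as err:
    print("does not fit:", err)

# Hypothetical incremental setup: 2 h backup, 4 h transfer, once a day.
print(f"{offsite_rpo_hours(24, 2, 4):.0f} h")  # 30 h
```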

I'm looking at a network compression solution. Do you do off-site backups? If so, what connection do you have?
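Network compression only helps when the data on the wire is still compressible. The vzdump archives here are already compressed (the 500 GB VM above produced a 177 GB archive), so compressing them again will likely gain little. The Python sketch below uses `zlib` as a stand-in for any wire compression and random bytes as a stand-in for already-compressed archives; the sample data is synthetic, so the printed ratios only show the contrast, not what a real WAN optimizer would achieve.

```python
import os
import zlib


def compression_ratio(sample: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size; about 1.0 means no gain."""
    return len(sample) / len(zlib.compress(sample, level))


def transfer_hours(total_gb: float, rate_mb_s: float, ratio: float) -> float:
    """Hours to ship total_gb if only 1/ratio of the bytes cross the link."""
    return total_gb * 1e3 / ratio / rate_mb_s / 3600


# Synthetic samples: repetitive text compresses well, random bytes stand in
# for archives that vzdump has already compressed.
text_like = b"INSERT INTO log VALUES ('2016-07-12', 'ok');\n" * 20000
already_compressed = os.urandom(len(text_like))

for name, sample in [("text-like", text_like), ("already compressed", already_compressed)]:
    r = compression_ratio(sample)
    print(f"{name}: ratio {r:.2f}, 500 GB at 10 MB/s -> {transfer_hours(500, 10, r):.1f} h")
```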
 
What performance do you need, and how much space?

We did DR tests. They work, but you need some time to get everything up and running: say 1-2 hours, depending on your infrastructure, or maybe 30 minutes if you script it. The cheap FAS models are the 25xx series. They are OK but limited in connectivity: you can't do both 10G and FC, so you need to choose which one you want. Also, for DR you need to think not only about storage but also about hypervisors, DC switches, and so on.
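Scripting the DR steps is mostly about running them in a fixed order, stopping at the first failure, and timing each step so the runbook can be tuned. A hypothetical Python skeleton of that idea follows; the step names loosely follow the items mentioned here (storage, switches, hypervisors), and each action is a placeholder returning `True`, to be replaced by real commands.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_runbook(steps: List[Step]) -> bool:
    """Run DR steps in order, print per-step timing, stop at the first failure."""
    for name, action in steps:
        start = time.monotonic()
        ok = action()
        status = "OK  " if ok else "FAIL"
        print(f"{status} {name} ({time.monotonic() - start:.1f} s)")
        if not ok:
            return False
    return True


# Hypothetical steps; each lambda is a placeholder for real commands.
steps: List[Step] = [
    ("promote replicated storage at the DR site", lambda: True),
    ("bring up DR switches and VLANs", lambda: True),
    ("start hypervisors and register VMs", lambda: True),
    ("start VMs in dependency order", lambda: True),
]
print("DR complete" if run_runbook(steps) else "DR aborted")
```

Stopping at the first failure is deliberate: starting VMs on storage that was never promoted is how a DR test turns into a real incident.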
 
If backup speed is a factor, you could connect your off-site backup through InfiniBand with ConnectX-4 (100 Gbit/s), which is fully supported in Linux (http://www.mellanox.com/page/products_dyn?product_family=204&mtag=connectx_4_en_card)
 
