High server load during backup creation

Discussion in 'Proxmox VE: Installation and configuration' started by Nico Haase, Nov 4, 2013.

  1. escoreal

    escoreal Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2010
    Messages:
    78
    Likes Received:
    0
    Not with every setup ;)
    If one VM uses more space than there is free space available, it can get a little tricky. Especially with expensive/fast storage, you may not oversize that much.

    esco
     
  2. Nico Haase

    Nico Haase Member

    Joined:
    Feb 27, 2013
    Messages:
    31
    Likes Received:
    1
    Sorry if this sounds silly, but are you really claiming that everything is alright and there can be no problem, even when we clearly have one? Of course, not all of us are paying subscribers, but how can you be so unhelpful with a problem like this, which seems to affect some users seriously?
     
  3. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    But LVM has exactly the same problems!
     
  4. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    Please re-read my post carefully. I told you that the issue is 'fixable', and also described a way to solve it. So feel free to send a patch implementing that functionality.
     
  5. escoreal

    escoreal Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2010
    Messages:
    78
    Likes Received:
    0
    With LVM you can create a smaller snapshot. If you have to store the whole backup locally first, you would need more space.

    Example:

    600 GB SSD
    1 VM with 500 GB
    Snapshot could be 100 GB

    Complete local backup would be tricky?
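    For illustration, a rough sketch of what I mean (the VG/LV names are made up, not from a real setup):

        # 500 GB VM disk on a 600 GB SSD: leave the remaining ~100 GB
        # in the volume group as copy-on-write space for the snapshot
        lvcreate --snapshot --size 100G --name vm-100-snap /dev/pve/vm-100-disk-1

        # back up from the frozen snapshot, then drop it
        dd if=/dev/pve/vm-100-snap bs=1M | gzip > /backup/vm-100-disk-1.raw.gz
        lvremove -f /dev/pve/vm-100-snap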
     
  6. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    No, that does not work. Your LVM snapshot simply runs full, and all IO is halted (many, many users ran into that issue with the previous LVM backup).
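    If you want to watch it happen, a quick sketch (assuming the default 'pve' volume group; the column is called data_percent on newer LVM versions):

        # watch the copy-on-write usage of the snapshot grow during the backup;
        # once it reaches 100% the snapshot is invalidated
        watch -n 10 'lvs -o lv_name,origin,snap_percent pve'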
     
  7. Nico Haase

    Nico Haase Member

    Joined:
    Feb 27, 2013
    Messages:
    31
    Likes Received:
    1
    Sorry, even though I am a computer science student, I have no time to dig into your code. Telling people who have a problem with your software that they can fix it themselves if they want to see it fixed sounds weird. If I wanted to do it myself, there would be no need to use your software...
     
  8. escoreal

    escoreal Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2010
    Messages:
    78
    Likes Received:
    0
    What? I thought the snapshot becomes invalid if it runs out of space, while the original volume stays accessible?

    But even with high write load, the snapshot can be smaller than the complete virtual disk during the backup. Only if you had to rewrite the whole disk during the backup (I think that is a rare case ;) ) would you need a snapshot the same size as the disk.

    esco
     
  9. e100

    e100 Active Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,235
    Likes Received:
    24
    My fast storage is on LVM, so you are proposing that, to keep the speed up, I create an LVM volume as temporary storage so that write speed is not limited by the backup device.
    Sounds like you want to re-invent what LVM snapshots already do.

    As I have said before, I like the concept behind KVM Live Backup, but it is (currently) very flawed.
    Having VMs hang because a backup device failed is unacceptable.
    Limiting write speed is unacceptable.

    Do you plan to fix these problems?
    If so, when can we expect a fix?
     
  10. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    Yes. The snapshot becomes unusable.

    You can implement exactly the same behaviour with the new backup method. I think a simple (mmapped) ring buffer can do it - I will try that when I next touch the backup code.
     
  11. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    You seem to be totally unaware of all the problems with LVM (search the forum).

    I think you should use a fast local disk for such backups, and use a hook script to transfer the result to the slow storage.
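    Untested sketch of such a hook script (the target path is made up; see /usr/share/doc/pve-manager/examples/vzdump-hook-script.pl for the full set of phases):

        #!/bin/bash
        # pass to vzdump with: vzdump <vmid> --script /usr/local/bin/move-backup.sh
        # $1 is the phase; on 'backup-end' vzdump exports the archive path in $TARFILE
        if [ "$1" = "backup-end" ]; then
            # move the finished archive from the fast local disk to the slow storage
            mv "$TARFILE" /mnt/slow-backup/dump/
        fi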
     
  12. e100

    e100 Active Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,235
    Likes Received:
    24
    Dietmar,

    Many of us here are not great developers like you and could not even attempt to fix these issues in KVM Live Backup.
    That is what upsets us, you took a perfectly good working feature away and replaced it with a flawed feature.

    What is wrong with allowing LVM or KVM Live Backup?

    I also have a hypothesis as to why KVM Live Backup increases load more than LVM Snapshots did.
    KVM Live Backup needs to move data around inside the KVM process itself.
    All of this data movement increases cache misses, especially for the virtual server, or maybe the additional memory copies in the KVM process are to blame.
    I have no idea how to prove or disprove this, but it is what I suspect is causing the increased load numerous people are complaining about.
    What is your opinion on this hypothesis?
     
  13. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    The new backup avoids additional read/write cycles. So you need to provide a test case that demonstrates your claims. Maybe it is just a small bug somewhere, and we can fix it fast if we have a test case.

    Note: We can only fix things if we have a test case.
     
  14. e100

    e100 Active Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,235
    Likes Received:
    24
    Maybe I did not quite explain my hypothesis.
    I agree that KVM Live Backup avoids additional IO-related read/write cycles; that is the advantage it has.
    But since it packs all of the data movement into the KVM process itself, it has a negative impact on the CPU performance of that process.

    With LVM, the kernel does the extra IO for the COW to make the snapshot work.
    Some other process reads from the snapshot and writes out the backup file.
    All of this likely happens on different CPUs/cores than the one where the KVM process is running, too.


    With KVM Live Backup the COW happens in the KVM process itself.
    The KVM process also does the reading from the disk, not some other process.
    The KVM process then sends the read data to the actual backup process.
    Putting all of those things into the process running the VM itself has to have a negative impact on the operation of the VM; there is no free lunch here.
    KVM is suddenly moving around massive amounts of data that it normally does not touch, so there must be a negative impact on the CPU cache where the KVM process is running.
    Cache misses increase in the KVM process and the load rises.

    Essentially, KVM Live Backup reduces IO at the cost of CPU efficiency, and that is what is causing the increased load people are seeing/complaining about.
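    One way to test this (just a sketch, I have not tried it): compare the KVM process's cache-miss counters while idle and while a backup runs, e.g. with perf (the pgrep pattern assumes the usual PVE command line for VM 100):

        # count cache misses of the VM's kvm process for 60 seconds,
        # once without and once during a running backup, then compare
        perf stat -e cache-references,cache-misses -p $(pgrep -f 'kvm -id 100') sleep 60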
     
  15. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    So why is it fast, then, when local storage is used as the target?
     
  16. e100

    e100 Active Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,235
    Likes Received:
    24
    Less latency is a good explanation.

    No one is complaining about the speed of the backup. I am pointing out how the new backup process has a negative impact on the performance of the guest VM.

    I have not noticed the KVM live backup being much, if any, faster than the LVM snapshot method. I have noticed KVM live backup having an extremely negative impact on performance.
     
  17. e100

    e100 Active Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,235
    Likes Received:
    24
    You seem to be unwilling to admit that your invention is not perfect. Your cure for the LVM problem creates a new set of problems.
    One user reported that the new method corrupted his guest filesystem; the LVM snapshot method never did this as far as I am aware.
    I never had the LVM snapshot method cause the guest to stall when backups had issues; the new method does this.
    Numerous people have observed increased load with the new method; this, I predict, will not be resolved by patches to KVM Live Backup.

    Workarounds like hook scripts were not necessary with LVM snapshot backup. Yet somehow you expect me to believe that this new method is vastly superior.

    Both methods are flawed and have their issues; neither is a perfect solution in all situations. One can stall the VM when things go wrong where the other does not, one uses more disk IO than the other, one requires writing external scripts to avoid decreased IO performance while the other does not.

    This is why I advocate allowing users to pick the method that fits their needs the best.

    Maybe at some point in the future the new method will be much better and LVM snapshot will not be needed. In the meantime people need to be able to reliably make backups without risking a stalled VM or causing the VM to run slow.
     
  18. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,464
    Likes Received:
    311
    Sorry, I don't remember claiming that the new backup is perfect.

    If somebody finds a bug, they should try to provide a test case to reproduce it. We can then try to fix it. Maintaining old code forever is not an option.
     
  19. escoreal

    escoreal Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2010
    Messages:
    78
    Likes Received:
    0
    But this isn't a bug? It is by design?

    A test would be making a backup to slow storage while the VM writes to its virtual disk ;)

    For example, use local storage and throttle its IO to 1 MB/s with a cgroup.
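    Rough sketch with the blkio controller (assuming it is mounted at /sys/fs/cgroup/blkio and the backup target is /dev/sdb, i.e. device 8:16; $VZDUMP_PID stands for the backup process id):

        # throttle writes to the backup disk to 1 MB/s for the vzdump process
        mkdir /sys/fs/cgroup/blkio/slowbackup
        echo "8:16 1048576" > /sys/fs/cgroup/blkio/slowbackup/blkio.throttle.write_bps_device
        echo $VZDUMP_PID > /sys/fs/cgroup/blkio/slowbackup/tasks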

    esco
     
  20. felipe

    felipe Member

    Joined:
    Oct 28, 2013
    Messages:
    152
    Likes Received:
    1
    Hi,

    Did you install the system with the bare-metal ISO or on top of Debian Wheezy?
    I ran into this problem after installing on Debian Wheezy..
    Maybe, for some other reason, you don't have the right scheduler set...

    Check cat /sys/block/YOURDISK/queue/scheduler, where YOURDISK = sda etc...
    It should say cfq; any other scheduler would cause the problems you have....

    To the sysadmins here: can I change the wiki?
    The description of how to install Proxmox via Debian Wheezy is perfect; I just miss this VERY IMPORTANT step at the end (a sketch of the commands follows the list):

    1. echo cfq > /sys/block/YOURDISK/queue/scheduler for each of your disks
    2. find GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
    3. add "elevator=cfq" to that line
    4. run update-grub
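    A minimal sketch of those steps (adjust the disk pattern to your system):

        # set cfq right now for all sd* disks
        for d in /sys/block/sd*/queue/scheduler; do echo cfq > "$d"; done

        # make it permanent: in /etc/default/grub set e.g.
        #   GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=cfq"
        # then regenerate the grub config
        update-grub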

    Otherwise you will run into the same problem as above for sure...
    This would actually also happen with LVM snapshots...

    regards
    philipp



     