High server load during backup creation

Here's the output of pveperf:
Code:
root@proxmox:~# pveperf
CPU BOGOMIPS:      31997.12
REGEX/SECOND:      817650
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    112.32 MB/sec
AVERAGE SEEK TIME: 9.16 ms
FSYNCS/SECOND:     2411.21
DNS EXT:           33.75 ms
DNS INT:           1.01 ms (mydomain)
Our server is an HP ProLiant DL180 G6 with three drives for the Proxmox installation and nine drives for the VMs, each 500 GB, both sets in RAID 5. CPU: Intel Xeon E5504, quad-core at 2 GHz. 48 GB RAM. It is connected via Gigabit Ethernet to our backup server, which is mounted with these options:
Code:
10.162.32.7:/backup/hd2/proxmox on /mnt/pve/ninja type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.162.32.7,mountvers=3,mountport=39276,mountproto=udp,local_lock=none,addr=10.162.32.7)

Does it help if you decrease chunk size (rsize and wsize)?
http://nfs.sourceforge.net/nfs-howto/ar01s05.html
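If you want to test smaller values, something along these lines should work; this is only a sketch, the server address and export path are taken from your mount output above, and a permanent change would normally be made in the storage configuration rather than with a manual mount:
Code:
# remount the backup storage with smaller NFS chunk sizes (test only)
umount /mnt/pve/ninja
mount -t nfs -o rw,hard,proto=tcp,vers=3,rsize=8192,wsize=8192 10.162.32.7:/backup/hd2/proxmox /mnt/pve/ninja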
 
Sorry for the delay, I had other problems to work on. I wanted to watch the problem as it happened, so I started a new backup without any further modifications: the load within the VM rose far beyond 10, the VM's internal disk was remounted read-only, and applications crashed. On reboot, fsck ran and found problems...

So, some reboots later, I set both rsize and wsize down to 8192, which did not resolve the problems inside the VM. The VM still gets locked, with the error message: "kernel: journal commit I/O error". Even if someone reads a hard drive problem into this error, I don't see any problems on the Proxmox host, only in the VM...
 
<rant>
Yeah, and if your backup destination fails for some reason, IO gets stalled in the guest. Example: you are backing up to an NFS server and the NFS server dies during the backup.
The backup never stops, never times out, and never realizes "Hey, the backup destination is not accepting my data anymore, time to give up on this backup!"

The result is the same as what Nico reports: IO stalls in the VM and the VM's load skyrockets.

A backup system that crashes the systems it is backing up is a useless backup system.

The write IO in the VM is limited to the speed of the backup device while a backup is in progress.
I am sure that alone will cause someone a problem; a high-IOPS system might not be able to use the Proxmox backup method at all.

Until this new method is properly vetted and all the kinks are worked out, I think it would be best to allow users to optionally use the LVM backup method.
Clearly it is causing problems for people and needs to be fixed; in the meantime we all need to be able to make backups, and that is impossible without risking downtime.

Let me explain it like this:
We should not have to choose between:
A) Back up our virtual servers and risk unexpected downtime
or
B) Make no backups

For some people and some configurations, those are the only two options the new backup method gives us.

This problem is making Proxmox look bad.
Give us back LVM snapshot backups until you have managed to resolve every issue with KVM Live Backup, including the fact that write IO in the VM is limited to the backup disk's IO speed.


If anyone thinks this is an isolated issue, think again:
http://forum.proxmox.com/threads/14173-Proxmox-3-0-backup-issue
http://forum.proxmox.com/threads/9922-NFS-Backup-Failed-Cluster-unusable
http://forum.proxmox.com/threads/14075-Backup-very-slow-on-Proxmox-3
http://forum.proxmox.com/threads/15930-NFS-backups-lock-up-VMs
http://forum.proxmox.com/threads/16437-Proxmox-hypervisor-crash-during-backup
http://forum.proxmox.com/threads/15156-Host-crash-during-backups-to-NFS
http://forum.proxmox.com/threads/15438-URGENT-High-Disk-Write-Spike-Causes-whole-system-to-crash

</rant>

<fact>
There is a problem, and it can easily be solved by giving us back the LVM snapshot backup option.
</fact>
 
Just to be clear:

write to backup target: 30 MB/s

Does this mean write performance in the VM being backed up decreases to 30 MB/s, or does it apply to all VMs on the same host?
 
Just curious: are all the problems related to backing up to an NFS share?
Has anybody with problems tried to back up to an iSCSI LUN?

You cannot back up to an iSCSI LUN; the backup target must be a filesystem.
I have personally seen this issue with NFS and Samba, and it created some pissed-off customers too.
Try explaining how a BACKUP system caused their VM to go down; only then will you understand how much this Live KVM Backup feature sucks.

Most of my systems use a locally attached SATA II disk that is encrypted using LUKS.
I have noticed that load averages during backups have been higher since around June, but I have been unable to identify specifically what caused it. Maybe the NSA has tapped my SATA II link :p


Give us back an option to use LVM snapshots for backup!
 
Just brainstorming.

I am using an AMD CPU and AMD chipset and do not see this problem.

CPU: Phenom II X6
Chipset: AMD A55
RAM: 16 GB
NICs: 2 x Intel 82574L
Backup storage: NFS on QNAP RAID 1
NIC: 1 x Intel 82574L

Code:
cat /etc/vzdump.conf
bwlimit: 100000

Backups take place on Saturdays at 5 am.

5 VMs totaling approximately 20 GB

Attached graphs: network load (network.png), memory usage (memory.png), server load (server.png), CPU usage (cpu.png)
 
The VM being backed up would be limited to 30 MB/s write speed.
So the write speed of the VM being backed up is limited to the write speed of the backup storage?

When I back up to my rather slow backup storage (max 60-70 MB/s) I have no issues, since 60-70 MB/s write speed is acceptable and, since the backup only takes roughly 10 minutes, people hardly notice. Perhaps this would become a problem with much larger backups.
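If you want to check what a backup target can actually sustain, a quick sequential write test against the NFS mount gives a rough number; the mount point below is the one from earlier in the thread and the file name is just an example:
Code:
# rough sequential write test; conv=fdatasync flushes the data before dd reports the speed
dd if=/dev/zero of=/mnt/pve/ninja/ddtest.img bs=1M count=1024 conv=fdatasync
rm /mnt/pve/ninja/ddtest.img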

BTW, have you tried different settings for bwlimit in /etc/vzdump.conf?
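For reference, a minimal sketch of what that would look like; the 10000 value (roughly 10 MB/s) is only an example, not a recommendation:
Code:
# /etc/vzdump.conf -- throttle backup bandwidth, value is in KB/s
bwlimit: 10000
The same limit can also be passed per job with vzdump's --bwlimit option.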
 
Once more: actions on the VM host should not affect anything inside a VM. A backup happening on the host should be invisible to the virtual machine. Otherwise, something is very wrong...

I elect your post for quote of the day.

This highlights exactly why KVM Live Backup is flawed. Even if it worked perfectly it would still be flawed: it limits writes in the VM to the speed of the backup media.

KVM Live Backup should be an option, not a mandate.
 
Maybe you are onto something.
The two systems that had major issues with Samba and NFS both had Intel Xeon CPUs.
That might help explain why there are issues with NFS/Samba, but it does not fix KVM Live Backup.

There are still two fundamental flaws in KVM Live Backup:
1. If the backup process IO stalls, IO in the VM stalls.
2. The write IO of a VM is limited to the speed of the backup media when writing to any un-archived block.

I am sure the developers could fix issue #1, but there is nothing they can do to fix issue #2.
I have some VMs that need extremely high IOPS; I should be allowed to choose between LVM and KVM Live Backup based on my needs.
But I can't: the developers have mandated an unvetted technology upon us, and it is causing issues for numerous people.
 
There are still two fundamental flaws in KVM Live Backup:
1. If the backup process IO stalls, IO in the VM stalls.
2. The write IO of a VM is limited to the speed of the backup media when writing to any un-archived block.

I am sure the developers could fix issue #1, but there is nothing they can do to fix issue #2.

I really wonder why somebody claims such nonsense. Both issues are fixable by using temporary storage on the local hard disk (as the LVM approach does).
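For comparison, this is roughly what the old LVM snapshot approach boils down to when done by hand; the volume group, LV and target file names below are examples, not taken from this thread:
Code:
# snapshot the VM's disk, archive the frozen snapshot, then drop it
lvcreate -s -n vm-101-snap -L 4G /dev/pve/vm-101-disk-1
dd if=/dev/pve/vm-101-snap bs=1M | gzip > /mnt/pve/ninja/vm-101-disk-1.raw.gz
lvremove -f /dev/pve/vm-101-snap
Writes inside the VM go to the origin volume at local disk speed while the backup reads from the snapshot, which is why the write speed of the backup target does not throttle the guest in this model.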
 
