Backing up my KVMs makes the system very slow

poka

Active Member
Jul 12, 2017
When I run snapshot-mode backups with fast LZO compression of my KVM guests via Proxmox, I see very significant slowdowns, not only of the guest being backed up but also of the other KVM guests on the same server.
These systems become unusable for the duration of the backup.
This is the case for all KVM guests with disks larger than 60 GB (some are 200 GB).
The problem does not seem to occur on guests with a 30 GB disk.

I'm backing up to a local disk.
My KVM guests use the VirtIO SCSI driver.
Server load rises to 8 during backups.

I'm using Proxmox VE 5.1-41.
My server specs:
- 2 x 500 GB SSD SAMSUNG MZ7LN512HCHP
- 12-core CPU, 3.5 GHz
- 100 GB RAM

The /boot partition is on RAID 1 and the / partition is on RAID 5.
The host uses ext4, as do the KVM guests.

This is the result of a few commands:
Code:
cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

Code:
pveperf
CPU BOGOMIPS: 83900.76
REGEX/SECOND: 3170830
HD SIZE: 936.10 GB (/dev/md1)
BUFFERED READS: 1034.51 MB/sec
AVERAGE SEEK TIME: 0.14 ms
FSYNCS/SECOND: 37.45
DNS EXT: 20.46 ms

Same command during a backup:
Code:
pveperf
CPU BOGOMIPS: 83900.76
REGEX/SECOND: 3087087
HD SIZE: 936.10 GB (/dev/md1)
BUFFERED READS: 53.97 MB/sec
AVERAGE SEEK TIME: 11.27 ms
FSYNCS/SECOND: 0.05
DNS EXT: 17.49 ms
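
To put the two pveperf runs side by side, here is a minimal Python sketch that parses both outputs and computes how much the throughput figures drop during the backup. The helper names (`parse_pveperf`, `slowdown`) are made up for this illustration and are not part of any Proxmox tool; the numbers are copied from the outputs above.

```python
import re

IDLE = """\
CPU BOGOMIPS: 83900.76
REGEX/SECOND: 3170830
HD SIZE: 936.10 GB (/dev/md1)
BUFFERED READS: 1034.51 MB/sec
AVERAGE SEEK TIME: 0.14 ms
FSYNCS/SECOND: 37.45
DNS EXT: 20.46 ms
"""

BACKUP = """\
CPU BOGOMIPS: 83900.76
REGEX/SECOND: 3087087
HD SIZE: 936.10 GB (/dev/md1)
BUFFERED READS: 53.97 MB/sec
AVERAGE SEEK TIME: 11.27 ms
FSYNCS/SECOND: 0.05
DNS EXT: 17.49 ms
"""

# Metrics where a higher value is better (hypothetical selection for this sketch).
THROUGHPUT_METRICS = ("BUFFERED READS", "FSYNCS/SECOND")


def parse_pveperf(text):
    """Map each 'NAME: value ...' line to the first number after the colon."""
    metrics = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        match = re.search(r"\d+(?:\.\d+)?", value)
        if sep and match:
            metrics[name.strip()] = float(match.group())
    return metrics


def slowdown(idle, busy):
    """Return idle/busy for each throughput metric (2.0 means half as fast)."""
    return {name: idle[name] / busy[name] for name in THROUGHPUT_METRICS}


if __name__ == "__main__":
    factors = slowdown(parse_pveperf(IDLE), parse_pveperf(BACKUP))
    for name, factor in factors.items():
        print(f"{name}: {factor:.0f}x slower during backup")
```

With the figures posted above, buffered reads drop by a factor of about 19 and fsyncs/second by a factor of about 750 while the backup runs.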

I don't know what other information I could provide.
Is this behavior normal?

----------------

Besides that, I wonder whether this could be due to the type of controller I use for my KVM guests.

What is the difference between VirtIO SCSI and LSI 53C895A?
I also see that the cache is disabled on my KVM disks, although it can be set to Direct Sync, Write Through, etc.
Would that change anything?

I just followed the advice on your wiki to configure these things, but in practice the results look different.

I also read this topic but couldn't find a solution. It's a pretty old topic, and the configurations described there are quite different from mine.
 
Hi,

As your pveperf report shows, your disk is too slow for your workload.
With only 37 fsyncs/second on a normally running system, you are at the bare minimum for operation.
If you run a backup on a system that is already at that minimum, it will of course become unstable.

For a properly working system with backups, you should have more than 100 fsyncs/second.
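
To get a rough feel for what an fsyncs/second figure measures, here is a minimal Python sketch that repeatedly writes one small block and forces it to disk. This is only an approximation under my own assumptions (4 KiB block, single file, fixed duration); it is not how pveperf itself is implemented, and its numbers are not directly comparable with pveperf's.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, duration=1.0, block_size=4096):
    """Count write+fsync cycles completed on a temp file inside `directory`."""
    payload = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    start = time.monotonic()
    try:
        while time.monotonic() - start < duration:
            os.lseek(fd, 0, os.SEEK_SET)   # rewrite the same block each time
            os.write(fd, payload)
            os.fsync(fd)                   # block until the data reaches stable storage
            count += 1
    finally:
        os.close(fd)
        os.unlink(path)
    elapsed = time.monotonic() - start
    return count / elapsed
```

Call it with a directory that lives on the storage you want to test, once on an idle system and once during a backup, and compare the two results.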
 
I asked my hosting provider to test the disks. They found no speed problems.

Strangely enough, after restarting the host, the fsyncs/second went up to just over 100. But that doesn't seem to solve the problem: the systems are still unstable during backups.

Code:
pveperf
CPU BOGOMIPS: 83898.72
REGEX/SECOND: 2116843
HD SIZE: 936.10 GB (/dev/md1)
BUFFERED READS: 1120.59 MB/sec
AVERAGE SEEK TIME: 0.12 ms
FSYNCS/SECOND: 108.89
DNS EXT: 23.99 ms
 
As the Proxmox staff told you, your storage is simply too slow.

If you want to increase the performance of your system, switch to RAID 10.
A hardware RAID controller with cache could also help a lot. On my systems, hardware RAID with cache gives 3000+ fsyncs/second; without it, fewer than 200.

Even better could be ZFS with small SSDs for SLOG and L2ARC,
especially if you need fast backups and replication.
Here I have two consumer-grade 2.5" disks, fronted by small Intel SSDs:

Code:
root@p24:~# pveperf
CPU BOGOMIPS:      79995.00
REGEX/SECOND:      1858335
HD SIZE:           890.46 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     2375.07

root@p24:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0h3m with 0 errors on Sun May 13 00:27:19 2018
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0
        logs
          sdc1      ONLINE       0     0     0
          sdd1      ONLINE       0     0     0
        cache
          sdc2      ONLINE       0     0     0
          sdd2      ONLINE       0     0     0

errors: No known data errors
root@p24:~# dmesg | grep D2010
[    1.851979] ata3.00: ATA-9: INTEL SSDSC1NB080G4, D2010370, max UDMA/133
[    2.167570] ata4.00: ATA-9: INTEL SSDSC1NB080G4, D2010370, max UDMA/133
 
While a backup is running, if a new write arrives for a block that has not been backed up yet, the old block is first copied to the backup storage before it is replaced by the new data. (This happens only the first time each block is overwritten.)

Now, I don't know how the pveperf fsync benchmark is done, but if it always writes to a different block, the result is close to the performance of the backup storage.

So depending on your real workload, if you have many writes to many different blocks, you'll need fast backup storage or you'll see slowdowns.

(BTW, what is your backup storage? 0.05 fsyncs/second seems really, really slow...)
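
The copy-before-write behaviour described above can be sketched with a toy model in Python. This is only an illustration of the idea, not the actual QEMU/Proxmox implementation; the class and method names are invented for the example.

```python
class LiveBackup:
    """Toy model: old blocks are saved before a guest is allowed to overwrite them."""

    def __init__(self, disk):
        self.disk = list(disk)     # live guest disk, one entry per block
        self.backup = {}           # block index -> contents as of backup start
        self.forced_copies = 0     # copies triggered by guest writes

    def _save(self, index):
        """Copy a block to the backup if not saved yet; report whether a copy happened."""
        if index in self.backup:
            return False
        self.backup[index] = self.disk[index]
        return True

    def guest_write(self, index, data):
        # First overwrite of a not-yet-saved block: the old contents must reach
        # the backup storage first, so the guest waits on the backup storage.
        if self._save(index):
            self.forced_copies += 1
        self.disk[index] = data

    def backup_step(self):
        """Background job: save the next unsaved block. Returns False when done."""
        for index in range(len(self.disk)):
            if self._save(index):
                return True
        return False


if __name__ == "__main__":
    job = LiveBackup(["a", "b", "c", "d"])
    job.backup_step()            # background job saves block 0
    job.guest_write(2, "C")      # block 2 not saved yet: forced copy, guest waits
    job.guest_write(2, "CC")     # second overwrite of block 2: no extra copy
    job.guest_write(0, "A")      # block 0 already saved: no extra copy
    while job.backup_step():
        pass
    print([job.backup[i] for i in range(4)], job.forced_copies)
```

In this model every guest write to a block that has not been saved yet costs one extra write to the backup storage, which is why a workload that touches many different blocks ends up running at roughly the speed of the backup target.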
 
