KVM machines very slow / unreachable during vmtar backup

Come to think of it, we never had this problem w/ 2.6.35 either. Someone w/ 2.6.35: Can you post the output of lvs --version? For 2.6.32-6-53, it's:
LVM version: 2.02.39 (2008-06-27)
Library version: 1.02.27 (2008-06-25)
Driver version: 4.20.6

Apparently, this is a pretty new driver (for instance, Ubuntu 11.04 came with 4.19.1), and I don't think the LVM and library versions would differ between 2.6.32 and .35...
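
For anyone comparing kernels, a minimal sketch of the checks (as far as I can tell, dmsetup version reports the same library and driver versions that lvs --version shows, so either works):

Code:
uname -r          # running kernel, e.g. 2.6.32-6-pve
lvs --version     # LVM, library and device-mapper driver versions
dmsetup version   # library and driver versions straight from device-mapper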
 
2.6.35:
LVM version: 2.02.39 (2008-06-27)
Library version: 1.02.27 (2008-06-25)
Driver version: 4.17.0
 
If the problem is not going to be fixed soon, I'll upgrade to 2.6.35 to see if that helps.
 
2.6.35 is not really an upgrade (the 2.6.32 Proxmox kernel has lots of improvements/fixes over the 2.6.35 Proxmox kernel, which has not been maintained for some time), but as a test it would be worthwhile to see if that helps.
 
Just tested on a few pve 2.0 beta boxes, completely stock (no vzdump tweaks), on the same fast hardware, running a backup w/ no gzip:

pve-manager: 2.0-14 (pve-manager/2.0/6a150142)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 2.0-54
pve-kernel-2.6.32-6-pve: 2.6.32-54
lvm2: 2.02.86-1pve2
clvm: 2.02.86-1pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-1
libqb: 0.6.0-1
redhat-cluster-pve: 3.1.7-1
pve-cluster: 1.0-12
qemu-server: 2.0-11
pve-firmware: 1.0-13
libpve-common-perl: 1.0-10
libpve-access-control: 1.0-3
libpve-storage-perl: 2.0-9
vncterm: 1.0-2
vzctl: 3.0.29-3pve7
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-1
ksm-control-daemon: 1.1-1



And had the exact same issue.
 
Are there any stale snapshots around?

# dmsetup table

That's really interesting - I have no idea what this means, but a few of my disks have multiple entries:
Code:
array-vm--107--disk--1: 0 419430400 linear 8:0 5460984320
array-vm--106--disk--1: 0 209715200 linear 8:0 4538237440
array-vm--105--disk--1: 0 209715200 linear 8:0 5251269120
array-vm--104--disk--2: 0 1258291200 linear 8:0 1367343616
array-vm--104--disk--2: 1258291200 452984832 linear 8:0 209715712
array-vm--104--disk--2: 1711276032 385875968 linear 8:0 4093641216
array-vm--104--disk--1: 0 209715200 linear 8:0 4747952640
array-vm--103--disk--1: 0 209715200 linear 8:0 3883926016
pve-swap: 0 9699328 linear 8:18 2048
pve-root: 0 19398656 linear 8:18 9701376
array-vm--109--disk--4: 0 1094778880 linear 8:0 6719275520
array-vm--109--disk--4: 1094778880 163512320 linear 8:0 662700544
pve-data: 0 39632896 linear 8:18 29100032
array-vm--109--disk--3: 0 1258291200 linear 8:0 2625634816
array-vm--100--disk--1: 0 41943040 linear 8:0 4479517184
array-vm--109--disk--1: 0 25165824 linear 8:0 1342177792
array-vm--108--disk--1: 0 419430400 linear 8:0 6299845120
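
On the stale-snapshot question above, a hedged sketch of how one could check (vzdump names its snapshot volumes vzsnap-<hostname>-<n>; adjust the VG name to your setup before removing anything):

Code:
# snapshot LVs show an origin and an 's' in the attribute column
lvs -o vg_name,lv_name,lv_attr,origin

# leftover vzdump snapshots, if any, match the vzsnap naming pattern
lvs | grep vzsnap

# only if no backup is currently running, a stale one could be dropped, e.g.:
# lvremove /dev/<vgname>/vzsnap-<hostname>-0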
 
I have a similar issue.

I am running on an SSD.

I started vzdump about 10 minutes ago, and vmtar is still pulling 21,000 IOPS / 100 MB a second from the SSD device.

If this was a hard drive.... my system would grind to a halt.

The thing is this: the data is going nowhere. There is no I/O on my backup drive (which is local).

vmtar command line:
/usr/lib/qemu-server/vmtar /mnt/local_backups/vzdump-qemu-101-2012_02_07-02_08_44.tmp/qemu-server.conf qemu-server.conf /dev/SSD2LVM/vzsnap-virthost-0 vm-disk-ide0.raw

What this means for me is that snapshot backups don't actually work in any useful fashion.

Edit: Actually, the backups did make it to the local backup store. Not sure why nothing was showing in iostat.

BUT, can Proxmox please take note: 21,000 IOPS!! Obviously vmtar is using too small a block size?
No wonder SATA drives or network drives are running slow.

Code:
sdb           25618.67       176.17         0.04        528          0
sdb1          25618.67       176.17         0.04        528          0
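
For what it's worth, a rough sketch of how one could demonstrate the effect of request size against the snapshot device from the vmtar command above (the snapshot only exists while a backup is running, so substitute any LV; both runs read roughly the same 400 MB, just in different request sizes, and direct I/O keeps the page cache and readahead out of the picture):

Code:
# 4 KB requests: ~100,000 small reads for ~400 MB of data
dd if=/dev/SSD2LVM/vzsnap-virthost-0 of=/dev/null bs=4k count=100000 iflag=direct

# 1 MB requests: the same ~400 MB in only 400 large reads
dd if=/dev/SSD2LVM/vzsnap-virthost-0 of=/dev/null bs=1M count=400 iflag=direct

The gap between the two runs gives an idea of how much per-request overhead costs, which is the point about block size made above.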

Cheers
 
I've found that there must be a memory leak in Proxmox. If you simply shut down your VMs and restart the host server, everything comes up really fast, like the day it was installed. Not kidding.
 
OK, so after 10 minutes of the SSD being thrashed with the data going nowhere, we finally start seeing data getting written to the local backup drive:

Code:
sdb           24468.00       147.75         0.00        443          0
sdb1          24468.00       147.75         0.00        443          0
sdc             179.00         0.00        88.87          0        266
sdc1            179.00         0.00        88.87          0        266
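
For anyone wanting to watch the same thing on their own box, output like the above can be produced with something along these lines (the -m flag, interval and -p option are assumptions on my part; here sdb is the snapshot source and sdc the local backup target):

Code:
# per-device and per-partition stats for the snapshot source (sdb) and
# the backup target (sdc), throughput in MB/s, refreshed every 3 seconds
iostat -d -m -p sdb,sdc 3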

So for 10 minutes the source was thrashed, and now, for another 10 minutes, the source continues to be thrashed, but at least the data is finally getting stored.

Something is clearly wrong here.

Cheers
 
And now vmtar is running at 10 percent CPU and crawling along at 1 MB a second on the backup drive!

Code:
sdc             129.00         0.00         1.14          0          1
sdc1            129.00         0.00         1.14          0          1

If the Proxmox team themselves tested on a few SATA drives, they would find the bottlenecks and inefficiencies.

As long as Proxmox continues to blame user hardware, they are missing out on a good opportunity to tune their tools!

Edit: 130 IOPS on a SATA drive: that poor thing is being thrashed!
130 IOPS, and only 1 MB a second: bad software!
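
As a rough sanity check, taking the iostat snippet above at face value: 1.14 MB/s spread over 129 writes per second works out to roughly 9 KB per request, which is consistent with the small-block-size suspicion.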
 
