KVM machines very slow /unreachable during vmtar backup

e100 · Feb 7, 2012

dietmar said:
My guess is the vzdump simply scans the data for holes (first pass ==> sparse file scan).

It would make more sense for vmtar to only use sparse file handling when needed.
I would prefer to turn sparse handling off entirely because the majority of my backups are LVM volumes.
An option in /etc/vzdump.conf to turn it off would be great.

Benefits of sparse file handling in vmtar:
1. Only backup the data stored, not the empty space you have not used in your VM.
2. Potentially smaller backups, only ends up smaller if there is lots of sparse space.

Cost for using sparse file handling in vmtar:
1. Need to read the data twice once to find the sparse areas and once to backup the data.
2. If you have few sparse areas or none, you wasted lots of time and IO for no benefit.
3. My LVM volumes are not sparse files but vmtar still scans them like they are....

If you are using compression sparse areas of all zero's compress really well.
Why waste the IO to scan for the sparse areas when we could just read them once and compress the sparse areas?

I suspect, for most users, that turning off sparse handling in vmtar would benefit them more than it would hurt them.

Are the files restored as sparse files when using vzrestore/qmrestore?

jollyscots · Feb 7, 2012

Cost for using sparse backup? There should be no cost, it should be a speed benefit!

vmtar should NOT need to read twice, just to support sparse. Sorry but IMO that is a ridiculous assumption!

dietmar, how do you explain 21,000 IOPS? Is it reading 1 byte at a time?

More to the point, please take my findings and test for your self. There is DEFINITELY room for improvement here.

e100 · Feb 7, 2012

sirmikealot said:
I've found that there must be a memory leak in proxmox. If you simply shut down your vm's, and restart the vm server, everything comes up real fast like the day it was installed. Not kidding.

I've never seen any evidence of a memory leak anywhere.
I have however seen Linux move some seldom accessed areas of RAM to swap even when there is free RAM it could use.
Got some VM that is idle all night long? You might find it gets swapped to disk because it is idle, I personally have seen this happen more than once.
If whatever you need to access is in swap the performance WILL suck.

If you do not like the default behaviour:
http://en.wikipedia.org/wiki/Swappiness

If you are wondering why Linux might do this? RAM is an asset to performance, if you have idle data it will be moved to swap so the fast RAM can be used for buffers/cache increasing overall system responsiveness.

I've recovered from this issue without rebooting by turning off swap(after ensuring I had enough RAM free to do so):
swapoff -a

To re-enable swap:
swapon -a

It might take a little time to read the data and get it into RAM, but once it is done you will find the speed comes right back.

e100 · Feb 7, 2012

jollyscots said:
Cost for using sparse backup? There should be no cost, it should be a speed benefit!

Tar is known to be slow when dealing with sparse files.
A quick search on google can take you to dozens of references.
This is not a Proxmox problem, it is a tar problem.
http://www.gnu.org/software/tar/manual/html_node/sparse.html

jollyscots said:
vmtar should NOT need to read twice, just to support sparse. Sorry but IMO that is a ridiculous assumption!

Not an assumption, that IS what the code does.
Not only that, vmtar is based on GNU tar and GNU tar also reads sparse files twice too because that is how the algorithm works.

First read identifies the blank sections.
Second read adds the non-blank blocks to the tar output.

jollyscots said:
dietmar, how do you explain 21,000 IOPS? Is it reading 1 byte at a time?

Looks like it reads 512Bytes at a time.

jollyscots said:
More to the point, please take my findings and test for your self. There is DEFINITELY room for improvement here.

I agree there is room for improvement but the only improvement that can be made is to disable sparse file handling.
For people using gzip/pigz the resulting backup file will be about the same size since sparse areas of zeros will compress well.
For people using sparse files and not using compression their backups will be HUGE(actual allocated size huge) without sparse file handling.

Like I suggested earlier I think the best improvement that could be made is to allow individual users to turn off the sparse file handling.
Some people need it, some people would benefit from turning it off.

dietmar · Feb 7, 2012

jollyscots said:
vmtar should NOT need to read twice, just to support sparse. Sorry but IMO that is a ridiculous assumption!

Unfortunately, there is not a single backup format out there which supports writing spares files with one pass. So we need
to read 2 time in order to generate a correct tar file.

jollyscots said:
dietmar, how do you explain 21,000 IOPS? Is it reading 1 byte at a time?

Please try to debug (strace).

dietmar · Feb 7, 2012

e100 said:
Like I suggested earlier I think the best improvement that could be made is to allow individual users to turn off the sparse file handling.
Some people need it, some people would benefit from turning it off.

Maybe lz4/lzo is fast enough that we can always use compression - that would solve the whole issue.

dietmar · Feb 7, 2012

e100 said:
3. My LVM volumes are not sparse files but vmtar still scans them like they are....

vmtar scan for zeros, so even LVM devices can benefit from that.

dietmar · Feb 7, 2012

e100 said:
Like I suggested earlier I think the best improvement that could be made is to allow individual users to turn off the sparse file handling.

See: https://git.proxmox.com/?p=qemu-server.git;a=summary
What do you think?

e100 · Feb 7, 2012

dietmar said:
See: https://git.proxmox.com/?p=qemu-server.git;a=summary
What do you think?

I like that as a solution but people not using compression would still benefit from the sparse scan.
Do you have a compiled vmtar I can download or do I need to patch and compile myself to test this?

jollyscots · Feb 8, 2012

dietmar said:
Maybe lz4/lzo is fast enough that we can always use compression - that would solve the whole issue.

Well im back, I guess when their is tech issues at hand, I just can't help myself. And that why my job is head Technical Analyst.

Anyway.

I've just done a test run using 7zip, and I set the format to gzip, fastest.

This uses just the one core, here on my 2500 mips laptop, and I get 42 mb/s

In my opinion, there should be two compression options

1) Fast: Potential to actually improve backup performance, because it reduces bandwidth required to the backup device. (Especially true if two backups were running at once, provided multiple cores are available.)
2) Good: For those more concerned about storage space on the backup device.

As for vmtar needing to scan twice: If you get compression going full speed, then you dont need to treat the file as sparse, because compression will simply wrap up all the zeroes.

And again, I really cant see what possibly is the technical need for vmtar to scan twice to handle sparse files. Perhaps this is needed for tape destinations? But I am not at all familar with the linux structure of tools used for vm backup.

e100 · Feb 8, 2012

dietmar said:
See: https://git.proxmox.com/?p=qemu-server.git;a=summary
What do you think?

I admit my C is VERY rusty, but I did try using your new code with no success.

Below I point out what I *think* is causing it not to work for me.

Code:

@@ -525,9 +530,16 @@ main (int argc, char **argv)
     time_t ctime = fs.st_mtime;
 
     struct sp_array *ma = sparray_new();
-    if (!scan_sparse_file (fd, ma)) {
-      fprintf (stderr, "scanning '%s' failed\n", source); 
-      exit (-1);
+    if (sparse) {
+           if (!scan_sparse_file (fd, ma)) {
+                   fprintf (stderr, "scanning '%s' failed\n", source); 
+                   exit (-1);
+           }
+    } else {
[B]+           off_t  file_size = fs.st_size;[/B] <- only works for files, wrong size for block devices
+           sparray_add (ma, 0, file_size);  
+           ma->real_size = file_size;
+           ma->effective_size = file_size;
     }
 
     dump_header (wbuf, archivename, ctime, ma);

Thanks for working towards eliminating this sparse scan in some instances.

e100 · Feb 8, 2012

jollyscots said:
Well im back, I guess when their is tech issues at hand, I just can't help myself. And that why my job is head Technical Analyst.

Anyway.

I've just done a test run using 7zip, and I set the format to gzip, fastest.

This uses just the one core, here on my 2500 mips laptop, and I get 42 mb/s

tar + gzip is a known standard
tar + 7zip is not a standard

I'm not against using some other compression format but using a standard format does have some benefits.
Fast is nice but it is not the only item to take into consideration.

I can take my Proxmox backup file over to any other virtualization platform and use readily available utilities to read the data and restore vms.
Sure it might be difficult and clunky, but the fact I CAN do it is a big benefit.

jollyscots said:
In my opinion, there should be two compression options

1) Fast: Potential to actually improve backup performance, because it reduces bandwidth required to the backup device. (Especially true if two backups were running at once, provided multiple cores are available.)
2) Good: For those more concerned about storage space on the backup device.

If you need speed there are currently other solutions such as pigz.
I've been using pigz for over a year now and my backups complete in a timely manner.
Yes more improvements can be made and the Proxmox team has been looking into other ideas for some time.

jollyscots said:
As for vmtar needing to scan twice: If you get compression going full speed, then you dont need to treat the file as sparse, because compression will simply wrap up all the zeroes.

The purpose of the scan in vmtar is to identify the sparse areas.
Once identified it creates the tar archive in a manner that allows the tar utility to restore the file including the sparse areas.
This way the restored file is the same size as the original with the sparse areas remaining intact.
Also, this reduces the size of the archive WITHOUT needing compression.

That being said I am not 100% sure if the Proxmox utilities used to restore actually use this feature.

jollyscots said:
And again, I really cant see what possibly is the technical need for vmtar to scan twice to handle sparse files. Perhaps this is needed for tape destinations? But I am not at all familar with the linux structure of tools used for vm backup.

Speculating as to why this is needed adds nothing to the conversation.
You could find the answer in about 2 minutes if you bothered to search google.
Scroll to the bottom of the page and read the last few paragraphs:
http://sunsite.ualberta.ca/Documentation/Gnu/tar-1.13/html_node/tar_125.html

dietmar · Feb 8, 2012

e100 said:
I like that as a solution but people not using compression would still benefit from the sparse scan.

They do - sparse scan is enabled for files if you do not use compression.

e100 said:
Do you have a compiled vmtar I can download or do I need to patch and compile myself to test this?

Sorry, you need to compile yourself.

dietmar · Feb 8, 2012

jollyscots said:
I've just done a test run using 7zip, and I set the format to gzip, fastest.

gzip is much slower than lzo, so we plan to use lzo instead (already implemented).

dietmar · Feb 8, 2012

e100 said:
Below I point out what I *think* is causing it not to work for me.

Oh, good catch - just fixed that.

jollyscots · Feb 8, 2012

e100 said:
tar + gzip is a known standard
tar + 7zip is not a standard

I set 7zip to use gzip, 'fastest'. And got 42mb a sec at ~2500mips

My test server is an absolute bonkers E3, those things absolutely fly. 10MB/s was a bit of a shock.

The purpose of the scan in vmtar is to identify the sparse areas.
Once identified it creates the tar archive in a manner that allows the tar utility to restore the file including the sparse areas.
This way the restored file is the same size as the original with the sparse areas remaining intact.
Also, this reduces the size of the archive WITHOUT needing compression.

Again, still cant see why need twin scan. Unless maybe for tape which has media catalog/header at the start.

So I read about why tar must scan twice. It seems tar uses twin scan, for compatibility issues. Again, I speculate: It should not be needed.

So the reason why the twin scan? Because proxmox uses tar, for reason of standards and portability. So twin scans, for now, are here to stay.

jollyscots · Feb 8, 2012

The purpose of the scan in vmtar is to identify the sparse areas.
Once identified it creates the tar archive in a manner that allows the tar utility to restore the file including the sparse areas.
This way the restored file is the same size as the original with the sparse areas remaining intact.
Also, this reduces the size of the archive WITHOUT needing compression.

Here's the thing about that: The restore utility should be able to translate the zeroes on the fly, as it restores them, and make it sparse again.

Speculating as to why this is needed adds nothing to the conversation.

Next time I go to think outside the box, Ill make sure to double check the internet and check how everyone else thinks.

Geez, im making a lot of friends round here!

dietmar · Feb 8, 2012

jollyscots said:
So the reason why the twin scan? Because proxmox uses tar, for reason of standards and portability. So twin scans, for now, are here to stay.

Tar stores the list of block at the beginning of the file. But any other archive formats to store sparse files have the same restriction (let me know if you find one without).

dietmar · Feb 8, 2012

jollyscots said:
Here's the thing about that: The restore utility should be able to translate the zeroes on the fly, as it restores them, and make it sparse again.

Sure, that is how it works (if the archive is saved as sparse file).

e100 · Feb 8, 2012

jollyscots said:
Here's the thing about that: The restore utility should be able to translate the zeroes on the fly, as it restores them, and make it sparse again.

Next time I go to think outside the box, Ill make sure to double check the internet and check how everyone else thinks.

Geez, im making a lot of friends round here!

The reason for the scan has been pointed out to you numerous times yet you still insist that it is not needed.
Maybe you are smarter that all the people who have worked on the standard tar format and code for the last 22 years, if so submit a patch to GNU and fix this problem.

It has already been established that using the sparse scanning is likely a waste of resources IF compression is being used.
The Proxmox developers have already made code changes twice to implement that change and I will test the 2nd set of changes later today.

I do thank you for adding to the conversation because you did bring this issue to the attention of others leading to the above noted potential improvement.
Your attitude is why you are not making friends, sadly this too has already been pointed out to you numerous times and apparently has fallen on deaf ears.

Years ago I worked as a mechanic and I was working on an old 1972 Pontiac Gran Prix owned by a very elderly woman who loved to "blow the doors off those young punks and their slow sports cars"
Myself and the other mechanics were joking around and this lady, who is older than my grandmother, tells me: "Everyone likes to get a piece of a$$ but no one likes a smart ass"

KVM machines very slow /unreachable during vmtar backup

Famous Member

New Member

Famous Member

Famous Member

Proxmox Staff Member

Proxmox Staff Member

Proxmox Staff Member

Proxmox Staff Member

Famous Member

New Member

Famous Member

Famous Member

Proxmox Staff Member

Proxmox Staff Member

Proxmox Staff Member

New Member

New Member

Proxmox Staff Member

Proxmox Staff Member

Famous Member

We value your privacy