Proxmox 1.4 beta2: vzdump freezes machines

valshare

Hello,

Now that my backup problems are solved, new problems have arrived. I have 2 time-critical KVM machines running. If I make a backup with vzdump --snapshot, the machine freezes for 300 seconds. During this time, critical data times out. How can I make sure that no timeout occurs while the critical machines are running?

The machines are stored on an iSCSI storage.

Regards, Valle
 
There should not be any timeout in --snapshot mode - can you please post the backup logs?

The log file looks normal to me:

Code:
Oct 17 16:32:02 INFO: Starting Backup of VM 112 (qemu)
Oct 17 16:32:02 INFO: running
Oct 17 16:32:02 INFO: status = running
Oct 17 16:32:02 INFO: backup mode: snapshot
Oct 17 16:32:02 INFO: bandwidth limit: 10240 KB/s
Oct 17 16:32:04 INFO:   Logical volume "vzsnap-proxmox01-0" created
Oct 17 16:32:05 INFO: creating archive '/mnt/pve/backup/vzdump-qemu-112.tgz'
Oct 17 16:32:05 INFO: adding '/mnt/pve/backup/vzdump-qemu-112.tmp/qemu-server.conf' to archive ('qemu-server.conf')
Oct 17 16:32:05 INFO: adding '/mnt/vzsnap0/images/112/vm-112-disk-1.raw' to archive ('vm-disk-ide0.raw')
Oct 17 16:44:09 INFO: Total bytes written: 8274932736 (10.90 MiB/s)
Oct 17 16:44:12 INFO: archive file size: 3.03GB
Oct 17 16:44:14 INFO:   Logical volume "vzsnap-proxmox01-0" successfully removed
Oct 17 16:44:14 INFO: Finished Backup of VM 112 (00:12:12)
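
As a quick sanity check on the log above: the tar phase runs from 16:32:05 to 16:44:09. The sketch below only redoes that arithmetic. The helper names elapsed_s and mib_per_s are made up for illustration and are not part of vzdump.

```shell
# Seconds between two HH:MM:SS timestamps on the same day.
# awk is used so that values like "09" are read as decimal, not octal.
elapsed_s() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s, ":"); split(b, e, ":")
    print (e[1] * 3600 + e[2] * 60 + e[3]) - (s[1] * 3600 + s[2] * 60 + s[3])
  }'
}

# Average rate in MiB/s for a byte count and a duration in seconds.
mib_per_s() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", bytes / secs / 1048576 }'
}

elapsed_s 16:32:05 16:44:09     # prints 724
mib_per_s 8274932736 724        # prints 10.90
```

So the archive step took about 12 minutes at close to the configured bandwidth limit and finished without any error. The log by itself does not show where a freeze of about 300 seconds would come from.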
 
I have done 3 tests:

1. Ping the machine: it does not respond while the backup starts and the snapshot is being created. The machine answers again once the snapshot is done.

2. There are 10 serial ports that receive data from subsystems every second. The ports report errors/timeouts while the snapshot is being created.

3. VNC from Proxmox does not respond. After the snapshot is done, VNC works again.
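
To put a number on the freeze seen in test 1, one option is to log a timestamp for every answered ping from another host and then look for the longest gap. This is only a sketch: record_replies and longest_gap are hypothetical helper names, and the VM address has to be filled in.

```shell
# Hypothetical helper: run on another host while the backup starts.
# Prints one epoch timestamp per answered ping; $1 is the VM address.
# Runs until interrupted with Ctrl-C.
record_replies() {
  while true; do
    ping -c 1 -W 1 "$1" >/dev/null 2>&1 && date +%s
    sleep 1
  done
}

# Reads epoch timestamps (one per line) from stdin and prints the
# longest gap between two consecutive timestamps, in seconds.
longest_gap() {
  awk 'NR > 1 && $1 - prev > max { max = $1 - prev }
       { prev = $1 }
       END { print max + 0 }'
}
```

Something like "record_replies <vm-address> > replies.log" during the backup, followed by "longest_gap < replies.log", would then report the outage in seconds. The resolution is only about 1 to 2 seconds because of the ping timeout and the sleep.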

Yesterday I changed the block size of the test iSCSI storage from 4096 to 512. I will test today whether the snapshot is created without timeouts.
 
I think this is a problem with the block size, and qmigrate can't handle it. Can this be? I can create a second volume on the Thecus with a block size of 4096, if you want.

If I have a volume with a block size greater than 512, all KVMs on this partition work fine. Only vzdump and qmigrate cause trouble.
 
Not at the moment with a block size of 512. With a block size greater than 512, I get I/O errors.

So there is at least something strange - please can you contact the storage vendor and ask what those I/O errors are - is it a known bug?
 
I can't test Openfiler at the moment because I am on holiday. From here I don't have access to the Openfiler box, only to the Thecus storage.

The I/O errors occur only if I run vzdump on a KVM stored on the Thecus storage AND the block size is greater than 512. Then there are many login/logoff messages from the Proxmox server on the Thecus storage.

But in normal operation all installed KVMs work normally and don't have any I/O errors. The system is fast and responds immediately.

I have now created a second storage with a block size of 4096. I will do some tests with it.
 
OK, I created a new partition on the Thecus storage with a block size of 4096. Now the login/logoff messages on the I4500r appear again, along with the I/O errors in the kernel log on the Proxmox server.

Code:
proxmox02:~# vzdump --snapshot --compress 9999
INFO: starting new backup job: vzdump --snapshot --compress 9999
INFO: Starting Backup of VM 9999 (qemu)
INFO: stopped
INFO: status = stopped
INFO: backup mode: stop
INFO: bandwidth limit: 10240 KB/s
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-9999.tgz'
INFO: adding '/var/lib/vz/dump/vzdump-qemu-9999.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/dev/pve-i4500r-test/vm-9999-disk-1' to archive ('vm-disk-ide0.raw')
ERROR: Backup of VM 9999 failed - interrupted by signal
and syslog says:

Code:
Oct 19 19:19:30 proxmox02 kernel: sd 3:0:0:1: [sdc] 26222592 4096-byte hardware sectors (107408 MB)
Oct 19 19:19:30 proxmox02 kernel: sd 3:0:0:1: [sdc] Write Protect is off
Oct 19 19:19:30 proxmox02 kernel: sd 3:0:0:1: [sdc] Mode Sense: bb 00 00 00
Oct 19 19:19:30 proxmox02 kernel: sd 3:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 19 19:19:48 proxmox02 kernel: connection1:0: iscsi: detected conn error (1011)
Oct 19 19:19:50 proxmox02 kernel: iscsi: host reset succeeded
Oct 19 19:20:30 proxmox02 kernel: connection1:0: iscsi: detected conn error (1011)
Oct 19 19:20:33 proxmox02 kernel: iscsi: host reset succeeded
Oct 19 19:21:13 proxmox02 kernel: connection1:0: iscsi: detected conn error (1011)
Oct 19 19:21:17 proxmox02 kernel: iscsi: host reset succeeded
Oct 19 19:21:57 proxmox02 kernel: connection1:0: iscsi: detected conn error (1011)
Oct 19 19:22:00 proxmox02 kernel: iscsi: host reset succeeded
Oct 19 19:22:40 proxmox02 kernel: connection1:0: iscsi: detected conn error (1011)
Oct 19 19:22:43 proxmox02 kernel: iscsi: host reset succeeded
Oct 19 19:23:23 proxmox02 kernel: connection1:0: iscsi: detected conn error (1011)
Oct 19 19:23:26 proxmox02 kernel: iscsi: host reset succeeded
Oct 19 19:23:36 proxmox02 kernel: sd 3:0:0:1: [sdc] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
Oct 19 19:23:36 proxmox02 kernel: end_request: I/O error, dev sdc, sector 2025192
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253085
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253086
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253087
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253088
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253089
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253090
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253091
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253092
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253093
Oct 19 19:23:36 proxmox02 kernel: Buffer I/O error on device dm-6, logical block 253094

The backup never finished, so I interrupted it.
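
The syslog excerpt above can be tallied with a few lines of awk, which makes it easier to compare runs with different block sizes. This is a rough sketch: summarize_iscsi_log is a made-up name, and the patterns are simply copied from the kernel messages shown in this thread.

```shell
# Reads kernel log lines from stdin and counts iSCSI connection errors,
# host resets and I/O errors (both "end_request" and "Buffer" lines).
summarize_iscsi_log() {
  awk '
    /iscsi: detected conn error/  { conn++ }
    /iscsi: host reset succeeded/ { reset++ }
    /I\/O error/                  { io++ }
    END { printf "conn_errors=%d resets=%d io_errors=%d\n", conn, reset, io }
  '
}
```

Fed with the excerpt above it should report conn_errors=6 resets=6 io_errors=11. In that excerpt the connection errors also recur roughly every 43 seconds. For a live system, "summarize_iscsi_log < /var/log/syslog" would be the obvious call, assuming the kernel messages end up in that file.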

OK, then I made a copy of the i4500r-test/vm-9999-disk-1 device with dd:

Code:
proxmox02:~# dd if=/dev/pve-i4500r-test/vm-9999-disk-1 of=/var/lib/vz/dump/test bs=4096
2099200+0 records in
2099200+0 records out
8598323200 bytes (8.6 GB) copied, 173.671 s, 49.5 MB/s
proxmox02:~#
No kernel messages during the copy! All went fine. So I give up testing and think this is NOT a problem with the iSCSI storage! I am frustrated now.
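
For what it is worth, the dd figures above are internally consistent. A small sketch that redoes the arithmetic (the helper names dd_bytes and rate_mb_s are made up here):

```shell
# Bytes copied = number of records * block size (values from the dd output).
dd_bytes() {
  echo $(( $1 * $2 ))
}

# Average rate in MB/s, using decimal megabytes as dd does.
rate_mb_s() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

dd_bytes 2099200 4096            # prints 8598323200
rate_mb_s 8598323200 173.671     # prints 49.5
```

49.5 MB/s is well above the 10240 KB/s bandwidth limit shown in the vzdump log, so raw read throughput alone does not seem to trigger the errors. That is only an inference from these two runs.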

Regards, Valle
 
Again, we need to track down the problem. Please test with openfiler when you are back from holidays. I will also do some tests with block size 4096 using SCST iSCSI target.
 
Hi, sorry, I am not frustrated with Proxmox. I am frustrated with the situation. Sorry if I didn't write that clearly. I now have remote access to the Openfiler box and will do some tests tonight.
 
Does the problem occur when you do an offline backup (when the VM is stopped)? Or does it relate to 'snapshot' mode?
 
Offline or snapshot, it makes no difference.

I have checked Openfiler, but it looks like Openfiler only handles a block size of 512. There are no problems with it. I am thinking about changing the block size of the Thecus to 512 for all iSCSI partitions, but that will worsen performance.
 