Proxmox Backups - An option to not use HCP Snapshots

kofik

Member
Aug 5, 2011
34
1
8
A short addition about HCP from R1Soft and its license terms, which state:

"Can I Redistribute Hot Copy as Part of a Product?

No you may not redistribute Hot Copy without permission. If you are interested in incorporating R1Soft Linux Snapshot technology in your product contact us for an OEM agreement."
(http://www.r1soft.com/tools/linux-hot-copy/license/)

This basically means that HCP can't be bundled with Proxmox as such. You can certainly use it on Proxmox, but you need to install it on your own. :)
 

udi

Member
Apr 1, 2011
73
0
6
yes, and nobody asked for that.
the request is only to customize vzdump so that it works together with hcp, if somebody wants to install and use it.
 

udi

Member
Apr 1, 2011
73
0
6
@marotori
i cannot start hcp, it says:
ERROR: please check that the device (253,13) is mounted!
ERROR: could not create new session for device:(253,13).

what's wrong?
 

marotori

Member
Jun 17, 2009
161
1
16
Have you run: hcp-setup --get-module ?

And also installed your kernel headers?


 

udi

Member
Apr 1, 2011
73
0
6
yes, that went fine.

# hcp -m /var/ehh/ /dev/mapper/vgvirt-vm--110--disk--2
...
Starting Hot Copy: /dev/mapper/vgvirt-vm--110--disk--2.
hcp: an error occurred while starting the Hot Copy of device '/dev/mapper/vgvirt-vm--110--disk--2', please check the system logs for further information.

and the error messages above are in the syslog.


# hcp -v

R1Soft Hot Copy 3.18.2 build 16285 (http://www.r1soft.com)
Documentation http://wiki.r1soft.com
Forums http://forum.r1soft.com

Thank you for using Hot Copy!
R1Soft makes the only Continuous Data Protection software for Linux.

hcp driver module: 4.2.1 build: 16433
 

marotori

Member
Jun 17, 2009
161
1
16
What does the error log say?

I would assume something in /var/log/messages




 

marotori

Member
Jun 17, 2009
161
1
16
I think you may be trying to snapshot a physical machine's volume group?

I have never tried this.

Try snapshotting the volume group for /var/lib/vz


 

udi

Member
Apr 1, 2011
73
0
6
yeah, i found it..
i use lvm volumes for kvm disks, and these volumes cannot be used by hcp.

however, if i change the vms to use raw file storage i'll be able to snapshot the volume containing them, and do the backup.
but that's tomorrow's job :)

thank you for all your help
u.
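
For anyone hitting the same "please check that the device (253,13) is mounted!" error, here is a minimal Python sketch for checking whether a device number is mounted on the host. It assumes the pair in the hcp message is major,minor and that the kernel provides /proc/self/mountinfo. The function name and the sample data are made up for illustration and are not part of hcp.

```python
def mount_points_for_device(mountinfo_text, major, minor):
    """Return the mount points whose device number is major:minor.

    mountinfo_text is expected in the /proc/self/mountinfo layout:
    mount-id parent-id major:minor root mount-point options ...
    """
    wanted = f"{major}:{minor}"
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[2] == wanted:
            found.append(fields[4])
    return found


# Made-up sample in mountinfo layout (hypothetical, not taken from this thread).
SAMPLE = """\
20 1 253:0 / / rw,relatime - ext3 /dev/mapper/pve-root rw
25 20 253:2 / /var/lib/vz rw,relatime - ext3 /dev/mapper/pve-data rw
"""

print(mount_points_for_device(SAMPLE, 253, 2))   # device with a mounted filesystem
print(mount_points_for_device(SAMPLE, 253, 13))  # device that is not mounted
```

On a real host you would pass the contents of /proc/self/mountinfo instead of SAMPLE. An empty result would match the situation described above: an LVM volume used directly as a KVM disk has no filesystem mounted on the host.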
 

tux

Member
Jul 21, 2009
54
0
6
I started using hcp because vzdump causes crashes. Today the system hung after some backups: the load climbed to 90 and a system restart was necessary. Can somebody help me?

Apr 10 05:13:36 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76032654
Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046418
Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046417
Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046416
Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046415
Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046414
Apr 10 05:13:37 system-3-de kernel: EXT3-fs (hcp2): 49 orphan inodes deleted
Apr 10 05:13:37 system-3-de kernel: EXT3-fs (hcp2): recovery complete
Apr 10 05:13:41 system-3-de kernel: EXT3-fs (hcp2): mounted filesystem with writeback data mode
Apr 10 05:16:43 system-3-de kernel: INFO: task kjournald:200421 blocked for more than 120 seconds.
Apr 10 05:16:43 system-3-de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 10 05:16:43 system-3-de kernel: kjournald D ffff8809c8d14f10 0 200421 2 0 0x00000000
Apr 10 05:16:43 system-3-de kernel: ffff880951311c40 0000000000000046 0000000000000000 ffff88081e1dc040
Apr 10 05:16:43 system-3-de kernel: ffffffff81413560 000000000000f6c8 ffff880951311fd8 ffff880951311fd8
Apr 10 05:16:43 system-3-de kernel: ffff8809c8d14f10 ffff88041e6ab450 ffff8809c8d154c8 00000001283a9d07
Apr 10 05:16:43 system-3-de kernel: Call Trace:
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81413560>] ? dm_request+0x0/0x1a0
Apr 10 05:16:43 system-3-de kernel: [<ffffffffa06cf440>] ? __virt_request+0x0/0x400 [hcpdriver]
Apr 10 05:16:43 system-3-de kernel: [<ffffffffa06cd73c>] ? do_request+0x1c/0x30 [hcpdriver]
Apr 10 05:16:43 system-3-de kernel: [<ffffffffa06cd7d7>] ? generic_request+0x87/0xa0 [hcpdriver]
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
Apr 10 05:16:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
Apr 10 05:16:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81511207>] io_schedule+0x87/0xe0
Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60f5>] sync_buffer+0x45/0x50
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81511bd2>] __wait_on_bit+0x62/0x90
Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81511c79>] out_of_line_wait_on_bit+0x79/0x90
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81097310>] ? wake_bit_function+0x0/0x50
Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60a6>] __wait_on_buffer+0x26/0x30
Apr 10 05:16:43 system-3-de kernel: [<ffffffffa00accfa>] journal_commit_transaction+0x6aa/0x1410 [jbd]
Apr 10 05:16:43 system-3-de kernel: [<ffffffff8107fd7b>] ? lock_timer_base+0x3b/0x70
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81080a4c>] ? try_to_del_timer_sync+0xac/0xe0
Apr 10 05:16:43 system-3-de kernel: [<ffffffffa00b2a3d>] kjournald+0xed/0x240 [jbd]
Apr 10 05:16:43 system-3-de kernel: [<ffffffff810972d0>] ? autoremove_wake_function+0x0/0x40
Apr 10 05:16:43 system-3-de kernel: [<ffffffffa00b2950>] ? kjournald+0x0/0x240 [jbd]
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81096ca6>] kthread+0x96/0xb0
Apr 10 05:16:43 system-3-de kernel: [<ffffffff8100c34a>] child_rip+0xa/0x20
Apr 10 05:16:43 system-3-de kernel: [<ffffffff81096c10>] ? kthread+0x0/0xb0
Apr 10 05:16:43 system-3-de kernel: [<ffffffff8100c340>] ? child_rip+0x0/0x20
Apr 10 05:18:43 system-3-de kernel: INFO: task kjournald:200421 blocked for more than 120 seconds.
Apr 10 05:18:43 system-3-de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 10 05:18:43 system-3-de kernel: kjournald D ffff8809c8d14f10 0 200421 2 0 0x00000000
Apr 10 05:18:43 system-3-de kernel: ffff880951311c40 0000000000000046 0000000000000000 ffff88081e1dc040
Apr 10 05:18:43 system-3-de kernel: ffffffff81413560 000000000000f6c8 ffff880951311fd8 ffff880951311fd8
Apr 10 05:18:43 system-3-de kernel: ffff8809c8d14f10 ffff88041e6ab450 ffff8809c8d154c8 00000001283a9d07
Apr 10 05:18:43 system-3-de kernel: Call Trace:
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81413560>] ? dm_request+0x0/0x1a0
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa06cf440>] ? __virt_request+0x0/0x400 [hcpdriver]
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa06cd73c>] ? do_request+0x1c/0x30 [hcpdriver]
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa06cd7d7>] ? generic_request+0x87/0xa0 [hcpdriver]
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
Apr 10 05:18:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
Apr 10 05:18:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81511207>] io_schedule+0x87/0xe0
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60f5>] sync_buffer+0x45/0x50
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81511bd2>] __wait_on_bit+0x62/0x90
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81511c79>] out_of_line_wait_on_bit+0x79/0x90
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81097310>] ? wake_bit_function+0x0/0x50
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60a6>] __wait_on_buffer+0x26/0x30
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00accfa>] journal_commit_transaction+0x6aa/0x1410 [jbd]
Apr 10 05:18:43 system-3-de kernel: [<ffffffff8107fd7b>] ? lock_timer_base+0x3b/0x70
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81080a4c>] ? try_to_del_timer_sync+0xac/0xe0
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00b2a3d>] kjournald+0xed/0x240 [jbd]
Apr 10 05:18:43 system-3-de kernel: [<ffffffff810972d0>] ? autoremove_wake_function+0x0/0x40
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00b2950>] ? kjournald+0x0/0x240 [jbd]
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81096ca6>] kthread+0x96/0xb0
Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100c34a>] child_rip+0xa/0x20
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81096c10>] ? kthread+0x0/0xb0
Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100c340>] ? child_rip+0x0/0x20
Apr 10 05:18:43 system-3-de kernel: INFO: task tar:200426 blocked for more than 120 seconds.
Apr 10 05:18:43 system-3-de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 10 05:18:43 system-3-de kernel: tar D ffff8809f0f769c0 0 200426 200367 0 0x00000000
Apr 10 05:18:43 system-3-de kernel: ffff880829887a78 0000000000000086 0000000000000000 ffff8808298879a8
Apr 10 05:18:43 system-3-de kernel: ffff880a1e5f7fc0 000000000000f6c8 ffff880829887fd8 ffff880829887fd8
Apr 10 05:18:43 system-3-de kernel: ffff8809f0f769c0 ffff88081e5a8b50 ffff8809f0f76f78 00000001283b2225
Apr 10 05:18:43 system-3-de kernel: Call Trace:
Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100992d>] ? __switch_to+0x18d/0x320
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00ac235>] do_get_write_access+0x3f5/0x560 [jbd]
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81097310>] ? wake_bit_function+0x0/0x50
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c566d>] ? __getblk+0x2d/0x300
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00ac531>] journal_get_write_access+0x31/0x50 [jbd]
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00e93e4>] __ext3_journal_get_write_access+0x34/0x70 [ext3]
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf403>] ext3_reserve_inode_write+0x93/0xb0 [ext3]
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf631>] ? ext3_dirty_inode+0x61/0xa0 [ext3]
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf48e>] ext3_mark_inode_dirty+0x6e/0xa0 [ext3]
Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf631>] ext3_dirty_inode+0x61/0xa0 [ext3]
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811bc27a>] __mark_inode_dirty+0x3a/0x180
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811ab7d9>] touch_atime+0x129/0x170
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81126f49>] generic_file_aio_read+0x319/0x790
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81190749>] do_sync_read+0xf9/0x140
Apr 10 05:18:43 system-3-de kernel: [<ffffffff810972d0>] ? autoremove_wake_function+0x0/0x40
Apr 10 05:18:43 system-3-de kernel: [<ffffffff81190ef8>] vfs_read+0xc8/0x1a0
Apr 10 05:18:43 system-3-de kernel: [<ffffffff811910d5>] sys_read+0x55/0x90
Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100b2c2>] system_call_fastpath+0x16/0x1b
Apr 10 05:22:12 system-3-de kernel: hcp: INFO: stopping hcp session hcp2.
Apr 10 05:22:31 system-3-de kernel: hcp: INFO: hcp session hcp2 stopped.
Apr 10 05:22:34 system-3-de kernel: hcp: INFO: starting new session on device:(253,3)
.....
Apr 10 08:26:10 system-3-de kernel: hcp: ERROR: hcp_watchdog: could not get session_list_lock!
Apr 10 08:27:16 system-3-de kernel: hcp: ERROR: hcp_watchdog: could not get session_list_lock!
Apr 10 08:28:23 system-3-de kernel: hcp: ERROR: hcp_watchdog: could not get session_list_lock!
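
A long log like this is easier to compare between runs if the hung-task reports are tallied per process. The following is a minimal Python sketch, assuming the kernel message wording shown above; the function name is made up, and the sample lines are copied from this log.

```python
import re
from collections import Counter

# Matches the kernel hung-task report, e.g.
# "INFO: task kjournald:200421 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(log_text):
    """Count hung-task reports per (task name, pid)."""
    counts = Counter()
    for match in HUNG_TASK.finditer(log_text):
        counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


SAMPLE_LOG = """\
Apr 10 05:16:43 system-3-de kernel: INFO: task kjournald:200421 blocked for more than 120 seconds.
Apr 10 05:18:43 system-3-de kernel: INFO: task kjournald:200421 blocked for more than 120 seconds.
Apr 10 05:18:43 system-3-de kernel: INFO: task tar:200426 blocked for more than 120 seconds.
"""

for (name, pid), n in count_hung_tasks(SAMPLE_LOG).items():
    print(f"{name} (pid {pid}): {n} report(s)")
```

This only counts reports; it says nothing about the cause of the hang.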
 

udi

Member
Apr 1, 2011
73
0
6
something similar happened to me when i accidentally started hcp two (to tell the truth, three) times simultaneously on the same mount point, and it started to eat all the free space.

your log shows that you stopped hcp2, so i think there was an earlier snapshot on hcp1 that you did not remove.

apart from this, i haven't had any problems so far.
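
One way to avoid starting a second backup run against the same mount point by accident is to hold a lock file for the duration of the run. This is a minimal Python sketch using flock; the helper name and the lock path are made up for illustration and are not something hcp or vzdump provides.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock on lock_path; raise BlockingIOError if it is taken."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)


# Hypothetical lock file, one per mount point that gets snapshotted.
lock_path = os.path.join(tempfile.mkdtemp(), "hcp-var-lib-vz.lock")

with single_instance(lock_path):
    # The snapshot and backup commands would run here.
    try:
        with single_instance(lock_path):
            print("second run started (should not happen)")
    except BlockingIOError:
        print("another run already holds the lock, skipping")
```

A wrapper script would take the lock before starting the snapshot and keep it until the snapshot has been removed again, so an overlapping cron job exits instead of starting another session.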
 

marotori

Member
Jun 17, 2009
161
1
16
I am working on a script called hcpdump that should be stable.

As soon as I have it working I will post it.

Rob
 

marotori

Member
Jun 17, 2009
161
1
16
My script (v1 is nearly ready) is essentially a basic re-implementation of vzdump.

So...

hcpdump <all|vmid>

It then creates a dump that can be restored using the regular Proxmox tools.

The biggest issues appear, in my opinion, to be related to LSI cards when using a SATA drive. I have found many people reporting similar problems under recent Red Hat/CentOS distros.

I am currently running some custom drivers for the card (with unofficial patches) to see if they improve performance.

I will know in a week or two if this actually fixes the problems :)

Rob
 

udi

Member
Apr 1, 2011
73
0
6
i have a serveraid 8k, and since i started using hcp instead of lvm snapshots i have not had a server hang. earlier it happened 1-2 times a week.
 

marotori

Member
Jun 17, 2009
161
1
16
Assuming you are talking about IBM ServeRAID: to my knowledge this is yet another LSI MegaRAID rebrand. (Please correct me if I am wrong.)

I too had no problems until I overstressed the I/O, and suddenly the same issues occurred again! Even with HCP!

This leads me to believe that there is still something fundamentally wrong; HCP is just 'lighter' than LVM, so the issue does not crop up as often.

This article has a patch that I am trying:

Bugfixing the in-kernel megaraid_sas driver, from crash to patch | Anchor Web Hosting Blog

I am applying this to the LSI driver source that is released by LSI.

Maybe it will work. Regardless, the article in the link is an interesting read!

Rob
 

udi

Member
Apr 1, 2011
73
0
6
as far as i know, serveraid is adaptec.

i had the lvremove issue since the first beta. now with hcp everything has been fine for weeks, except one time when i accidentally started hcp more than once on the same mount point, which led to a hang.
 

tux

Member
Jul 21, 2009
54
0
6
I use:

Linux system-3-de 2.6.32-7-pve #1 SMP Mon Feb 13 07:33:21 CET 2012 x86_64 GNU/Linux

system-3-de:~# lsmod |grep arc
arcmsr 30638 2



I got hangs with hcp.
 
