Proxmox Backups - An option to not use LVM Snapshots

Discussion in 'Proxmox VE: Installation and configuration' started by marotori, Mar 25, 2012.

  1. kofik

    kofik Member

    A short addition about HCP from R1Soft and its license terms, where they write:

    "Can I Redistribute Hot Copy as Part of a Product?

    No you may not redistribute Hot Copy without permission. If you are interested in incorporating R1Soft Linux Snapshot technology in your product contact us for an OEM agreement."
    (http://www.r1soft.com/tools/linux-hot-copy/license/)

    Which basically means that HCP can't be bundled with Proxmox as such. You can certainly use it on Proxmox, but you need to install it yourself. :)
     
  2. udi

    udi Member

    Yes, and nobody asked for that.
    The request is only to customize vzdump so that it works together with hcp, if somebody wants to install and use it.
     
  3. udi

    udi Member

    @marotori
    I cannot start hcp. It says:
    ERROR: please check that the device (253,13) is mounted!
    ERROR: could not create new session for device:(253,13).

    What's wrong?
     
  4. marotori

    marotori Member

    Have you run: hcp-setup --get-module ?

    And also installed your kernel headers?


     
  5. udi

    udi Member

    yes, that went fine.

    # hcp -m /var/ehh/ /dev/mapper/vgvirt-vm--110--disk--2
    ...
    Starting Hot Copy: /dev/mapper/vgvirt-vm--110--disk--2.
    hcp: an error occurred while starting the Hot Copy of device '/dev/mapper/vgvirt-vm--110--disk--2', please check the system logs for further information.

    The error messages from my earlier post are what shows up in the syslog.


    # hcp -v

    R1Soft Hot Copy 3.18.2 build 16285 (http://www.r1soft.com)
    Documentation http://wiki.r1soft.com
    Forums http://forum.r1soft.com

    Thank you for using Hot Copy!
    R1Soft makes the only Continuous Data Protection software for Linux.

    hcp driver module: 4.2.1 build: 16433
     
  6. marotori

    marotori Member

    What does the error log say?

    I would assume there is something in /var/log/messages.




     
  7. marotori

    marotori Member

    I think you may be trying to snapshot a physical machine's volume group?

    I have never tried this.

    Try snapshotting the volume group for /var/lib/vz.


     
  8. udi

    udi Member

    Yeah, I found it.
    I use LVM volumes for the KVM disks, and hcp cannot use these volumes.

    However, if I change the VMs to use raw file storage, I'll be able to snapshot the volume containing them and do the backup.
    But that's tomorrow's job :)

    Thank you for all your help.
    u.
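
The syslog error names the device only by its major,minor pair (253,13) and asks that it be mounted. That fits the finding above, since a logical volume handed to KVM as a raw disk is presumably not mounted on the host. A minimal sketch for checking whether a given major:minor pair is mounted, by reading /proc/self/mountinfo. The function name `mounted_at` and its optional file argument are made up for this example and are not part of hcp.

```shell
#!/bin/sh
# Print the mount point(s) of the block device with the given major:minor
# pair (for example "253:13"). Returns non-zero if it is not mounted.
# The optional second argument is a mountinfo-format file; it exists only so
# the function can be tried against sample data (an assumption of this sketch).
mounted_at() {
    # In mountinfo, field 3 is major:minor and field 5 is the mount point.
    awk -v dev="$1" '
        $3 == dev { print $5; found = 1 }
        END { exit !found }
    ' "${2:-/proc/self/mountinfo}"
}
```

For the device from the error message, this could be used as `mounted_at 253:13 || echo "253:13 is not mounted on this host"`.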
     
  9. tux

    tux Member

    I started using hcp because vzdump causes crashes. Today the system hung after some backups: the load rose to 90 and a system restart was necessary. Can somebody help me?

    Apr 10 05:13:36 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76032654
    Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046418
    Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046417
    Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046416
    Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046415
    Apr 10 05:13:37 system-3-de kernel: ext3_orphan_cleanup: deleting unreferenced inode 76046414
    Apr 10 05:13:37 system-3-de kernel: EXT3-fs (hcp2): 49 orphan inodes deleted
    Apr 10 05:13:37 system-3-de kernel: EXT3-fs (hcp2): recovery complete
    Apr 10 05:13:41 system-3-de kernel: EXT3-fs (hcp2): mounted filesystem with writeback data mode
    Apr 10 05:16:43 system-3-de kernel: INFO: task kjournald:200421 blocked for more than 120 seconds.
    Apr 10 05:16:43 system-3-de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr 10 05:16:43 system-3-de kernel: kjournald D ffff8809c8d14f10 0 200421 2 0 0x00000000
    Apr 10 05:16:43 system-3-de kernel: ffff880951311c40 0000000000000046 0000000000000000 ffff88081e1dc040
    Apr 10 05:16:43 system-3-de kernel: ffffffff81413560 000000000000f6c8 ffff880951311fd8 ffff880951311fd8
    Apr 10 05:16:43 system-3-de kernel: ffff8809c8d14f10 ffff88041e6ab450 ffff8809c8d154c8 00000001283a9d07
    Apr 10 05:16:43 system-3-de kernel: Call Trace:
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81413560>] ? dm_request+0x0/0x1a0
    Apr 10 05:16:43 system-3-de kernel: [<ffffffffa06cf440>] ? __virt_request+0x0/0x400 [hcpdriver]
    Apr 10 05:16:43 system-3-de kernel: [<ffffffffa06cd73c>] ? do_request+0x1c/0x30 [hcpdriver]
    Apr 10 05:16:43 system-3-de kernel: [<ffffffffa06cd7d7>] ? generic_request+0x87/0xa0 [hcpdriver]
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81511207>] io_schedule+0x87/0xe0
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60f5>] sync_buffer+0x45/0x50
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81511bd2>] __wait_on_bit+0x62/0x90
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81511c79>] out_of_line_wait_on_bit+0x79/0x90
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81097310>] ? wake_bit_function+0x0/0x50
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff811c60a6>] __wait_on_buffer+0x26/0x30
    Apr 10 05:16:43 system-3-de kernel: [<ffffffffa00accfa>] journal_commit_transaction+0x6aa/0x1410 [jbd]
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff8107fd7b>] ? lock_timer_base+0x3b/0x70
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81080a4c>] ? try_to_del_timer_sync+0xac/0xe0
    Apr 10 05:16:43 system-3-de kernel: [<ffffffffa00b2a3d>] kjournald+0xed/0x240 [jbd]
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff810972d0>] ? autoremove_wake_function+0x0/0x40
    Apr 10 05:16:43 system-3-de kernel: [<ffffffffa00b2950>] ? kjournald+0x0/0x240 [jbd]
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81096ca6>] kthread+0x96/0xb0
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff8100c34a>] child_rip+0xa/0x20
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff81096c10>] ? kthread+0x0/0xb0
    Apr 10 05:16:43 system-3-de kernel: [<ffffffff8100c340>] ? child_rip+0x0/0x20
    Apr 10 05:18:43 system-3-de kernel: INFO: task kjournald:200421 blocked for more than 120 seconds.
    Apr 10 05:18:43 system-3-de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr 10 05:18:43 system-3-de kernel: kjournald D ffff8809c8d14f10 0 200421 2 0 0x00000000
    Apr 10 05:18:43 system-3-de kernel: ffff880951311c40 0000000000000046 0000000000000000 ffff88081e1dc040
    Apr 10 05:18:43 system-3-de kernel: ffffffff81413560 000000000000f6c8 ffff880951311fd8 ffff880951311fd8
    Apr 10 05:18:43 system-3-de kernel: ffff8809c8d14f10 ffff88041e6ab450 ffff8809c8d154c8 00000001283a9d07
    Apr 10 05:18:43 system-3-de kernel: Call Trace:
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81413560>] ? dm_request+0x0/0x1a0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa06cf440>] ? __virt_request+0x0/0x400 [hcpdriver]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa06cd73c>] ? do_request+0x1c/0x30 [hcpdriver]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa06cd7d7>] ? generic_request+0x87/0xa0 [hcpdriver]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81012d36>] ? read_tsc+0x16/0x40
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff810a21d3>] ? ktime_get_ts+0xb3/0xe0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81511207>] io_schedule+0x87/0xe0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60f5>] sync_buffer+0x45/0x50
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81511bd2>] __wait_on_bit+0x62/0x90
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60b0>] ? sync_buffer+0x0/0x50
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81511c79>] out_of_line_wait_on_bit+0x79/0x90
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81097310>] ? wake_bit_function+0x0/0x50
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c60a6>] __wait_on_buffer+0x26/0x30
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00accfa>] journal_commit_transaction+0x6aa/0x1410 [jbd]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff8107fd7b>] ? lock_timer_base+0x3b/0x70
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81080a4c>] ? try_to_del_timer_sync+0xac/0xe0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00b2a3d>] kjournald+0xed/0x240 [jbd]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff810972d0>] ? autoremove_wake_function+0x0/0x40
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00b2950>] ? kjournald+0x0/0x240 [jbd]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81096ca6>] kthread+0x96/0xb0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100c34a>] child_rip+0xa/0x20
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81096c10>] ? kthread+0x0/0xb0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100c340>] ? child_rip+0x0/0x20
    Apr 10 05:18:43 system-3-de kernel: INFO: task tar:200426 blocked for more than 120 seconds.
    Apr 10 05:18:43 system-3-de kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr 10 05:18:43 system-3-de kernel: tar D ffff8809f0f769c0 0 200426 200367 0 0x00000000
    Apr 10 05:18:43 system-3-de kernel: ffff880829887a78 0000000000000086 0000000000000000 ffff8808298879a8
    Apr 10 05:18:43 system-3-de kernel: ffff880a1e5f7fc0 000000000000f6c8 ffff880829887fd8 ffff880829887fd8
    Apr 10 05:18:43 system-3-de kernel: ffff8809f0f769c0 ffff88081e5a8b50 ffff8809f0f76f78 00000001283b2225
    Apr 10 05:18:43 system-3-de kernel: Call Trace:
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100992d>] ? __switch_to+0x18d/0x320
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00ac235>] do_get_write_access+0x3f5/0x560 [jbd]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81097310>] ? wake_bit_function+0x0/0x50
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811c566d>] ? __getblk+0x2d/0x300
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00ac531>] journal_get_write_access+0x31/0x50 [jbd]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00e93e4>] __ext3_journal_get_write_access+0x34/0x70 [ext3]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf403>] ext3_reserve_inode_write+0x93/0xb0 [ext3]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf631>] ? ext3_dirty_inode+0x61/0xa0 [ext3]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf48e>] ext3_mark_inode_dirty+0x6e/0xa0 [ext3]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffffa00cf631>] ext3_dirty_inode+0x61/0xa0 [ext3]
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811bc27a>] __mark_inode_dirty+0x3a/0x180
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811ab7d9>] touch_atime+0x129/0x170
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81126f49>] generic_file_aio_read+0x319/0x790
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81190749>] do_sync_read+0xf9/0x140
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff810972d0>] ? autoremove_wake_function+0x0/0x40
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff81190ef8>] vfs_read+0xc8/0x1a0
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff811910d5>] sys_read+0x55/0x90
    Apr 10 05:18:43 system-3-de kernel: [<ffffffff8100b2c2>] system_call_fastpath+0x16/0x1b
    Apr 10 05:22:12 system-3-de kernel: hcp: INFO: stopping hcp session hcp2.
    Apr 10 05:22:31 system-3-de kernel: hcp: INFO: hcp session hcp2 stopped.
    Apr 10 05:22:34 system-3-de kernel: hcp: INFO: starting new session on device:(253,3)
    .....
    Apr 10 08:26:10 system-3-de kernel: hcp: ERROR: hcp_watchdog: could not get session_list_lock!
    Apr 10 08:27:16 system-3-de kernel: hcp: ERROR: hcp_watchdog: could not get session_list_lock!
    Apr 10 08:28:23 system-3-de kernel: hcp: ERROR: hcp_watchdog: could not get session_list_lock!
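
To see at a glance which tasks the kernel reported as stuck in a log like the one above, the hung-task lines can be tallied per task name. A minimal sketch, assuming the messages have the same "INFO: task NAME:PID blocked for more than" shape as above. `hung_tasks` is a name made up for this example, and task names containing spaces are not handled.

```shell
#!/bin/sh
# Count hung-task reports per task name.
# Reads the given log file(s), or standard input if none are given.
hung_tasks() {
    # Keep only the task name from lines such as
    #   ... kernel: INFO: task kjournald:200421 blocked for more than 120 seconds.
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than.*/\1/p' "$@" |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage would be something like `hung_tasks /var/log/messages`, which prints one "name count" pair per line.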
     
  10. udi

    udi Member

    Something similar happened to me when I accidentally started hcp two (to tell the truth, three) times simultaneously on the same mount point, and it started to eat all the free space.

    Your log shows that you stop hcp2, so I think there was an earlier snapshot on hcp1 that you did not remove.

    Apart from this, I haven't had any problems so far.
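
One way to rule out the accidental double start described above is to take a lock per mount point around the whole backup job. A minimal sketch using flock(1) from util-linux. `run_exclusive`, `LOCK_DIR` and the lock-file naming are assumptions for this example and have nothing to do with hcp itself.

```shell
#!/bin/sh
# Directory for the lock files (an assumption; any writable directory works).
LOCK_DIR="${LOCK_DIR:-/var/lock}"

# run_exclusive MOUNTPOINT COMMAND [ARGS...]
# Runs COMMAND only if no other run currently holds the lock for MOUNTPOINT.
run_exclusive() {
    mountpoint="$1"; shift
    # Turn "/var/lib/vz" into "_var_lib_vz" so it can be used in a file name.
    lockfile="$LOCK_DIR/snapshot$(printf '%s' "$mountpoint" | tr '/' '_').lock"
    # With -n, flock does not wait: if the lock is already held it exits
    # with status 1 without running the command.
    flock -n "$lockfile" "$@"
}
```

The lock is held only while the wrapped command runs, so the command should be the complete job (create the snapshot, copy the data, remove the snapshot), for example `run_exclusive /var/lib/vz /usr/local/bin/my-backup.sh`, where `my-backup.sh` is a hypothetical script.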
     
  11. marotori

    marotori Member

    I am working on a script called hcpdump that should be stable.

    As soon as I have it working I will post it

    Rob
     
  12. tux

    tux Member

    Do you have any more issues reported? What will your script do?
     
  13. marotori

    marotori Member

    My script (v1 nearly ready) is essentially a basic re-implementation of vzdump.

    So...

    hcpdump <all|vmid>

    It then creates a dump that can be restored using the regular Proxmox tools.

    In my opinion, the biggest issues appear to be related to LSI cards used with SATA drives. I have found loads of people reporting similar problems under recent Red Hat/CentOS distros.

    I am currently running some custom drivers for the card (with unofficial patches) to see if they improve performance.

    I will know in a week or two if this actually fixes the problems :)

    Rob
     
  14. udi

    udi Member

    I have a ServeRAID 8k, and since I started using hcp instead of LVM snapshots I have not had a server hang. Earlier it happened 1-2 times a week.
     
  15. marotori

    marotori Member

    I assume you are talking about IBM ServeRAID. To my knowledge this is yet another LSI MegaRAID rebrand (please correct me if I am wrong).

    I too had no problems until I overstressed the IO, and suddenly the same issues occurred again! Even with HCP!

    This leads me to believe that there is still something fundamentally wrong. HCP is just 'lighter' than LVM, so the issue does not crop up as often.

    This article has a patch that I am trying:

    Bugfixing the in-kernel megaraid_sas driver, from crash to patch | Anchor Web Hosting Blog

    I am applying this to the driver source released by LSI.

    Maybe it will work. Regardless, the linked article is an interesting read!

    Rob
     
  16. udi

    udi Member

    As far as I know, ServeRAID is Adaptec.

    I have had the lvremove issue since the first beta. Now with hcp everything has been fine for weeks, except for one time when I accidentally started hcp more than once on the same mount point, which led to a hang.
     
  17. marotori

    marotori Member

  18. udi

    udi Member

    # lsmod |grep mega

    nothing there

    # lsmod |grep aac
    aacraid 82047 7
     
  19. dietmar

    dietmar Proxmox Staff Member

    And has everyone tested with the latest kernel?
     
  20. tux

    tux Member

    I use:

    Linux system-3-de 2.6.32-7-pve #1 SMP Mon Feb 13 07:33:21 CET 2012 x86_64 GNU/Linux

    system-3-de:~# lsmod |grep arc
    arcmsr 30638 2



    I got hangs with hcp.
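
Since the reports above span several RAID controllers, it helps to state which driver each host actually uses. A minimal sketch that checks /proc/modules (the file lsmod reads) for the three drivers named in this thread. `loaded_raid_drivers` and its optional file argument are made up for this example, and a driver built into the kernel rather than loaded as a module will not show up.

```shell
#!/bin/sh
# Print which of the RAID drivers mentioned in this thread are loaded.
# The optional argument is a file in /proc/modules format; it exists only so
# the function can be tried against sample data (an assumption of this sketch).
loaded_raid_drivers() {
    # /proc/modules lists one loaded module per line, module name first.
    awk '$1 == "megaraid_sas" || $1 == "aacraid" || $1 == "arcmsr" { print $1 }' \
        "${1:-/proc/modules}"
}
```

Posting the output of `loaded_raid_drivers` together with `uname -a` makes hang reports from different hosts easier to compare.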
     