
VZDump with flashcache on top of LVM

Discussion in 'Proxmox VE: Installation and configuration' started by check-ict, Oct 8, 2012.

  1. check-ict

    check-ict Member

    Joined:
    Apr 19, 2011
    Messages:
    77
    Likes Received:
    0
    Hello,

    I use flashcache to speed up my storage. It works great; however, I noticed that VZDump can't make an LVM snapshot.

    This is my setup:
    MDADM 4x 2TB eco disk in RAID5 = /dev/md0
    MDADM 2x 160GB SSD in RAID1 = /dev/md1

    Created a LVM VG "PVE" on /dev/md0
    Created a LVM LV "DATA" = /dev/mapper/pve-data
    Created a ext3 filesystem on /dev/mapper/pve-data
    (till now it works if I use snapshot back-ups)

    Now I setup flashcache
    flashcache_create -p thru ssd /dev/md1 /dev/md0
    Flashcache is loaded and a new disk name appears = /dev/mapper/ssd

    If I mount /dev/mapper/ssd it won't let me create snapshots. If I create a snapshot on /dev/mapper/pve-data (even when SSD is mounted) it works.

    Is there some way I can trigger vzdump to use /dev/mapper/pve-data (the underlying LVM volume of flashcache)?
     
  2. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,210
    Likes Received:
    13
    Would you please share how you went about compiling flashcache for the Proxmox kernel?
    I may have some SSDs and time to test this myself in the next few weeks.

    This is just a guess but I think you need to do things more like this:
    Before creating the flashcache device you should unmount /var/lib/vz (/dev/mapper/pve-data)
    Next you need to edit lvm.conf and create a filter telling LVM to ignore the underlying device (/dev/md0)
    Something like this should work:
    Code:
    filter = [ "r|/dev/md0|", "a/.*/" ] 
    
    Now create the flash cache device:
    Code:
    flashcache_create -p thru ssd /dev/md1 /dev/md0
    
    run pvscan
    Verify that the pve group is only showing up on the flash cache device and not the underlying device.
    Now you can mount /var/lib/vz (/dev/mapper/pve-data)
    I would also expect that the snapshots in vzdump will work ok now.
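
    Taken together, the steps above might look like the following shell session. This is only a sketch: the device names (/dev/md0, /dev/md1) follow the original post, and the lvm.conf filter line is the one suggested above.

```shell
# Unmount the LVM volume before layering flashcache underneath it
umount /var/lib/vz

# In /etc/lvm/lvm.conf, reject the underlying device so LVM only
# scans the flashcache device:
#   filter = [ "r|/dev/md0|", "a/.*/" ]

# Create the write-through flashcache device on top of the whole PV
flashcache_create -p thru ssd /dev/md1 /dev/md0

# The pve VG should now be reported on /dev/mapper/ssd, not /dev/md0
pvscan

# Remount the data volume, which now sits on the cached PV
mount /dev/mapper/pve-data /var/lib/vz
```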
     
    #2 e100, Oct 13, 2012
    Last edited: Oct 13, 2012
  3. check-ict

    check-ict Member

    Joined:
    Apr 19, 2011
    Messages:
    77
    Likes Received:
    0
    Hello e100

    This is my history:
    apt-get install dkms build-essential git
    git clone git://github.com/facebook/flashcache.git
    cd flashcache/
    make
    make install
    modprobe flashcache
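
    A quick way to verify the build took effect before creating a cache device (a sketch only; `lsmod`, `modinfo` and `dmsetup` are standard tools, but their output format varies by version):

```shell
# Confirm the kernel module built and loaded
modinfo flashcache | head -n 3
lsmod | grep flashcache

# The device-mapper target list should now include "flashcache"
dmsetup targets | grep flashcache
```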

    After this, you can create a flashcache device:
    flashcache_create -p thru ssd /dev/md1 /dev/mapper/pve-data
    ssd = just a name
    /dev/md1 = SSD (mine in raid1 for security)
    /dev/mapper/pve-data = the slow disk storage
    -p thru = writethrough

    After this you can mount /dev/mapper/ssd to /var/lib/vz

    You can remove flashcache by doing:
    umount /var/lib/vz
    dmsetup remove ssd
    mount /dev/mapper/pve-data /var/lib/vz
    Now you have all data again but without flashcache.

    Writeback is also possible, but there are a few things to know.
    flashcache_create -p back ssd /dev/md1 /dev/mapper/pve-data
    mount /dev/mapper/ssd /var/lib/vz
    umount /var/lib/vz
    dmsetup remove ssd (can take a while, all dirty data gets written to disk)
    -flashcache_destroy /dev/md1 (if you want to remove writeback ssd)
    -flashcache_load /dev/md1 (if you want to restore the config again)

    Writeback has a problem when rebooting if MDADM/software RAID is used: it gives a kernel panic when trying to reboot or shut down. You can use an init script at shutdown to resolve this:

    Code:
    #!/bin/sh
    # Start or stop Flashcache
    
    ### BEGIN INIT INFO
    # Provides:          flashcache
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Flashcache SSD caching
    # Description:       Flashcache SSD caching
    ### END INIT INFO
    
    PATH=/bin:/usr/bin:/sbin:/usr/sbin
    
    flashcache_start() {
        if df -h | grep -q /var/lib/vz
        then
            echo "Flashcache already running"
        else
            flashcache_load /dev/md1
            mount /dev/mapper/ssd /var/lib/vz
            echo 1 > /proc/sys/dev/flashcache/md1+pve-data/fast_remove
            echo "Flashcache started"
        fi
    }
    
    flashcache_stop() {
        if df -h | grep -q /var/lib/vz
        then
            umount /var/lib/vz
            dmsetup remove ssd
            echo "Flashcache stopped"
        else
            echo "Flashcache not running"
        fi
    }
    
    case "$1" in
        start)
            flashcache_start
            ;;
        stop)
            flashcache_stop
            ;;
        restart)
            $0 stop
            $0 start
            ;;
    esac
    
    exit 0
     
  4. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,210
    Likes Received:
    13
    Seems pretty simple.

    Now that I have a better understanding of how this works I edited my post above adding the proper flashcache_create command to make snapshots possible.

    For snapshots to work the whole physical volume must be on the flashcache.
    Before flashcache pvscan should output something like this:
    Code:
    PV /dev/md0   VG pve             lvm2
    
    After flashcache pvscan should output something like this:
    Code:
    PV /dev/mapper/ssd   VG pve             lvm2
    
    That is why it is necessary to set the filter in lvm.conf so pvscan sees the volume at /dev/mapper/ssd and not /dev/md0
    You would still mount /dev/mapper/pve-data to /var/lib/vz.
    The difference is that /dev/mapper/pve-data is inside the pve physical volume that is on /dev/mapper/ssd instead of /dev/md0

    If your system is a typical Proxmox setup, this will not be possible, since pve-root and pve-swap are also on the pve volume group, unless you can get the flashcache setup applied at boot before any volumes are mounted.
     
  5. check-ict

    check-ict Member

    Joined:
    Apr 19, 2011
    Messages:
    77
    Likes Received:
    0
    Flashcache on bootup is too risky, so I will reinstall my server with two MDADM RAID sets, one for the OS and the other for data.

    Thanks for the info, I will try it next week.
     
  6. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,125
    Likes Received:
    68
    Hi e100,
    do you have any performance-info between with and without flashcache (with HW-raid)?
    Perhaps pveperf or more??

    Udo
     
  7. check-ict

    check-ict Member

    Joined:
    Apr 19, 2011
    Messages:
    77
    Likes Received:
    0
    Without Flashcache, bonnie++ gives me 400 IOPS. With Flashcache, it gives between 5000 and 10000 IOPS, depending on the SSD speed and whether it is a single SSD or RAID.

    Sequential write speed is sometimes slower because of the RAID penalty on the SSDs. Sequential read is very fast, especially when the data is cached on the SSD. Bonnie++ gives nice results.
     
  8. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,125
    Likes Received:
    68
    Hi,
    that sounds good. I guess Facebook uses this on many servers, so it should be production-safe, or not??

    Udo
     
  9. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,210
    Likes Received:
    13
    Not yet.
    *If* a project I quoted gets approved I will have dozens of SSD disks and some areca 1882ix-24 controllers to play with for a couple of days.
    I will try to get some benchmarks with and without flashcache on top of hardware raid.

    What sort of benchmarks do you suggest?

    Do you think DRBD would be happy using a flash cache device as the underlying device?
    Maybe I can test that too if this project gets approved.
     
  10. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,210
    Likes Received:
    13
  11. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    15,030
    Likes Received:
    142
  12. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,210
    Likes Received:
    13
    https://raw.github.com/facebook/flashcache/master/doc/flashcache-doc.txt
    "It is important to note that in the first cut, cache writes are non-atomic, ie, the "Torn Page Problem" exists. In the event of a power failure or a failed write, part of the block could be written, resulting in a partial write. We have ideas on how to fix this and provide atomic cache writes (see the Futures section)."

    Maybe you uncover some corner case in your usage that facebook does not have.

    Looks like using DRBD 8.4 would be preferred to get the best performance from flashcache and DRBD; see the video dietmar found.

    There seem to be a number of issues logged:
    https://github.com/facebook/flashcache/issues
     
  13. check-ict

    check-ict Member

    Joined:
    Apr 19, 2011
    Messages:
    77
    Likes Received:
    0
    Flashcache with writethrough has passed all my tests (power failures, RAID degradation/rebuild, etc.). Writeback, however, is very risky: it corrupts data easily and has problems with rebooting on Debian 5/6 and Ubuntu 10.04 LTS (12.04 works fine out of the box).

    So I use writethrough in production now and am still experimenting with writeback. With some tweaks writeback is very stable, but of course it always carries a higher risk.
     
  14. mir

    mir Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 14, 2012
    Messages:
    3,314
    Likes Received:
    74
    Another interesting solution, which is part of the mainline kernel tree, is bcache. bcache serves the same purpose as flashcache, but being part of the mainline kernel tree is a big advantage. Jonathan Corbet wrote about bcache in May of this year: http://lwn.net/Articles/497024/

    Edit: Forgot link to the wiki: http://bcache.evilpiepirate.org/
     
    #14 mir, Oct 14, 2012
    Last edited: Oct 14, 2012
  15. check-ict

    check-ict Member

    Joined:
    Apr 19, 2011
    Messages:
    77
    Likes Received:
    0
    Hello E100,

    LVM is working with flashcache now: snapshots can be created once pvscan shows /dev/mapper/ssd.

    However I found a problem...
    How can I make sure my flashcache script shuts down only after Proxmox has forced all VMs to shut down?

    If I don't unmount Flashcache, it gives a kernel panic on restart/shutdown. Only Ubuntu 12.04 is able to reboot/shutdown without any script.

    My script:
    Code:
    #!/bin/sh
    # Start or stop Flashcache
    
    ### BEGIN INIT INFO
    # Provides:          flashcache
    # Required-Start:
    # Required-Stop:     $remote_fs $network pvedaemon
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Flashcache SSD caching
    # Description:       Flashcache SSD caching
    ### END INIT INFO
    
    PATH=/bin:/usr/bin:/sbin:/usr/sbin
    
    flashcache_start() {
        if df -h | grep -q /var/lib/vz
        then
            echo "Flashcache already running"
        else
            flashcache_load /dev/md2
            mount /dev/mapper/pve-data /var/lib/vz
            mount /dev/mapper/pve-backup /mnt/backup
            echo 1 > /proc/sys/dev/flashcache/md2+md1/fast_remove
            echo "Flashcache started"
        fi
    }
    
    flashcache_stop() {
        if df -h | grep -q /var/lib/vz
        then
            umount /mnt/backup
            umount /var/lib/vz
            dmsetup remove ssd
            echo "Flashcache stopped"
        else
            echo "Flashcache not running"
        fi
    }
    
    case "$1" in
        start)
            flashcache_start
            ;;
        stop)
            flashcache_stop
            ;;
        restart)
            $0 stop
            $0 start
            ;;
    esac
    
    exit 0
    
    
    I want to keep Flashcache as simple as possible, so I hope it can all work without too many modifications.
     
    #15 check-ict, Oct 21, 2012
    Last edited: Oct 21, 2012
  16. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,210
    Likes Received:
    13
    You need to adjust the order of when things start and stop by editing the data in the init script.
    Likely adding things to these lines is what you need to do:
    # Required-Start:
    # Required-Stop: $remote_fs $network pvedaemon

    Then use update-rc.d to remove the old data and run it again setting the defaults.
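
    Resetting the runlevel links after editing the LSB header could look like this (a sketch; the script name `flashcache` is assumed from the `Provides:` line above):

```shell
# Drop the existing runlevel symlinks made from the old header
update-rc.d -f flashcache remove

# Recreate them from the edited LSB header in the init script
update-rc.d flashcache defaults
```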

    I noticed you edited your post, originally you mentioned some LVM problems.
    I assume you fixed those, if not and for the benefit of other readers, you need to set a filter in lvm.conf telling lvm to not look at the underlying device.
    That way lvm will only see the volume on the flashcache device and everything will then work well.
     
  17. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,210
    Likes Received:
    13
    I did some benchmarks using a Crucial M4 256GB SSD and an Areca 1880 with 6 WD RE4 disks in RAID5.
    flashcache in that configuration was slower than using the Areca array directly.

    I also tested the same Areca array but with a RAM disk as the flashcache device.
    This was to measure the overhead of flashcache, since RAM is the fastest possible cache device.
    The overhead of flashcache is significant: my writes were much slower with flashcache, even in write-around mode.

    It is my opinion, based on a few hours of benchmarks, that flashcache is not worth the hassle if you already have a good hardware raid controller.

    Flashcache seems more suited for mostly read heavy applications.

    --The following is speculation since I lack the necessary hardware to test---

    I estimate that from a performance standpoint this is likely to be true:
    A PCIe SSD + mdadm + few mechanical disks + flashcache == few mechanical disks + good RAID controller with BBU

    flashcache would have an advantage of more cache vs the raid card, likely a longer burst of peak random IOPS.
    Raid card would have an advantage of availability/reliability/ease of use.
    PCIe SSD are costly so I suspect that the cost would be roughly identical.
     
  18. gkovacs

    gkovacs Member

    Joined:
    Dec 22, 2008
    Messages:
    463
    Likes Received:
    22
    Something is not right with flashcache. According to this test done on both RAID10 and RAID0 arrays, bcache is vastly superior in all use cases.
    There are many times when flashcache is actually slower than the un-cached RAID array:
    http://www.accelcloud.com/2012/04/18/linux-flashcache-and-bcache-performance-testing/

    As flashcache has probably matured since last April, someone should rerun these benchmarks, preferably on a Proxmox VE server, from inside a VM.

    Also it would be interesting to see if bcache can be loaded as a module into the Proxmox kernel.
     
    #18 gkovacs, Jun 17, 2013
    Last edited: Jun 17, 2013
