Migrate VMs to Ceph storage

optim

I know that the Ceph storage is relatively new, so I'm probably missing something basic, but I can't seem to figure out how to migrate my existing VMs into a new Ceph RBD storage cluster.

I've set up the Ceph storage as per the wiki, and I can create new VMs in the storage pool successfully. But if I select a VM that is sitting on local (directory) storage, I am not able to select the Ceph RBD under "Move disk" in the hardware settings. If I try to migrate the VM, I can pick the node, but not the storage to place it on. And I can't see a mount point anywhere for the Ceph pool (I think I would need CephFS for that).

I still have to try uploading the RAW images under the content tab of the Ceph storage pool and then manipulating the .conf file of a dummy VM to reflect the uploaded image, but I'm waiting for qemu-img to finish converting first. Maybe I can vzdump the VMs and then restore them to Ceph?
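
In case it helps anyone else reading along, I think the vzdump/restore route would look roughly like this (VMID 103 and the storage name ceph-pool are just placeholders for my setup):

Code:
# back up the VM from its current local/directory storage
vzdump 103 --dumpdir /mnt/migrate --mode stop
# restore it directly onto the Ceph RBD storage (--force overwrites VM 103's config)
qmrestore /mnt/migrate/vzdump-qemu-103-<timestamp>.vma 103 --storage ceph-pool --force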

I just thought there might be an easier way of saying "migrate this VM from the local folder to the Ceph pool" in one fell swoop. Again, I realize it is still early days for the Ceph addition, so please don't take this as criticism!

I did search the wiki and forum, but can't seem to find any real discussions on migrating existing VMs.

Thanks for any help/guidance.

Daniel
 
I know that the Ceph storage is relatively new, so I'm probably missing something basic, but I can't seem to figure out how to migrate my existing VMs into a new Ceph RBD storage cluster.
If you meant Ceph is new in Proxmox, yes it is. But Ceph itself has been around long enough to be fully matured and ready for enterprise environments.

Ceph RBD storage only supports RAW disk images. Is the VM you are trying to move in anything other than RAW format?
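
If it is QCOW2 (or VMDK), converting is a one-liner with qemu-img; roughly, with example file names:

Code:
# convert a QCOW2 disk image to RAW so it can be placed on RBD
qemu-img convert -f qcow2 -O raw vm-103-disk-1.qcow2 vm-103-disk-1.raw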
 
Sorry for the confusion, but I meant Ceph was a new addition to Proxmox.

The disk image is in RAW format, after being converted from QCOW2.

I even created a new 4 GB test VM in RAW format on a local drive and tried to migrate it to Ceph, but the Proxmox GUI doesn't allow me to. For my home use, I'm starting to question whether I should just stick with my software SAN (ZFS exported over NFS). I can only dedicate about 3 spindles to the Ceph pool, and the ZFS setup has performed well with the L2ARC and ZIL cache.

Thanks for replying.
 
Hi,
This is really strange.

If you can create a new VM on the Ceph storage, you should be able to move a disk from any storage to the Ceph storage directly.

Can you post your VM conf file and also /etc/pve/storage.cfg?

(BTW, you don't need any mount point/CephFS for Ceph on the host; QEMU connects directly to Ceph through the RBD protocol.)
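
For comparison, an RBD entry in /etc/pve/storage.cfg looks roughly like this (storage ID, pool name and monitor addresses are just examples; if cephx is enabled, the keyring also needs to be copied to /etc/pve/priv/ceph/<storage-id>.keyring):

Code:
rbd: ceph-pool
        monhost 192.168.10.14 192.168.10.15 192.168.10.16
        pool rbd
        content images
        username admin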
 
Wasim/Spirit,

I figured it out. For reference, here's the health of the pool:

Code:
root@pve-desktop:/mnt/migrate/images/103# ceph health
HEALTH_OK
root@pve-desktop:/mnt/migrate/images/103# ceph status
    cluster c7116adc-4c18-4523-992a-6a85df0985b2
     health HEALTH_OK
      monmap e5: 3 mons at {0=192.168.10.14:6789/0,1=192.168.10.15:6789/0,2=192.168.10.16:6789/0}, election epoch 4942, quorum 0,1,2 0,1,2
     osdmap e175: 3 osds: 3 up, 3 in
      pgmap v1343: 192 pgs, 3 pools, 32535 MB data, 8339 objects
            95733 MB used, 8270 GB / 8364 GB avail
                 192 active+clean
  client io 19624 kB/s rd, 18756 kB/s wr, 24 op/s
root@pve-desktop:

As background, I have 3 nodes (2 x 32 GB RAM, 1 x 16 GB RAM, all 8-core, all connected via 4 x 1 Gbit LACP bonds), and each has an OSD set up on a 3 TB disk.

To make a long story short, I had one node in the cluster with an IP address that was already in use on the network. When I would select the storage, the dropdown was probably timing out (I'm guessing it's AJAX), so the storage pools were not showing. Once I corrected the IP address issue, and another issue related to one node having time drift, everything came alive. I am now in the process of migrating the VMs from a local storage definition to the Ceph pool.
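
For anyone searching later: the GUI "Move disk" has a CLI equivalent too, which for my setup looks something like this (the VMID, disk slot and storage ID will obviously differ):

Code:
# move disk virtio0 of VM 103 onto the Ceph storage and drop the old local image
qm move_disk 103 virtio0 ceph-pool -delete 1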

I've also brushed up on my Ceph skills, so I'm going to give it a fair chance at running in my environment. If performance is good, I can shut down my software SAN, which was serving VMs via NFS. Reducing the number of running systems at home is always good...

Thank you both for your help!

Daniel
 
Just wanted to follow up on my Ceph trial.

Looks like Ceph is an amazingly robust piece of technology, but it doesn't perform as well as I'd hoped for small installations. I can see Ceph doing well with a decent number of drives (OSDs) thrown at it, but in my case of sharing 3 drives across 3 nodes it just doesn't perform as well as my ZFS NAS. Everything runs sluggishly, and when I took a node offline to simulate recovery, the VMs would stutter and pause (timeouts shown in the guests' dmesg). But even though it was slow, the recovery worked without a hitch and no data was ever lost. Migration of VMs also worked really well, with instantaneous switching of nodes.
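
As an aside for anyone repeating the node-offline test: for a planned outage, I understand Ceph can be told not to mark the OSDs out and start recovering immediately, which should reduce the stuttering; something like:

Code:
# before planned maintenance: don't mark down OSDs "out" (no rebalancing)
ceph osd set noout
# ... take the node down, do the work, bring it back ...
# afterwards: allow normal out-marking again
ceph osd unset noout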

I used the same three drives in a 3-way mirror on FreeNAS with a 240 GB SSD (220 GB L2ARC, 20 GB ZIL split), and they easily keep up with the same pool of VMs over NFS. Plus I get the benefits of thin provisioning, ZFS snapshots and ZFS compression back.

I'm starting to play with the idea of using ZFS on Linux on one of the nodes to eliminate the need for FreeNAS altogether. Might be time to play around with that scenario.
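
The rough shape of that pool, assuming the same three disks plus the SSD (device names below are placeholders):

Code:
# 3-way mirror across the three spinners
zpool create -o ashift=12 tank mirror sda sdb sdc
# SSD split into two partitions: L2ARC (cache) and ZIL (log)
zpool add tank cache sdd1
zpool add tank log sdd2
# cheap win on a VM store
zfs set compression=lz4 tank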

Does anyone know when the ZFS functionality in Proxmox is slated for release?

Daniel
 
ZFS support should be officially released in the coming Proxmox release. Everything is already part of 3.2, but the GUI support for creating it is commented out in the current release.

New things in the next release will include the following (a rough example of the storage definition follows the list):
1) Support for creating thin-provisioned volumes
2) Support to enable/disable the write cache
3) Support for target and host groups
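
For reference, the current ZFS (over iSCSI) storage definition in /etc/pve/storage.cfg looks roughly like this, if I read the plugin correctly (storage ID, pool, portal address and target IQN below are made-up examples):

Code:
zfs: zfs-san
        pool tank
        portal 192.168.10.20
        target iqn.2010-09.org.freenas.ctl:tank
        iscsiprovider istgt
        blocksize 4k
        content images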
 
Just wanted to follow up on my Ceph trial.

Looks like Ceph is an amazingly robust piece of technology, but it doesn't perform as well as I'd hoped for small installations. I can see Ceph doing well with a decent number of drives (OSDs) thrown at it, but in my case of sharing 3 drives across 3 nodes it just doesn't perform as well as my ZFS NAS. Everything runs sluggishly, and when I took a node offline to simulate recovery, the VMs would stutter and pause (timeouts shown in the guests' dmesg). But even though it was slow, the recovery worked without a hitch and no data was ever lost. Migration of VMs also worked really well, with instantaneous switching of nodes.

I used the same three drives in a 3-way mirror on FreeNAS with a 240 GB SSD (220 GB L2ARC, 20 GB ZIL split), and they easily keep up with the same pool of VMs over NFS. Plus I get the benefits of thin provisioning, ZFS snapshots and ZFS compression back.

That's exactly how Ceph performs with a very small number of OSDs. Even doubling the number of OSDs to 6 (2 OSDs per node with replica 2) will speed things up somewhat. There are also ways to tune Ceph further so that the VMs don't slow down. But in your case, unless your needs and your redundancy requirements are going to grow significantly, ZFS may be the logical choice. ZFS and Ceph just cannot be compared side by side; they are way too different. In a large cluster Ceph can actually "outperform" ZFS (ZFS lovers, don't hit me...). When I say outperform, I mean overall, taking redundancy, cost, performance and manageability into consideration. In a very small environment ZFS will outperform Ceph every time.
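
If you do want to try the 6-OSD route before writing it off, the replica count is just a pool setting; roughly, using the default pool name rbd as an example:

Code:
# keep 2 copies of each object in the 'rbd' pool
ceph osd pool set rbd size 2
# allow I/O to continue with a single surviving copy during recovery
ceph osd pool set rbd min_size 1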
 
Just wanted to follow up on my Ceph trial.

The timeouts you describe sound like your network couldn't handle the load, plus you might not have separated the Ceph cluster network from the VM network (to prevent busy Ceph recoveries from interfering with VM traffic).
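
Separating the two is only a couple of lines in ceph.conf; roughly like this, with made-up subnets:

Code:
[global]
    # client/VM traffic
    public network = 192.168.10.0/24
    # OSD replication and recovery traffic
    cluster network = 10.10.10.0/24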

What you have described as far as performance goes is expected behaviour, though. You're unlikely to get more than 60 MB/s out of your small-scale Ceph installation if you run 1 Gbit networking, and 100 (ish) MB/s for 2 Gbit bonded networking (the loss is a result of the 802.3ad bonding algorithm, which hashes each flow onto a single member link). Meaning that if you expect more than 100 MB/s (which I find more than acceptable), you need to lay down some serious cash for 10 GbE.
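
For reference, an 802.3ad bond on a Proxmox node looks roughly like this in /etc/network/interfaces (interface names are just examples); since each flow only ever rides one 1 Gbit link, that is where the ~100 MB/s ceiling comes from:

Code:
auto bond0
iface bond0 inet manual
    slaves eth0 eth1
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer3+4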

What you get is storage that you just can't kill (unless the whole datacenter goes dark, obviously), whereas ZFS has the glaring SPOF of only running on one system (unless you like giving money to Oracle).
 
Yeah, but that's eBay of all places... Most likely second-hand or refurbished, most likely without any warranty, no vendor support, plus you can't even properly file eBay purchases with accounting; they'd just laugh at you (well, at least over here they would).

You really can't buy enterprise equipment off of eBay, I'm afraid.

Also, those are just the HBAs. On top of those you need switches, which are still expensive (IB is outdated tech), and cables, which due to the high amount of copper are also quite expensive, AFAIK.

Lastly, that's PCIe... don't you need PCI-X for servers?
 
You don't need switches since the driver supports cross-over connections.

PCI-X? Is that not outdated? I don't recall seeing any server board from the last 3-5 years claiming PCI-X sockets.
 
Can you really daisy-chain IB HBAs together in a token-ring-like setup? I thought you could only have point-to-point connections...?
 
