Ceph + Proxmox HA

Vsadnik

New Member
Jun 23, 2014
Hi everyone, I'm trying to set up HA on a Proxmox cluster, using Ceph storage for the VM images. I set up the Ceph storage with pveceph (thanks, Proxmox 3.2!). The current configuration: 3 nodes in the cluster, 3 Ceph mons and 6 OSDs. I set up HA on my cluster with this config:
<?xml version="1.0"?>
<cluster config_version="18" name="Testclus">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="xxx.xxx.xxx.xxx" login="root" name="node-1" passwd="xxxxxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="xxx.xxx.xxx.xxx" login="root" name="node-2" passwd="xxxxxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="xxx.xxx.xxx.xxx" login="root" name="node-3" passwd="xxxxxx" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node-1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node-1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node-2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node-3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
Now when I try "fence_node" it returns "success" and the VM moves to another node, but it doesn't start, failing with this error:
TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -name vm100 -smp 'sockets=1,cores=1' -nodefaults -boot 'menu=on' -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -k en-us -m 1024 -cpuunits 1000 -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=rbd:datapool2/vm-100-disk-1:mon_host=10.2.58.2 10.2.58.3 10.2.58.4:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/ceph.keyring,if=none,id=drive-ide0,aio=native,cache=none' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge' -device 'e1000,mac=02:3F:02:F4:CE:91,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout
When I build the Ceph storage on a different cluster and add it to this one, everything works fine.
I think there is some problem with Ceph (because when the third node comes back online after a reboot, the VM starts with no error). Can you help me?
Sorry for my bad english(
 
There are multiple possible reasons for the "got timeout" response you're seeing:

- the cluster has lost quorum (a strict majority of the monitors must be up)
- one (or more) of the OSDs is full (ceph health detail will tell you)
- (the monitor IPs are wrong / not reachable; probably doesn't apply here)

Uhm... well, there might be more, but you may want to check for these first.
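On the quorum point: the monitors form a Paxos quorum, so a strict majority of them must be up. This is not Ceph code, just a sketch of the majority arithmetic:

```python
def mons_for_quorum(n):
    """Minimum number of monitors that must be up to keep a Paxos quorum."""
    return n // 2 + 1

# a 3-mon cluster needs 2 up, so it survives exactly 1 mon down
print(mons_for_quorum(3))  # -> 2
# 4 mons also tolerate only 1 failure, which is why odd mon counts are preferred
print(mons_for_quorum(4))  # -> 3
```

So with your 3 monitors, taking down one node for a fencing test should not cost quorum by itself.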
 
Maybe this can help; these are the messages in the log file when I try to start the VM:
2014-06-23 18:29:56.609595 osd.0 10.2.58.2:6805/3443 1 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.415961 secs
2014-06-23 18:29:56.609603 osd.0 10.2.58.2:6805/3443 2 : [WRN] slow request 30.415961 seconds old, received at 2014-06-23 18:29:26.193543: osd_op(client.7615.0:43006 rbd_data.186d2ae8944a.0000000000000e9d [set-alloc-hint object_size 4194304 write_size 4194304,write 3035136~4096] 4.2f9ea29f ack+ondisk+write e95) v4 currently reached pg
2014-06-23 18:30:26.615011 osd.0 10.2.58.2:6805/3443 3 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.421418 secs
2014-06-23 18:30:26.615022 osd.0 10.2.58.2:6805/3443 4 : [WRN] slow request 60.421418 seconds old, received at 2014-06-23 18:29:26.193543: osd_op(client.7615.0:43006 rbd_data.186d2ae8944a.0000000000000e9d [set-alloc-hint object_size 4194304 write_size 4194304,write 3035136~4096] 4.2f9ea29f ack+ondisk+write e95) v4 currently reached pg
2014-06-23 18:31:26.625711 osd.0 10.2.58.2:6805/3443 5 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.432114 secs
2014-06-23 18:31:26.625726 osd.0 10.2.58.2:6805/3443 6 : [WRN] slow request 120.432114 seconds old, received at 2014-06-23 18:29:26.193543: osd_op(client.7615.0:43006 rbd_data.186d2ae8944a.0000000000000e9d [set-alloc-hint object_size 4194304 write_size 4194304,write 3035136~4096] 4.2f9ea29f ack+ondisk+write e95) v4 currently reached pg
2014-06-23 18:31:37.990526 osd.4 10.2.58.4:6805/3496 1 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.418198 secs
2014-06-23 18:31:37.990533 osd.4 10.2.58.4:6805/3496 2 : [WRN] slow request 30.418198 seconds old, received at 2014-06-23 18:31:07.572261: osd_op(client.8697.0:1 vm-100-disk-1.rbd [stat] 4.7512de4b ack+read e99) v4 currently reached pg
 
According to the Ceph docs, possible causes include:

- a bad drive (check dmesg output)
- a bug in the kernel file system (check dmesg output)
- an overloaded cluster (check system load, iostat, etc.)
- a bug in the ceph-osd daemon
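For the record, those checks map to commands roughly like these (run on any node with an admin keyring; a sketch, not an exhaustive list):

Code:
dmesg | tail -50        # recent drive / filesystem errors
iostat -x 5             # per-disk utilization (from the sysstat package)
ceph health detail      # which OSDs/PGs are unhappy, full ratios
ceph osd tree           # which OSDs are up/down and on which host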
 
Hello, I checked the dmesg output and everything is OK; the cluster has only 1 VM and really isn't loaded. How can I find a bug in the ceph-osd daemon?
 
That's the least likely cause. If you believe that to be the case, you might need to gather as much detail as possible and send it to the Ceph mailing list / bug tracker. Before that, you may want to check whether you're running a recent enough version, though. Please be advised that the Ceph server from Proxmox is just a tech demo and as such not quite ready for production use yet (AFAIK). Your problem might also be related to that (can't comment on that, though).
 
Hi again, I found the problem: Ceph waits ~5 minutes before it starts moving PGs; once the PGs start moving, the VM starts OK. Now I'm trying to find a way to reduce this delay. Any suggestions?
 
This delay is there because a Ceph cluster doesn't need the data to be moved instantly when a node goes down; the remaining nodes continue to serve it, and the cluster only rebalances after the timeout has been reached. If your Ceph isn't continuing to serve data when one of the nodes goes down, there's most likely an underlying issue there.
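If you do want to shorten that window anyway (at the cost of triggering rebalancing sooner), the timeouts involved are tunable in ceph.conf. A sketch; the option names and defaults are as I recall them from the Ceph docs of that era, so double-check against your version:

Code:
[mon]
    # how long a "down" OSD may stay "in" before its data is re-replicated
    # (default 300 s -- the ~5 min delay you are seeing)
    mon osd down out interval = 60

[osd]
    # how long missed heartbeats are tolerated before an OSD is reported down
    # (default 20 s; lowering it risks flapping on a busy network)
    osd heartbeat grace = 20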
 
My Ceph does continue to serve data when one of the nodes goes down (if I mount the Ceph storage on another server and put some data on it, I can still reach that data with one node down), but the VM on Proxmox doesn't start properly. Maybe there is some problem with the Proxmox-to-Ceph connection.
 
Did you enter all the monitor addresses in Proxmox? Like so:

/etc/pve/storage.cfg said:
Code:
rbd: mycephcluster
       monhost 192.168.0.1:6789;192.168.0.2:6789;192.168.0.3:6789
       pool rbd  (optional, default = rbd)
       username admin (optional, default = admin)
       content images

Because if your Proxmox only knows of 1 monitor and you take that one down for testing... well.
 
Yes I did; here is my /etc/pve/storage.cfg:
rbd: ceph
monhost 10.2.58.2:6789; 10.2.58.3:6789; 10.2.58.4:6789; 10.2.58.5:6789;
pool testpool
content images
username admin

And I turned off the second monitor in this list.
 
Hm... weird. Maybe you can try this:

Get ceph.conf and ceph.client.admin.keyring from /etc/ceph of the Ceph cluster (or from ceph-deploy) and put them into /etc/ceph on the Proxmox node.

With that you can use the Ceph CLI from the Proxmox node and see whether it connects OK when you turn off .2.

So you can turn off .2 and then run "ceph health" or "ceph -s" or something on another node.

You can skip the "copy stuff from/to /etc/ceph" part if you are actually running Ceph on your Proxmox nodes.
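A sketch of that check (the hostname "ceph-admin" is a placeholder; skip the copy step if the node already runs Ceph):

Code:
# only needed when the Proxmox node is NOT itself a Ceph node:
scp ceph-admin:/etc/ceph/ceph.conf /etc/ceph/
scp ceph-admin:/etc/ceph/ceph.client.admin.keyring /etc/ceph/

# stop the monitor on 10.2.58.2, then from another node:
ceph health
ceph -s    # should still report a quorum of the remaining mons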
 
My Ceph cluster is running on my Proxmox cluster nodes, and I installed and set it up with the Proxmox commands as described in this manual: http://pve.proxmox.com/wiki/Ceph_Server. When I turn off one of the Proxmox/Ceph nodes, the VM moves to another node but doesn't start, failing with the error I showed in my first post. But immediately after Ceph confirms the node is dead and starts moving data across the other nodes, the VM starts normally. The VM doesn't wait for the data to spread to the other nodes; it starts right after Ceph confirms the dead node. I can see it in the Ceph logs: Ceph confirms the OSDs died, the VM starts. And yes, Ceph connects OK when I turn off one node; the other VMs on the other nodes work fine.
 
