Ceph + Proxmox HA

Vsadnik

New Member
Jun 23, 2014
Hi everyone, I'm trying to set up HA on a Proxmox cluster, using Ceph storage for the VM images. I set up the Ceph storage with pveceph (thanks, Proxmox 3.2!). The current configuration: 3 nodes in the cluster, 3 Ceph mons and 6 OSDs. I set up HA on my cluster with this config:
<?xml version="1.0"?>
<cluster config_version="18" name="Testclus">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="xxx.xxx.xxx.xxx" login="root" name="node-1" passwd="xxxxxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="xxx.xxx.xxx.xxx" login="root" name="node-2" passwd="xxxxxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="xxx.xxx.xxx.xxx" login="root" name="node-3" passwd="xxxxxx" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node-1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node-1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node-2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node-3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
Now when I try "fence_node" it returns "success" and the VM moves to another node, but it doesn't start, failing with this error:
TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -name vm100 -smp 'sockets=1,cores=1' -nodefaults -boot 'menu=on' -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -k en-us -m 1024 -cpuunits 1000 -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=rbd:datapool2/vm-100-disk-1:mon_host=10.2.58.2 10.2.58.3 10.2.58.4:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/ceph.keyring,if=none,id=drive-ide0,aio=native,cache=none' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge' -device 'e1000,mac=02:3F:02:F4:CE:91,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout
When I build the Ceph storage on a different cluster and add it to this one, everything works fine.
I think there is some problem with Ceph (because when the third node comes back online after a reboot, the VM starts with no error). Can you help me?
Sorry for my bad english(
 
There are multiple possible reasons for the "got timeout" response you're seeing:

- the cluster has lost quorum (a strict majority of the monitors must be up)
- one (or more) of the OSDs is full (ceph health detail will tell you)
- (the monitor IPs are wrong / not reachable; probably doesn't apply here)

Uhm... well, there might be more, but you may want to check for these first.
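On the quorum point: the monitors form a Paxos quorum, so a strict majority of them must be up. This is not Ceph code, just a sketch of the majority arithmetic:

```python
def mons_for_quorum(n):
    """Minimum number of monitors that must be up to keep a Paxos quorum."""
    return n // 2 + 1

# a 3-mon cluster needs 2 up, so it survives exactly 1 mon down
print(mons_for_quorum(3))  # -> 2
# 4 mons also tolerate only 1 failure, which is why odd mon counts are preferred
print(mons_for_quorum(4))  # -> 3
```

So with your 3 monitors, taking down one node for a fencing test should not cost quorum by itself.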
 
Maybe this can help; these are the messages in the log file when I try to start the VM:
2014-06-23 18:29:56.609595 osd.0 10.2.58.2:6805/3443 1 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.415961 secs
2014-06-23 18:29:56.609603 osd.0 10.2.58.2:6805/3443 2 : [WRN] slow request 30.415961 seconds old, received at 2014-06-23 18:29:26.193543: osd_op(client.7615.0:43006 rbd_data.186d2ae8944a.0000000000000e9d [set-alloc-hint object_size 4194304 write_size 4194304,write 3035136~4096] 4.2f9ea29f ack+ondisk+write e95) v4 currently reached pg
2014-06-23 18:30:26.615011 osd.0 10.2.58.2:6805/3443 3 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.421418 secs
2014-06-23 18:30:26.615022 osd.0 10.2.58.2:6805/3443 4 : [WRN] slow request 60.421418 seconds old, received at 2014-06-23 18:29:26.193543: osd_op(client.7615.0:43006 rbd_data.186d2ae8944a.0000000000000e9d [set-alloc-hint object_size 4194304 write_size 4194304,write 3035136~4096] 4.2f9ea29f ack+ondisk+write e95) v4 currently reached pg
2014-06-23 18:31:26.625711 osd.0 10.2.58.2:6805/3443 5 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.432114 secs
2014-06-23 18:31:26.625726 osd.0 10.2.58.2:6805/3443 6 : [WRN] slow request 120.432114 seconds old, received at 2014-06-23 18:29:26.193543: osd_op(client.7615.0:43006 rbd_data.186d2ae8944a.0000000000000e9d [set-alloc-hint object_size 4194304 write_size 4194304,write 3035136~4096] 4.2f9ea29f ack+ondisk+write e95) v4 currently reached pg
2014-06-23 18:31:37.990526 osd.4 10.2.58.4:6805/3496 1 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.418198 secs
2014-06-23 18:31:37.990533 osd.4 10.2.58.4:6805/3496 2 : [WRN] slow request 30.418198 seconds old, received at 2014-06-23 18:31:07.572261: osd_op(client.8697.0:1 vm-100-disk-1.rbd [stat] 4.7512de4b ack+read e99) v4 currently reached pg
 
According to the Ceph docs, possible causes include:

- a bad drive (check dmesg output)
- a bug in the kernel file system (check dmesg output)
- an overloaded cluster (check system load, iostat, etc.)
- a bug in the ceph-osd daemon
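For the record, those checks map to commands roughly like these (run on any node with an admin keyring; a sketch, not an exhaustive list):

Code:
dmesg | tail -50        # recent drive / filesystem errors
iostat -x 5             # per-disk utilization (from the sysstat package)
ceph health detail      # which OSDs/PGs are unhappy, full ratios
ceph osd tree           # which OSDs are up/down and on which host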
 
Hello, I checked the dmesg output and everything is OK; the cluster has only 1 VM and really isn't loaded. How can I find a bug in the ceph-osd daemon?
 
That's the least likely cause. If you believe that to be the case, you might need to gather as much detail as possible and send it to the Ceph mailing list / bug tracker. Before that, you may want to check whether you're running a recent enough version, though. Please be advised that the Ceph server from Proxmox is just a tech demo and as such not quite ready for production use yet (AFAIK). Your problem might also be related to that (can't comment on that, though).
 
Hi again, I found the problem: Ceph waits ~5 minutes before it starts moving PGs; once the PGs start moving, the VM starts OK. Now I'm trying to find a way to reduce this delay. Any suggestions?
 
This delay is there because a Ceph cluster doesn't need the data to be moved instantly when a node goes down; the remaining nodes continue to serve it, and the cluster only rebalances after the timeout has been reached. If your Ceph isn't continuing to serve data when one of the nodes goes down, there's most likely an underlying issue there.
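If you do want to shorten that window anyway (at the cost of triggering rebalancing sooner), the timeouts involved are tunable in ceph.conf. A sketch; the option names and defaults are as I recall them from the Ceph docs of that era, so double-check against your version:

Code:
[mon]
    # how long a "down" OSD may stay "in" before its data is re-replicated
    # (default 300 s -- the ~5 min delay you are seeing)
    mon osd down out interval = 60

[osd]
    # how long missed heartbeats are tolerated before an OSD is reported down
    # (default 20 s; lowering it risks flapping on a busy network)
    osd heartbeat grace = 20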
 
My Ceph does continue to serve data when one of the nodes goes down (if I mount the Ceph storage on another server and put some data on it, I can still reach that data with one node down), but the VM on Proxmox doesn't start properly. Maybe there is some problem with the Proxmox-to-Ceph connection.
 
Did you enter all the monitor addresses in Proxmox? Like so:

/etc/pve/storage.cfg said:
Code:
rbd: mycephcluster
       monhost 192.168.0.1:6789;192.168.0.2:6789;192.168.0.3:6789
       pool rbd  (optional, default = rbd)
       username admin (optional, default = admin)
       content images

Because if your Proxmox only knows of 1 monitor and you take that one down for testing... well.
 
Yes I did; here is my /etc/pve/storage.cfg:
rbd: ceph
monhost 10.2.58.2:6789; 10.2.58.3:6789; 10.2.58.4:6789; 10.2.58.5:6789;
pool testpool
content images
username admin

And I turned off the second monitor in this list.
 
Hm... weird. Maybe you can try this:

Get ceph.conf and ceph.client.admin.keyring from /etc/ceph of the Ceph cluster (or from ceph-deploy) and put them into /etc/ceph on the Proxmox node.

With that you can use the Ceph CLI from the Proxmox node and see whether it connects OK when you turn off .2.

So you can turn off .2 and then run "ceph health" or "ceph -s" or something on another node.

You can skip the "copy stuff from/to /etc/ceph" part if you are actually running Ceph on your Proxmox nodes.
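A sketch of that check (the hostname "ceph-admin" is a placeholder; skip the copy step if the node already runs Ceph):

Code:
# only needed when the Proxmox node is NOT itself a Ceph node:
scp ceph-admin:/etc/ceph/ceph.conf /etc/ceph/
scp ceph-admin:/etc/ceph/ceph.client.admin.keyring /etc/ceph/

# stop the monitor on 10.2.58.2, then from another node:
ceph health
ceph -s    # should still report a quorum of the remaining mons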
 
My Ceph cluster is running on my Proxmox cluster nodes, and I installed and set it up with the Proxmox commands as described in this manual: http://pve.proxmox.com/wiki/Ceph_Server. When I turn off one of the Proxmox/Ceph nodes, the VM moves to another node but doesn't start, failing with the error I showed in my first post. But immediately after Ceph confirms the node is dead and starts moving data across the other nodes, the VM starts normally. The VM doesn't wait for the data to spread to the other nodes; it starts right after Ceph confirms the dead node. I can see it in the Ceph logs: Ceph confirms the OSDs died, the VM starts. And yes, Ceph connects OK when I turn off one node; the other VMs on the other nodes work fine.
 
