Proxmox4-HA-not-working...Feedback

badji

I am trying to use the new HA, but it never works completely. The VM never migrates to the other host...
1-HA-vm1-proxmox4.0.png

2-Status-HA-proxmox4.png

3-HA-vm-freeze.png

Thanks.
 
It works ;)
As you can see, your service is frozen. This means you did a graceful shutdown of pve-ceph1, which does not migrate any VM or fence the node, as it is not a fault!
A graceful shutdown stops all services gracefully and tells the cluster, by freezing its services, that the node went down on its own and as planned. You need to migrate the services by hand.

This is clearly documented behaviour, see http://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#HA_Cluster_Maintenance_.28node_reboots.29
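
For a planned reboot you would therefore move the service yourself first. A minimal sketch (vm:100 and pve-ceph2 are placeholders, adjust to your own service ID and target node):

Code:
# live migrate the HA managed VM away before shutting the node down
ha-manager migrate vm:100 pve-ceph2
# check where the service ended up
ha-manager status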

By the way, when adding screenshots it would be helpful to set the language to English; this helps others understand you, and you usually get a faster response.
 

Thank you for these explanations.
I'll retest, but there is something that escapes me, as it does GILOU elsewhere :-) !!!

Since the VM is managed by HA, shouldn't it be migrated to another server when server1 is off?

Sorry for the French screenshots :-)

Thanks.
 
Try pulling the network plug; that should be a good test. Then, after 120 seconds, the services will be migrated. If not, please tell us.
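
If you want to watch what happens during such a test, something like this on one of the remaining nodes should be enough (just a sketch using the standard tools):

Code:
# current view of master/LRM state and the configured services
ha-manager status
# follow the HA manager and corosync logs live
journalctl -f -u pve-ha-crm.service -u corosync.service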

I'll quote myself from a mailing list post:
We already suspected that there would be some outcry about graceful shutdowns, and it's also a little bit of an opinion thing whether the service should migrate.
But as a graceful shutdown is by no means a fault, we defined that the HA stack shouldn't interfere here. Also, doing a shutdown and waiting for the HA manager would cost you at least 120 seconds of downtime, whereas live migrating beforehand costs only some milliseconds... which is better?

It could also cause problems if all services get migrated and restarted at the same time, and no one wants an out-of-control feedback loop.

No, at the moment we have defined it so that graceful shutdowns don't trigger a migration. HA should trigger on faults, not on planned actions like maintenance.
 

After 120 seconds, no migration. I had to reboot my server, and the VM then restarted on the same server. So the migration of the VM does not work, even after 120 seconds. The network works well, bridge and bonding, I tested it. Ceph works as well.

Network-bond-bridge.png ceph-health.png


Another thing:

Perhaps a bug: # pvesm status (gives nothing).


bug-pvesm-1.png

Thanks.
 
How did you test? And what's the output in the log? For these actions the interesting logs are those from 'pve-ha-crm' on the current master, and from corosync.
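
Something along these lines should collect them (a sketch; narrow the time range further if needed):

Code:
# run this on the node that is currently the CRM master
journalctl -u pve-ha-crm.service -u corosync.service --since today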
 
Hi,

No problem with pve-ha-lrm, pve-ha-crm, watchdog-mux.service and corosync.service. There is a problem with glusterfs-server: as soon as the procedure starts, the glusterfs server becomes disabled, offline. The migration of the VM does not happen.

Code:
root@pve-ceph3:~# tail -f /var/log/syslog
Oct 5 19:30:41 pve-ceph3 pveproxy[28156]: proxy detected vanished client connection
Oct 5 19:30:41 pve-ceph3 pveproxy[27463]: proxy detected vanished client connection
Oct 5 19:30:41 pve-ceph3 pveproxy[27463]: proxy detected vanished client connection
Oct 5 19:30:47 pve-ceph3 pvedaemon[1863]: successful auth for user 'root@pam'
Oct 5 19:30:47 pve-ceph3 pvedaemon[1863]: successful auth for user 'root@pam'
Oct 5 19:30:47 pve-ceph3 pveproxy[27463]: proxy detected vanished client connection
Oct 5 19:30:48 pve-ceph3 pvedaemon[1863]: successful auth for user 'root@pam'
Oct 5 19:30:51 pve-ceph3 pveproxy[27463]: proxy detected vanished client connection
Oct 5 19:30:56 pve-ceph3 pveproxy[28156]: proxy detected vanished client connection
Oct 5 19:31:17 pve-ceph3 pveproxy[28156]: proxy detected vanished client connection
Oct 5 19:32:11 pve-ceph3 pmxcfs[1034]: [status] notice: received log
Oct 5 19:32:11 pve-ceph3 pmxcfs[1034]: [status] notice: received log
Oct 5 19:32:25 pve-ceph3 bash[23300]: 2015-10-05 19:32:25.803674 7f977680e700 -1 osd.5 1866 heartbeat_check: no reply from osd.0 since back 2015-10-05 19:32:05.563777 front 2015-10-05 19:32:05.563777 (cutoff 2015-10-05 19:32:05.803673)
Oct 5 19:32:25 pve-ceph3 bash[23300]: 2015-10-05 19:32:25.803695 7f977680e700 -1 osd.5 1866 heartbeat_check: no reply from osd.1 since back 2015-10-05 19:32:05.563777 front 2015-10-05 19:32:05.563777 (cutoff 2015-10-05 19:32:05.803673)
Oct 5 19:32:26 pve-ceph3 bash[23300]: 2015-10-05 19:32:26.803905 7f977680e700 -1 osd.5 1866 heartbeat_check: no reply from osd.0 since back 2015-10-05 19:32:05.563777 front 2015-10-05 19:32:05.563777 (cutoff 2015-10-05 19:32:06.803903)
Oct 5 19:32:26 pve-ceph3 bash[23300]: 2015-10-05 19:32:26.803930 7f977680e700 -1 osd.5 1866 heartbeat_check: no reply from osd.1 since back 2015-10-05 19:32:05.563777 front 2015-10-05 19:32:05.563777 (cutoff 2015-10-05 19:32:06.803903)
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: members: 2/1034, 3/4958
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: starting data syncronisation
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [status] notice: members: 2/1034, 3/4958
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [status] notice: starting data syncronisation
Oct 5 19:32:27 pve-ceph3 corosync[1541]: [TOTEM ] A new membership (192.168.100.20:968) was formed. Members left: 1
Oct 5 19:32:27 pve-ceph3 corosync[1541]: [QUORUM] Members[2]: 3 2
Oct 5 19:32:27 pve-ceph3 corosync[1541]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: received sync request (epoch 2/1034/0000000C)
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [status] notice: received sync request (epoch 2/1034/0000000C)
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: received all states
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: leader is 2/1034
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: synced members: 2/1034, 3/4958
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: start sending inode updates
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: sent all (0) updates
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: all data is up to date
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [dcdb] notice: dfsm_deliver_queue: queue length 2
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [status] notice: received all states
Oct 5 19:32:27 pve-ceph3 pmxcfs[1034]: [status] notice: all data is up to date
.
.
.
Oct 5 19:32:49 pve-ceph3 pveproxy[28156]: proxy detected vanished client connection
Oct 5 19:33:06 pve-ceph3 pmxcfs[1034]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-node/pve-ceph2: -1
Oct 5 19:33:06 pve-ceph3 pmxcfs[1034]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-node/pve-ceph2: /var/lib/rrdcached/db/pve2-node/pve-ceph2: illegal attempt to update using time 1444069986 when last update time is 1444071064 (minimum one second step)
Oct 5 19:33:06 pve-ceph3 pmxcfs[1034]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/101: -1
Oct 5 19:33:06 pve-ceph3 pmxcfs[1034]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/101: /var/lib/rrdcached/db/pve2-vm/101: illegal attempt to update using time 1444069986 when last update time is 1444071064 (minimum one second step)
Oct 5 19:33:20 pve-ceph3 pvedaemon[1862]: storage 'glusterfs' is not online

Thanks.
Moula.
 
Please repost your post and use:

[code] log text here... [/code]

tags, otherwise it's really hard to read the post.

Is your glusterfs on another server, outside of your cluster? I did live migration on GlusterFS, and that works for sure.
I cannot imagine that it's the ha-manager stack's fault, as it doesn't interfere with any storage or anything like that; it looks like something in your gluster setup is not working.
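
A quick way to check the gluster side itself would be something like this (standard GlusterFS CLI; the volume name is just an example, use yours):

Code:
# are all peers connected and all bricks online?
gluster peer status
gluster volume status glusterstorage
gluster volume info glusterstorage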
 
Hi,

I tested and the VM never migrates.
I think the problem is the glusterfs-server brick (NFS).

Message: 'proxy detected vanished client connection'. I think it's glusterfs.

Code:
root@pve-ceph2:~# tail -f /var/log/syslog
Oct 6 10:33:28 pve-ceph2 pveproxy[2117]: proxy detected vanished client connection
Oct 6 10:33:31 pve-ceph2 pveproxy[2118]: proxy detected vanished client connection
Oct 6 10:33:31 pve-ceph2 pveproxy[2117]: proxy detected vanished client connection
Oct 6 10:34:01 pve-ceph2 pveproxy[2118]: proxy detected vanished client connection
Oct 6 10:34:31 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:35:01 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:35:12 pve-ceph2 pvestatd[1620]: status update time (300.139 seconds)
Oct 6 10:35:31 pve-ceph2 pveproxy[2117]: proxy detected vanished client connection
Oct 6 10:36:01 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:36:31 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:37:01 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:37:05 pve-ceph2 kernel: [ 1954.549773] perf interrupt took too long (2510 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Oct 6 10:37:31 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:38:02 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:38:32 pve-ceph2 pveproxy[2117]: proxy detected vanished client connection
Oct 6 10:38:53 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:38:54 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:39:02 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:39:20 pve-ceph2 systemd-timesyncd[692]: interval/delta/delay/jitter/drift 2048s/-0.020s/0.237s/0.017s/-13ppm
Oct 6 10:39:32 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:39:59 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:40:02 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:40:12 pve-ceph2 pvestatd[1620]: status update time (300.137 seconds)
Oct 6 10:40:32 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:41:00 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:41:02 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:41:32 pve-ceph2 pveproxy[2118]: proxy detected vanished client connection
Oct 6 10:42:02 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:42:32 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:42:43 pve-ceph2 pveproxy[2118]: proxy detected vanished client connection
Oct 6 10:42:43 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:43:00 pve-ceph2 pve-firewall[1614]: restarting server after 231 cycles to reduce memory usage (free 11370496 (5246976) bytes)
Oct 6 10:43:00 pve-ceph2 pve-firewall[1614]: server shutdown (restart)
Oct 6 10:43:01 pve-ceph2 pve-firewall[1614]: restarting server
Oct 6 10:43:02 pve-ceph2 pveproxy[2118]: proxy detected vanished client connection
Oct 6 10:43:05 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:43:05 pve-ceph2 pveproxy[2117]: proxy detected vanished client connection
Oct 6 10:43:35 pve-ceph2 pveproxy[2116]: proxy detected vanished client connection
Oct 6 10:43:54 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:44:01 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:44:17 pve-ceph2 pvedaemon[1836]: <root@pam> successful auth for user 'root@pam'
Oct 6 10:45:12 pve-ceph2 pvestatd[1620]: status update time (300.140 seconds)
Oct 6 10:45:41 pve-ceph2 pvedaemon[1838]: <root@pam> successful auth for user 'root@pam'
Oct 6 10:47:01 pve-ceph2 pvedaemon[1839]: <root@pam> successful auth for user 'root@pam'
Oct 6 10:48:14 pve-ceph2 pvedaemon[1838]: <root@pam> successful auth for user 'root@pam'
Oct 6 10:50:12 pve-ceph2 pvestatd[1620]: status update time (300.136 seconds)
Oct 6 10:50:55 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:50:55 pve-ceph2 pmxcfs[992]: [status] notice: received log
Oct 6 10:51:09 pve-ceph2 corosync[1473]: [TOTEM ] A new membership (192.168.100.20:992) was formed. Members left: 1
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [dcdb] notice: members: 2/1061, 3/992
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [dcdb] notice: starting data syncronisation
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [status] notice: members: 2/1061, 3/992
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [status] notice: starting data syncronisation
Oct 6 10:51:09 pve-ceph2 corosync[1473]: [QUORUM] Members[2]: 3 2
Oct 6 10:51:09 pve-ceph2 corosync[1473]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [dcdb] notice: received sync request (epoch 2/1061/00000002)
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [status] notice: received sync request (epoch 2/1061/00000002)
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [dcdb] notice: received all states
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [dcdb] notice: leader is 2/1061
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [dcdb] notice: synced members: 2/1061, 3/992
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [dcdb] notice: all data is up to date
Oct 6 10:51:09 pve-ceph2 bash[1878]: 2015-10-06 10:51:09.587718 7f867c1e5700 -1 osd.2 1997 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:49.587715)
Oct 6 10:51:09 pve-ceph2 bash[1878]: 2015-10-06 10:51:09.587746 7f867c1e5700 -1 osd.2 1997 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:49.587715)
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [status] notice: received all states
Oct 6 10:51:09 pve-ceph2 pmxcfs[992]: [status] notice: all data is up to date
Oct 6 10:51:10 pve-ceph2 bash[1878]: 2015-10-06 10:51:10.587921 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:50.587919)
Oct 6 10:51:10 pve-ceph2 bash[1878]: 2015-10-06 10:51:10.587943 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:50.587919)
Oct 6 10:51:11 pve-ceph2 bash[2162]: 2015-10-06 10:51:11.415193 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:51.415191)
Oct 6 10:51:11 pve-ceph2 bash[2162]: 2015-10-06 10:51:11.415208 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:51.415191)
Oct 6 10:51:11 pve-ceph2 bash[1878]: 2015-10-06 10:51:11.588137 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:51.588135)
Oct 6 10:51:11 pve-ceph2 bash[1878]: 2015-10-06 10:51:11.588160 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:51.588135)
Oct 6 10:51:11 pve-ceph2 bash[2162]: 2015-10-06 10:51:11.757613 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:51.757611)
Oct 6 10:51:11 pve-ceph2 bash[2162]: 2015-10-06 10:51:11.757647 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:51.757611)
Oct 6 10:51:11 pve-ceph2 bash[2162]: 2015-10-06 10:51:11.915759 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:51.915758)
Oct 6 10:51:11 pve-ceph2 bash[2162]: 2015-10-06 10:51:11.915768 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:51.915758)
Oct 6 10:51:12 pve-ceph2 bash[1878]: 2015-10-06 10:51:12.588317 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:52.588316)
Oct 6 10:51:12 pve-ceph2 bash[1878]: 2015-10-06 10:51:12.588334 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:52.588316)
Oct 6 10:51:12 pve-ceph2 bash[1878]: 2015-10-06 10:51:12.670729 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:52.670727)
Oct 6 10:51:12 pve-ceph2 bash[1878]: 2015-10-06 10:51:12.670739 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06
.
.
.
Oct 6 10:51:14 pve-ceph2 bash[1878]: 2015-10-06 10:51:14.588713 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:54.588697)
Oct 6 10:51:14 pve-ceph2 bash[2162]: 2015-10-06 10:51:14.758039 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:54.758037)
Oct 6 10:51:14 pve-ceph2 bash[2162]: 2015-10-06 10:51:14.758061 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:54.758037)
Oct 6 10:51:15 pve-ceph2 bash[1878]: 2015-10-06 10:51:15.588838 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:55.588836)
Oct 6 10:51:15 pve-ceph2 bash[1878]: 2015-10-06 10:51:15.588857 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:55.588836)
Oct 6 10:51:15 pve-ceph2 bash[2162]: 2015-10-06 10:51:15.758274 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:55.758272)
Oct 6 10:51:15 pve-ceph2 bash[2162]: 2015-10-06 10:51:15.758297 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:55.758272)
Oct 6 10:51:16 pve-ceph2 bash[1878]: 2015-10-06 10:51:16.588960 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:56.588959)
Oct 6 10:51:16 pve-ceph2 bash[1878]: 2015-10-06 10:51:16.588977 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:56.588959)
Oct 6 10:51:16 pve-ceph2 bash[2162]: 2015-10-06 10:51:16.616071 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:56.616071)
Oct 6 10:51:16 pve-ceph2 bash[2162]: 2015-10-06 10:51:16.616080 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:56.616071)
Oct 6 10:51:16 pve-ceph2 bash[2162]: 2015-10-06 10:51:16.758444 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:56.758443)
Oct 6 10:51:16 pve-ceph2 bash[2162]: 2015-10-06 10:51:16.758458 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:56.758443)
Oct 6 10:51:17 pve-ceph2 bash[1878]: 2015-10-06 10:51:17.589148 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:57.589146)
Oct 6 10:51:17 pve-ceph2 bash[1878]: 2015-10-06 10:51:17.589170 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:57.589146)
Oct 6 10:51:17 pve-ceph2 bash[2162]: 2015-10-06 10:51:17.716443 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:57.716441)
Oct 6 10:51:17 pve-ceph2 bash[2162]: 2015-10-06 10:51:17.716454 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:57.716441)
Oct 6 10:51:17 pve-ceph2 bash[2162]: 2015-10-06 10:51:17.758557 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:57.758556)
Oct 6 10:51:17 pve-ceph2 bash[2162]: 2015-10-06 10:51:17.758573 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:57.758556)
Oct 6 10:51:17 pve-ceph2 bash[1878]: 2015-10-06 10:51:17.871661 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:57.871660)
Oct 6 10:51:17 pve-ceph2 bash[1878]: 2015-10-06 10:51:17.871681 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:57.871660)
Oct 6 10:51:18 pve-ceph2 bash[1878]: 2015-10-06 10:51:18.589280 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:58.589278)
Oct 6 10:51:18 pve-ceph2 bash[1878]: 2015-10-06 10:51:18.589300 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:58.589278)
Oct 6 10:51:18 pve-ceph2 bash[2162]: 2015-10-06 10:51:18.758691 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:58.758689)
Oct 6 10:51:18 pve-ceph2 bash[2162]: 2015-10-06 10:51:18.758711 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:58.758689)
Oct 6 10:51:19 pve-ceph2 bash[1878]: 2015-10-06 10:51:19.572107 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:59.572105)
Oct 6 10:51:19 pve-ceph2 bash[1878]: 2015-10-06 10:51:19.572118 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:59.572105)
Oct 6 10:51:19 pve-ceph2 bash[1878]: 2015-10-06 10:51:19.590120 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:59.590119)
Oct 6 10:51:19 pve-ceph2 bash[1878]: 2015-10-06 10:51:19.590134 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:50:59.590119)
Oct 6 10:51:19 pve-ceph2 bash[2162]: 2015-10-06 10:51:19.758845 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:59.758844)
Oct 6 10:51:19 pve-ceph2 bash[2162]: 2015-10-06 10:51:19.758864 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:50:59.758844)
Oct 6 10:51:20 pve-ceph2 bash[1878]: 2015-10-06 10:51:20.590226 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:00.590224)
Oct 6 10:51:20 pve-ceph2 bash[1878]: 2015-10-06 10:51:20.590242 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:00.590224)
Oct 6 10:51:20 pve-ceph2 bash[2162]: 2015-10-06 10:51:20.759014 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:00.759012)
Oct 6 10:51:20 pve-ceph2 bash[2162]: 2015-10-06 10:51:20.759033 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:00.759012)
Oct 6 10:51:21 pve-ceph2 bash[1878]: 2015-10-06 10:51:21.272464 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:01.272437)
Oct 6 10:51:21 pve-ceph2 bash[1878]: 2015-10-06 10:51:21.272473 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:01.272437)
Oct 6 10:51:21 pve-ceph2 bash[1878]: 2015-10-06 10:51:21.590396 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:01.590395)
Oct 6 10:51:21 pve-ceph2 bash[1878]: 2015-10-06 10:51:21.590420 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:01.590395)
Oct 6 10:51:21 pve-ceph2 bash[2162]: 2015-10-06 10:51:21.759146 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:01.759144)
Oct 6 10:51:21 pve-ceph2 bash[2162]: 2015-10-06 10:51:21.759165 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:01.759144)
Oct 6 10:51:21 pve-ceph2 bash[2162]: 2015-10-06 10:51:21.816823 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:01.816821)
Oct 6 10:51:21 pve-ceph2 bash[2162]: 2015-10-06 10:51:21.816833 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:01.816821)
Oct 6 10:51:22 pve-ceph2 bash[1878]: 2015-10-06 10:51:22.590538 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:02.590536)
Oct 6 10:51:22 pve-ceph2 bash[1878]: 2015-10-06 10:51:22.590560 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:02.590536)
Oct 6 10:51:22 pve-ceph2 bash[2162]: 2015-10-06 10:51:22.759269 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:02.759268)
Oct 6 10:51:22 pve-ceph2 bash[2162]: 2015-10-06 10:51:22.759290 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:02.759268)
Oct 6 10:51:22 pve-ceph2 bash[1878]: 2015-10-06 10:51:22.972775 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:02.972774)
Oct 6 10:51:22 pve-ceph2 bash[1878]: 2015-10-06 10:51:22.972783 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:02.972774)
Oct 6 10:51:23 pve-ceph2 bash[1878]: 2015-10-06 10:51:23.590744 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:03.590742)
Oct 6 10:51:23 pve-ceph2 bash[1878]: 2015-10-06 10:51:23.590766 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:03.590742)
Oct 6 10:51:23 pve-ceph2 bash[2162]: 2015-10-06 10:51:23.759405 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:03.759404)
Oct 6 10:51:23 pve-ceph2 bash[2162]: 2015-10-06 10:51:23.759426 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:03.759404)
Oct 6 10:51:24 pve-ceph2 bash[1878]: 2015-10-06 10:51:24.590955 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:04.590953)
Oct 6 10:51:24 pve-ceph2 bash[1878]: 2015-10-06 10:51:24.590972 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:04.590953)
Oct 6 10:51:24 pve-ceph2 bash[2162]: 2015-10-06 10:51:24.759544 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:04.759542)
Oct 6 10:51:24 pve-ceph2 bash[2162]: 2015-10-06 10:51:24.759570 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:04.759542)
Oct 6 10:51:25 pve-ceph2 bash[2162]: 2015-10-06 10:51:25.317215 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:05.317214)
Oct 6 10:51:25 pve-ceph2 bash[2162]: 2015-10-06 10:51:25.317227 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:05.317214)
Oct 6 10:51:25 pve-ceph2 bash[1878]: 2015-10-06 10:51:25.591089 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:05.591087)
Oct 6 10:51:25 pve-ceph2 bash[1878]: 2015-10-06 10:51:25.591108 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:05.591087)
Oct 6 10:51:25 pve-ceph2 bash[2162]: 2015-10-06 10:51:25.759715 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:05.759714)
Oct 6 10:51:25 pve-ceph2 bash[2162]: 2015-10-06 10:51:25.759731 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:05.759714)
Oct 6 10:51:26 pve-ceph2 bash[1878]: 2015-10-06 10:51:26.473164 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:06.473163)
Oct 6 10:51:26 pve-ceph2 bash[1878]: 2015-10-06 10:51:26.473184 7f866345e700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:06.473163)
Oct 6 10:51:26 pve-ceph2 bash[1878]: 2015-10-06 10:51:26.591273 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:06.591271)
Oct 6 10:51:26 pve-ceph2 bash[1878]: 2015-10-06 10:51:26.591294 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:06.591271)
Oct 6 10:51:26 pve-ceph2 bash[2162]: 2015-10-06 10:51:26.759842 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:06.759840)
Oct 6 10:51:26 pve-ceph2 bash[2162]: 2015-10-06 10:51:26.759862 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:06.759840)
Oct 6 10:51:27 pve-ceph2 bash[2162]: 2015-10-06 10:51:27.017482 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:07.017481)
Oct 6 10:51:27 pve-ceph2 bash[2162]: 2015-10-06 10:51:27.017491 7f857022f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:07.017481)
Oct 6 10:51:27 pve-ceph2 bash[1878]: 2015-10-06 10:51:27.591401 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:07.591399)
Oct 6 10:51:27 pve-ceph2 bash[1878]: 2015-10-06 10:51:27.591417 7f867c1e5700 -1 osd.2 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:49.267595 front 2015-10-06 10:50:49.267595 (cutoff 2015-10-06 10:51:07.591399)
Oct 6 10:51:27 pve-ceph2 bash[2162]: 2015-10-06 10:51:27.759966 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.0 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:07.759965)
Oct 6 10:51:27 pve-ceph2 bash[2162]: 2015-10-06 10:51:27.759985 7f858913f700 -1 osd.3 1998 heartbeat_check: no reply from osd.1 since back 2015-10-06 10:50:51.012018 front 2015-10-06 10:50:51.012018 (cutoff 2015-10-06 10:51:07.759965)
Oct 6 10:53:10 pve-ceph2 pve-ha-crm[2074]: successfully acquired lock 'ha_manager_lock'
Oct 6 10:53:10 pve-ceph2 pve-ha-crm[2074]: watchdog active
Oct 6 10:53:10 pve-ceph2 pve-ha-crm[2074]: status change slave => master
Oct 6 10:53:10 pve-ceph2 pve-ha-crm[2074]: node 'pve-ceph1': state changed from 'online' => 'unknown'
Oct 6 10:55:14 pve-ceph2 pvestatd[1620]: status update time (302.124 seconds)
Oct 6 10:58:19 pve-ceph2 pmxcfs[992]: [status] notice: received log


And nothing else!!!

Thanks.
Moula.
 
I rebooted the first server and the VM started.
I use a Gluster cluster for the NFS server. It works well on its own. As soon as glusterfs (the ISO storage) is combined with HA, there is a problem.
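
For reference, the matching storage definition in /etc/pve/storage.cfg usually looks something like the following. This is only a sketch: the storage ID 'glusterfs' is taken from the syslog message above, and the server/volume values are assumed from the gluster output below.

Code:
glusterfs: glusterfs
        server pve-ceph1
        server2 pve-ceph2
        volume glusterstorage
        content images,iso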

Code:
root@pve-ceph1:~# gluster volume info

Volume Name: glusterstorage
Type: Replicate
Volume ID: c4d59173-0e90-400d-8344-0181d624486c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: pve-ceph1:/var/lib/vz/glusterfs
Brick2: pve-ceph2:/var/lib/vz/glusterfs
Brick3: pve-ceph3:/var/lib/vz/glusterfs

root@pve-ceph1:~# gluster peer status
Number of Peers: 2

Hostname: pve-ceph2
Uuid: 0394c647-4fa8-4386-bb20-d43d3812c4b9
State: Peer in Cluster (Connected)

Hostname: pve-ceph3
Uuid: 25ac48d4-e77d-4811-bbdf-097b9d03788b
State: Peer in Cluster (Connected)

The command # pvesm status
gives me nothing; see the images from the glusterfs cluster.

The Gluster content is OK, but the Gluster summary gives me nothing.
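
One way to narrow down whether it is the storage layer itself that hangs (just a sketch; 'glusterfs' is the storage ID that shows up in the syslog message above):

Code:
# list the contents of only the gluster storage; if this blocks, the GlusterFS mount is the problem
pvesm list glusterfs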


Glusters-Content.png Gluster-summary.png

Thanks.

Moula.
 
I'll quote myself from a mailing list post:
We already suspected that there would be some outcry about graceful shutdowns, and it's also a little bit of an opinion thing whether the service should migrate.
But as a graceful shutdown is by no means a fault, we defined that the HA stack shouldn't interfere here. Also, doing a shutdown and waiting for the HA manager would cost you at least 120 seconds of downtime, whereas live migrating beforehand costs only some milliseconds... which is better?

It could also cause problems if all services get migrated and restarted at the same time, and no one wants an out-of-control feedback loop.

This might not be the right thread to ask this.

Would it be possible to add a "toggle" switch to the GUI that allows the user to define this behaviour on their own?

I.e.:
Toggle off: don't migrate on graceful shutdown
Toggle on: do migrate on graceful shutdown

Same thing for the migration time: add a configurable time field that is taken into account when deciding how long to wait, once a node can no longer be reached, before the service is restarted on the other nodes.

Thoughts?
 
Same thing for the migration time: add a configurable time field that is taken into account when deciding how long to wait, once a node can no longer be reached, before the service is restarted on the other nodes.

HA is already complex enough, and this would just add more complexity. So this is not planned for now.
 
Hi Thomas,
I redid my POC from scratch with the stable 4.0 version, with glusterfs and ceph, and HA works well.
I pull the cable from any server in the cluster and everything works. The VM migrates normally.
See the images in order.
VM-HA-1.png VM-HA-2.png VM-HA-Fence-3.png VM-HA-Started-4.png Summary-Glusterfs.png

Thanks
Moula.
 
I have the same problem.

test environment: 4 nodes, pveversion -v:
Code:
# pveversion -v
proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-33
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

# ha-manager status
quorum OK
master pve21 (active, Wed Dec 16 16:19:33 2015)
lrm pve20 (active, Wed Dec 16 16:19:33 2015)
lrm pve21 (active, Wed Dec 16 16:19:33 2015)
lrm pve22 (active, Wed Dec 16 16:19:28 2015)
lrm pve23 (active, Wed Dec 16 16:19:33 2015)
service vm:101 (pve21, started)
service vm:102 (pve22, started)

GUI showing the VM managed by HA:

ha-1.JPG

Dirty shutdown of the node with the HA-managed VM (pve22). The node goes red, but the VM stays in "freeze" status.

Code:
# ha-manager status
quorum OK
master pve21 (active, Wed Dec 16 16:37:33 2015)
lrm pve20 (active, Wed Dec 16 16:37:35 2015)
lrm pve21 (active, Wed Dec 16 16:37:33 2015)
lrm pve22 (old timestamp - dead?, Wed Dec 16 16:26:32 2015)
lrm pve23 (active, Wed Dec 16 16:37:35 2015)
service vm:101 (pve21, started)
service vm:102 (pve22, freeze)

It just stays that way until the node is powered back on. I have to imagine it has to do with fencing not working correctly, but I don't know how to test fencing with 4.x. Any suggestions?
 
Currently, the LRM puts the VMs into the frozen state when you do a clean shutdown. We have now decided to change that behavior, so that services are stopped on shutdown and then migrated to other nodes. So this will be fixed with the next package update.
 
I want to make it clear that I did not do a clean shutdown. I killed power from the BMC, which should have triggered a failover.

Doesn't that send an ACPI shutdown action, which results in a graceful shutdown? But since pve-ha-manager 1.0-15 (currently only available in the pvetest repo), a relocation should also happen (after about 2 minutes).
 
Hi Thomas-

No; doing a power off at the BMC is equivalent to flipping the power switch on the logical server hardware. There is an option to do an ACPI off, but I didn't use it. I can report that pulling a live blade out of the chassis did trigger a failover (why is it so slow?!), so there may be something to what you're suggesting, but I can't figure out how.

I'll enable the test repo and report back.
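
For anyone following along, enabling pvetest on a PVE 4.x (Debian jessie) node looks roughly like this (a sketch, assuming the standard repository layout):

Code:
# for example in /etc/apt/sources.list.d/pvetest.list
deb http://download.proxmox.com/debian jessie pvetest

# then pull in the updated packages
apt-get update && apt-get dist-upgrade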
 
So this is strange.

I updated 4 nodes with the test repo and performed the same test (power off the node from the BMC).

Code:
pveversion -v
proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

The node goes down, and after ~3 minutes ha-manager displays:

Code:
#ha-manager status
quorum OK
master pve20 (active, Mon Dec 21 10:51:14 2015)
lrm pve20 (active, Mon Dec 21 10:51:23 2015)
lrm pve21 (active, Mon Dec 21 10:51:20 2015)
lrm pve22 (old timestamp - dead?, Mon Dec 21 10:45:12 2015)
lrm pve23 (active, Mon Dec 21 10:51:20 2015)
service vm:101 (pve20, started)
service vm:102 (pve20, started)

Looks like a normal migration, right? But the VM remains dead and is not responding...

upload_2015-12-21_10-52-55.png

Even after bringing the downed node back up, the VM remains off even though it's listed as an HA-managed service. Attempting to start it manually fails, as do attempts to move it to another node. It's good and stuck.
 
Try:
Code:
ha-manager disable vm:102
ha-manager enable vm:102

Also, can you please attach the logs from the CRM master at that time (from your post I guess it's pve20).
Maybe filter them a bit, something like:
Code:
journalctl -u pve-ha-crm.service -u pve-ha-lrm.service -u pve-cluster.service > journal-`date +%Y-%m-%d-%H%M%S`.log

I cannot reproduce such issues, so it's important to have this info so that we can find and fix a possible bug, or help you with the configuration. Thanks.
 
