[SOLVED] pve 3.1: Problem starting HA enabled CT

mir

Famous Member
Copenhagen, Denmark
Hi all,

Since upgrading to 3.1 I am no longer able to start HA-enabled CTs. Trying to start one results in this error message:
Code:
Executing HA start for CT 101
Member esx2 trying to enable pvevm:101...Aborted; service failed
TASK ERROR: command 'clusvcadm -e pvevm:101 -m esx2' failed: exit code 254

The weird thing is that if I copy the command shown above to the command line on the host and execute it, the CT starts without any problems whatsoever. Also, if I disable HA for the CT, I can start it from the GUI without any problems.

Any ideas where to look for the problem?
 
Re: pve 3.1: Problem starting HA enabled CT

The fix from the thread you referred to does not solve it here. I have also had the CTs in question stopped and migrated between different nodes, both online and offline, without HA enabled, but the problem persists as soon as I enable HA.
 
Re: pve 3.1: Problem starting HA enabled CT

LOL.
My first thought was: well, you are the "Senior Member", I thought you would know. And you live in Denmark, so you are right down the street from Proxmox. I'm in the US.
But, on another note, I don't know. All I can think of is that either not all machines are on the same version, or maybe something's up with the quorum.
 
Re: pve 3.1: Problem starting HA enabled CT

Is there any hint in /var/log/syslog or /var/log/cluster/rgmanager.log?
 
Re: pve 3.1: Problem starting HA enabled CT

Is there any hint in /var/log/syslog
Aug 24 13:14:42 esx1 pmxcfs[3447]: [dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
Aug 24 13:14:42 esx1 corosync[3604]: [QUORUM] Members[2]: 1 2
Aug 24 13:14:42 esx1 pmxcfs[3447]: [status] notice: update cluster info (cluster name midgaard, version = 104)
Aug 24 13:14:43 esx1 rgmanager[33034]: Reconfiguring
Aug 24 13:14:43 esx1 rgmanager[33034]: Loading Service Data
Aug 24 13:14:45 esx1 rgmanager[33034]: Stopping changed resources.
Aug 24 13:14:45 esx1 rgmanager[33034]: Restarting changed resources.
Aug 24 13:14:45 esx1 rgmanager[33034]: Starting changed resources.
Aug 24 13:14:53 esx1 rgmanager[258315]: [pvevm] VM 109 is running
Aug 24 13:14:53 esx1 rgmanager[258334]: [pvevm] VM 114 is running
Aug 24 13:14:53 esx1 rgmanager[258353]: [pvevm] VM 117 is running


or /var/log/cluster/rgmanager.log?
grep -i 'pvevm:101' /var/log/cluster/rgmanager.log
Aug 24 13:09:40 rgmanager Initializing pvevm:101
Aug 24 13:09:40 rgmanager pvevm:101 was added to the config, but I am not initializing it.
Aug 24 13:12:49 rgmanager Initializing pvevm:101
Aug 24 13:12:49 rgmanager pvevm:101 was added to the config, but I am not initializing it.
Aug 24 13:12:56 rgmanager #43: Service pvevm:101 has failed; can not start.
Aug 24 13:12:56 rgmanager #13: Service pvevm:101 failed to stop cleanly
Aug 24 13:14:17 rgmanager Initializing pvevm:101
Aug 24 13:14:17 rgmanager pvevm:101 was added to the config, but I am not initializing it.


This looks strange: the following line shows up in syslog every time I try to enable a new HA service.
Aug 24 13:14:24 esx1 rgmanager[33034]: BUG! Attempt to forward to myself!
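For anyone following along, here is a rough sketch of how the rgmanager.log output above could be narrowed down to just the failure lines for one service. The function name `service_failures` is made up for illustration, and the two patterns are simply the wordings seen in this thread ("has failed" and "failed to stop"); other rgmanager messages may be worded differently. It assumes a grep that supports `-w` (GNU and BSD grep do).

```shell
# Hypothetical helper (not part of Proxmox or rgmanager):
# print only the failure lines for one service.
# Log text is read from stdin so it can be tried on a saved excerpt.
service_failures() {
    svc="$1"
    # -F: treat the service name literally; -w: do not match pvevm:1010 for pvevm:101
    grep -Fw -- "$svc" | grep -E 'has failed|failed to stop'
}
```

Usage would be something like `service_failures pvevm:101 < /var/log/cluster/rgmanager.log`.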
 
Re: pve 3.1: Problem starting HA enabled CT

The CT is stopped when you add it to HA?
 
Re: pve 3.1: Problem starting HA enabled CT

Also, this looks like a 2 node system, so how do you manage quorum?
 
Re: pve 3.1: Problem starting HA enabled CT

The CT is stopped when you add it to HA?
I am not sure. Initially, when I upgraded, the CTs were migrated away to another node, and when that node was to be upgraded they were migrated to an already upgraded node. The problems first occurred when I tried to migrate from a 3.1 node.
 
Re: pve 3.1: Problem starting HA enabled CT

To answer my own question: apparently a CT needs to be offline when it is added to HA for HA start and migration to work.
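A minimal sketch of a pre-check for that, assuming the nodes run OpenVZ containers managed by `vzctl` (as PVE 3.x does) and that `vzctl status <CTID>` prints a line such as `CTID 101 exist unmounted down`, where the fifth word is `running` or `down`. The function name `ct_is_down` is hypothetical, and this check was not part of the original troubleshooting.

```shell
# Hypothetical helper: succeeds only if the `vzctl status` line on stdin
# reports the container as "down" (fifth word of the status line).
ct_is_down() {
    read -r label ctid exists mounted state rest
    [ "$state" = "down" ]
}
```

It could be used as `vzctl status 101 | ct_is_down && echo "CT 101 is stopped, OK to add to HA"`.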