Proxmox 4.0 and 4.1 HA testing problem (pulled power cable, ipmitool power off)

dernikov1

Hi,
I am testing the 4.1 version on a 3-node cluster with FC shared storage. I have a problem with HA, and I experienced the same behaviour on the 4.0 version too.

If the network cable is removed from a node, the VM restarts on another node. (OK)
If the power cables are pulled from a node (hardware failure simulation),
or the node is shut down with "ipmitool power off", the VM stays in the started state on the dead node, or shows freeze (in the web interface of the other nodes, to clarify).
(NOT OK)
I am now using version 4.1-1/2f9650d4.
Can you help me? Where should I look for errors, and can I do something to fix this?
Thanks.
 
Looks like you did a clean shutdown (acpid)? A normal shutdown simply puts the HA resources into freeze state, because it assumes that the node will restart soon.
 
Looks like you did a clean shutdown (acpid)? A normal shutdown simply puts the HA resources into freeze state, because it assumes that the node will restart soon.

For the test I literally pulled the power cables. I wouldn't call that a "clean shutdown".
ipmitool does the same: it cuts the power to the server.
In the 3.x versions, the VM restarted on other nodes with this test.
The only difference is that a node with a pulled network cable cannot send anything, and in that case the VM restarted on other nodes.
Maybe pulling the power cables or the ipmitool command is not fast enough, and a few network packets get through to the other working nodes, and that causes the freeze or false status.
I will try to capture those packets.
In any case, if a server stays without power, its HA VMs should restart on other nodes. That is the key HA feature.
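Probably with something along these lines (a sketch; 5404/5405 are corosync's default UDP ports, the interface name is an assumption):

# tcpdump -ni vmbr0 -w corosync-failure.pcap udp port 5404 or udp port 5405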
 
For the test I literally pulled the power cables. I wouldn't call that a "clean shutdown".[...]
Maybe pulling the power cables or the ipmitool command is not fast enough, and a few network packets get through to the other working nodes, and that causes the freeze or false status.
I will try to capture those packets.
In any case, if a server stays without power, its HA VMs should restart on other nodes. That is the key HA feature.

AFAIK, and I'm not sure it applies here, but doesn't Proxmox HA with the software watchdog take 300 seconds to do proper fencing/HA failover in some cases?
 
The only difference is that a node with a pulled network cable cannot send anything, and in that case the VM restarted on other nodes.
Maybe pulling the power cables or the ipmitool command

So if you pull the power cables it works as expected.
 
AFAIK, and I'm not sure it applies here, but doesn't Proxmox HA with the software watchdog take 300 seconds to do proper fencing/HA failover in some cases?

I didn't know that. I think I waited much more than 300 seconds, 15-30 minutes, but I did not watch the clock.
One more test: I will test tomorrow with a stopwatch to be sure.
Thanks for the new testing vector.
 
I think it was Dietmar who answered in a thread regarding HA sometime around October. ...
Let me dig it up.

Here it is:
https://forum.proxmox.com/threads/proxmox4-ha-not-working-feedback.23770/


Edit: it's 120 seconds, not 300 (I transposed the numbers from memory).
Edit2: according to t.lamprecht a good test is to pull the network plug; if the VM has not migrated after 120 seconds, it's a good idea to report it as an issue :)
Edit3: there is also an answer to the "graceful shutdown" question :)
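To actually time that 120-second window, a loop like this on a surviving node should do (a rough sketch; ha-manager is the PVE 4.x HA CLI):

# while true; do date '+%H:%M:%S'; ha-manager status; sleep 5; done | tee ha-timing.log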
 
So if you pull the power cables it works as expected.

No, if I pull the network cables it works as expected (the VM restarts on other nodes).
When I pulled the power cables, the freeze state happened once, and a few times (about 10) the node icon turned red,
but the VM in the HA part of the web management page stayed listed like (proxmox1, vm:100, started).
I am not sure about the exact word order, but the relevant string is "started".
To be clear: the node is dead, power cables pulled, and so is the VM on that node.
 
I think it was Dietmar who answered in a thread regarding HA sometime around October. ...
Let me dig it up.

Here it is:
https://forum.proxmox.com/threads/proxmox4-ha-not-working-feedback.23770/


Edit: it's 120 seconds, not 300 (I transposed the numbers from memory).
Edit2: according to t.lamprecht a good test is to pull the network plug; if the VM has not migrated after 120 seconds, it's a good idea to report it as an issue :)
Edit3: there is also an answer to the "graceful shutdown" question :)

I didn't know that. I think I waited much more than 300 seconds, 15-30 minutes, but I did not watch the clock.
One more test: I will test tomorrow with a stopwatch to be sure.
Thanks for the new testing vector.

Hi,
Can you post the result of
"ipmitool mc watchdog get"?

(The ipmitool package needs to be installed.)

What is your server model? Which watchdog do you use?
 
Please can you post the CRM status after that happens?

# cat /etc/pve/ha/manager_status

and also the LRM status from all nodes:

# cat /etc/pve/nodes/<nodename>/lrm_status
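To collect the LRM status from every node in one go, a loop like this works (just a sketch):

# for n in /etc/pve/nodes/*; do echo "== $(basename $n) =="; cat "$n/lrm_status"; done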
 
Hi,
Can you post the result of
"ipmitool mc watchdog get"?

(The ipmitool package needs to be installed.)

What is your server model? Which watchdog do you use?

Hi,
Server model: it is a rather old IBM x3550 (M1 version). I set up the configuration to use ipmi_watchdog.

# output from ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0x44)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: Power Cycle (0x03)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x10
Initial Countdown: 10 sec
Present Countdown: 9 sec

I set up the watchdog following the instructions from this URL: https://pve.proxmox.com/wiki/High_A...PMI_Watchdog_.28module_.22ipmi_watchdog.22.29
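For anyone following along, that setup boils down to roughly this (a sketch from memory of the wiki page, not a verbatim copy; timeout=10 matches the "Initial Countdown: 10 sec" shown above):

# /etc/modprobe.d/ipmi_watchdog.conf
options ipmi_watchdog action=power_cycle timeout=10

# /etc/default/pve-ha-manager
WATCHDOG_MODULE=ipmi_watchdog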
 
Please also post:
# pvecm status

# cat /etc/pve/.members

after you pulled the power and while the problem is happening.

This seems really strange to me. I also tried to reproduce it, but with no success; I have run such scenarios surely over 100 times (mostly with a virtual test cluster where I killed a node, but nonetheless).
 
Please can you post the CRM status after that happens?

It is playing with my nerves :(. Yesterday, for the first time, the VM restarted on other nodes when I pulled the power cables.
And it worked 5 times. We tested all nodes with power cables pulled (each time a single node, to be clear).
Today I prepared to collect logs and decided to pull the power cables once more.
This time it didn't work :). The VM stays in "started" mode, the node is red (old timestamp, dead?), power cables pulled.
After the node came back up, the VM stayed offline even though the HA tab of the web GUI showed it as started.
I think that happened because the cluster thinks the VM is started even though the node is dead, so when the node comes back up it doesn't need to start the VM, because it is already in "started" mode, i.e. "running" during the test.
Here are the captured logs from all nodes. I collected every log from all nodes; if they are the same, ignore the extra data.

Just one note. I just tried "ipmitool power off" again. According to *, it is a hard power off, just like pulled power cables, and the node goes down, but the VM moves to the freeze state.
So I get different behavior for pulled power cables and ipmitool, even though they should be the same according to the documentation, because ipmitool just cuts power to the motherboard.

ipmitool [chassis] power off # issue a hard power off
* https://support.pivotal.io/hc/en-us/articles/206396927-How-to-work-on-IPMI-and-IPMITOOL
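For comparison, ipmitool also has a soft variant, which would explain a freeze if it were issued by mistake (both are standard ipmitool subcommands):

# ipmitool chassis power off # hard power cut, like pulling the cables
# ipmitool chassis power soft # ACPI soft shutdown; a clean shutdown puts HA resources into freeze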


[noparse]
Results, power cable pulled off:

proxmox1test
/etc/pve/nodes/proxmox1test/lrm_status
{"results":{"DdUF0XO+DIMAgWSvLXNw+g":{"exit_code":3,"sid":"vm:100","state":"started"}},"mode":"active","timestamp":1450261447}
/etc/pve/nodes/proxmox2test/lrm_status
{"timestamp":1450261452,"results":{"SITDEDrBV1bVaa4jyRXFJg":{"exit_code":0,"sid":"vm:101","state":"started"}},"mode":"active"}
/etc/pve/nodes/proxmox3test/lrm_status
{"timestamp":1450261450,"mode":"active","results":{}}

/etc/pve/ha/manager_status
{"service_status":{"vm:100":{"uid":"DdUF0XO+DIMAgWSvLXNw+g","node":"proxmox1test","state":"started"},"vm:101":{"uid":"SITDEDrBV1bVaa4jyRXFJg","node":"proxmox2test","state":"started"}},"relocate_trial":{"vm:100":0,"vm:101":0},"master_node":"proxmox1test","timestamp":1450261451,"node_status":{"proxmox1test":"online","proxmox2test":"online","proxmox3test":"online"}}

proxmox2test
/etc/pve/nodes/proxmox1test/lrm_status
{"timestamp":1450261457,"results":{"DdUF0XO+DIMAgWSvLXNw+g":{"state":"started","sid":"vm:100","exit_code":3}},"mode":"active"}
/etc/pve/nodes/proxmox2test/lrm_status
{"timestamp":1450261452,"results":{"SITDEDrBV1bVaa4jyRXFJg":{"exit_code":0,"sid":"vm:101","state":"started"}},"mode":"active"}
/etc/pve/nodes/proxmox3test/lrm_status
{"results":{},"mode":"active","timestamp":1450261460}
/etc/pve/ha/manager_status
{"service_status":{"vm:100":{"uid":"DdUF0XO+DIMAgWSvLXNw+g","node":"proxmox1test","state":"started"},"vm:101":{"uid":"SITDEDrBV1bVaa4jyRXFJg","node":"proxmox2test","state":"started"}},"relocate_trial":{"vm:100":0,"vm:101":0},"master_node":"proxmox1test","timestamp":1450261461,"node_status":{"proxmox1test":"online","proxmox2test":"online","proxmox3test":"online"}}}

proxmox3test
/etc/pve/nodes/proxmox1test/lrm_status
{"timestamp":1450261457,"results":{"DdUF0XO+DIMAgWSvLXNw+g":{"state":"started","sid":"vm:100","exit_code":3}},"mode":"active"}
/etc/pve/nodes/proxmox2test/lrm_status
{"timestamp":1450261462,"results":{"SITDEDrBV1bVaa4jyRXFJg":{"state":"started","exit_code":0,"sid":"vm:101"}},"mode":"active"}
/etc/pve/nodes/proxmox3test/lrm_status
{"timestamp":1450261465,"mode":"active","results":{}}
/etc/pve/ha/manager_status
{"service_status":{"vm:100":{"uid":"DdUF0XO+DIMAgWSvLXNw+g","node":"proxmox1test","state":"started"},"vm:101":{"uid":"SITDEDrBV1bVaa4jyRXFJg","node":"proxmox2test","state":"started"}},"relocate_trial":{"vm:100":0,"vm:101":0},"master_node":"proxmox1test","timestamp":1450261461,"node_status":{"proxmox1test":"online","proxmox2test":"online","proxmox3test":"online"}}
[/noparse]
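Side note: these one-line JSON blobs are easier to compare when pretty-printed; python is present on PVE nodes, so for example:

# python -m json.tool /etc/pve/ha/manager_status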
 

Attachments

  • proxmox-pulledpower-logs.txt
So this is a bit after you pulled the plug on node 3?

Okay, could you also post the info I requested above, plus the logs/journals (at least from the master node) from around the time you pulled the plug?
Maybe upload /var/log/syslog here; we need as much information as possible so we can reproduce and fix this.
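One way to collect those journals (a sketch; systemd unit names as in PVE 4.x, adjust the time window):

# journalctl -u pve-ha-crm -u pve-ha-lrm -u corosync --since "2015-12-16 15:35" > ha-journal.txt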
 
Please can you post the CRM status after that happens?

# cat /etc/pve/ha/manager_status

and also the LRM status from all nodes:

# cat /etc/pve/nodes/<nodename>/lrm_status

Please also post:
# pvecm status

# cat /etc/pve/.members

after you pulled the power and while the problem is happening.

This seems really strange to me. I also tried to reproduce it, but with no success; I have run such scenarios surely over 100 times (mostly with a virtual test cluster where I killed a node, but nonetheless).


# Collected output, but this time, with the power cables pulled, the VM restarted on the other nodes.
I don't see a pattern: at the start it didn't work, then after 10 restarts it started to work, then after 5 restarts it didn't work, and now it works again.
I don't know; I simply pull the power cables, and I can't pull cables the wrong way.

I will try again until the error occurs.

Here is the requested output; maybe it will help a little.

#cat /etc/pve/.members
------------------------
{
"nodename": "proxmox1test",
"version": 14,
"cluster": { "name": "DMZCLUSTER", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
"proxmox2test": { "id": 2, "online": 1, "ip": "192.168.102.22"},
"proxmox1test": { "id": 1, "online": 1, "ip": "192.168.102.21"},
"proxmox3test": { "id": 3, "online": 0, "ip": "192.168.102.23"}
}
}
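(As an aside, the offline node can be picked out of that file mechanically, assuming jq is installed:)

# jq -r '.nodelist | to_entries[] | select(.value.online == 0) | .key' /etc/pve/.members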


#PVECM STATUS
---------------
Quorum information
------------------
Date: Wed Dec 16 15:27:51 2015
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1252
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.102.21 (local)
0x00000002 1 192.168.102.22
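(The votequorum numbers add up: with 3 expected votes the majority threshold is floor(3/2) + 1 = 2, and the two surviving nodes provide exactly 2 votes, hence "Quorate: Yes".)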
 
Please also post:
# pvecm status

# cat /etc/pve/.members

after you pulled the power and while the problem is happening.

This seems really strange to me. I also tried to reproduce it, but with no success; I have run such scenarios surely over 100 times (mostly with a virtual test cluster where I killed a node, but nonetheless).

#################################################
Here is the output after the power cables were pulled, with the problem occurring. The third time I pulled the power cables, I got the error.
Output:

#cat /etc/pve/.members
------------------------
{
"nodename": "proxmox1test",
"version": 16,
"cluster": { "name": "DMZCLUSTER", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
"proxmox2test": { "id": 2, "online": 1, "ip": "192.168.102.22"},
"proxmox1test": { "id": 1, "online": 1, "ip": "192.168.102.21"},
"proxmox3test": { "id": 3, "online": 0, "ip": "192.168.102.23"}
}
}


#PVECM STATUS
---------------
Quorum information
------------------
Date: Wed Dec 16 15:40:42 2015
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1260
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.102.21 (local)
0x00000002 1 192.168.102.22
(Attachment: problem_VM_started_on_failed_node.JPG)
 
Yes, but the services are on proxmox1test and proxmox2test, NOT on the now-offline proxmox3test node.
So everything is as expected, I guess? The tree view on the left side is outdated and should update soon. If something is not correct, please post the logs from the proxmox1test node here; that is needed to help you (if the machine really did not relocate, but it seems like it did).
 
So this is a bit after you pulled the plug on node 3?

Okay, could you also post the info I requested above, plus the logs/journals (at least from the master node) from around the time you pulled the plug?
Maybe upload /var/log/syslog here; we need as much information as possible so we can reproduce and fix this.

###########################################
Yes, seconds after I pulled the plug.

I disabled IPv6, so there are many messages about ip6tables. I cleaned the syslog of ip6tables-restore messages like the one below:
Dec 16 15:40:10 proxmox1test pve-firewall[2397]: status update error: iptables_restore_cmdlist: Try `ip6tables-restore -h' or 'ip6tables-restore --help' for more information.

What else do you need from the logs/journals?
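(For reference, that filtering can be reproduced with something like:)

# grep -v 'ip6tables-restore' /var/log/syslog > syslog.clean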

The syslog is below:

clean syslog
-----------------

Dec 16 15:40:06 proxmox1test corosync[2461]: [TOTEM ] A processor failed, forming new configuration.
Dec 16 15:40:08 proxmox1test corosync[2461]: [TOTEM ] A new membership (192.168.102.21:1260) was formed. Members left: 3
Dec 16 15:40:08 proxmox1test corosync[2461]: [TOTEM ] Failed to receive the leave message. failed: 3
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: members: 1/2338, 2/2346
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: starting data syncronisation
Dec 16 15:40:08 proxmox1test corosync[2461]: [QUORUM] Members[2]: 1 2
Dec 16 15:40:08 proxmox1test corosync[2461]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: cpg_send_message retried 1 times
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [status] notice: members: 1/2338, 2/2346
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [status] notice: starting data syncronisation
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: received sync request (epoch 1/2338/0000000C)
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [status] notice: received sync request (epoch 1/2338/0000000C)
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: received all states
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: leader is 1/2338
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: synced members: 1/2338, 2/2346
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: start sending inode updates
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: sent all (0) updates
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: all data is up to date
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [dcdb] notice: dfsm_deliver_queue: queue length 2
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [status] notice: received all states
Dec 16 15:40:08 proxmox1test pmxcfs[2338]: [status] notice: all data is up to date
Dec 16 15:40:08 proxmox1test pve-ha-lrm[25758]: service 'vm:100' not on this node
Dec 16 15:40:12 proxmox1test pve-ha-crm[2476]: node 'proxmox3test': state changed from 'online' => 'unknown'
Dec 16 15:40:18 proxmox1test pve-ha-lrm[25772]: service 'vm:100' not on this node
Dec 16 15:40:28 proxmox1test pve-ha-lrm[25786]: service 'vm:100' not on this node
Dec 16 15:40:36 proxmox1test pveproxy[23820]: proxy detected vanished client connection
Dec 16 15:40:38 proxmox1test pve-ha-lrm[25800]: service 'vm:100' not on this node
Dec 16 15:40:48 proxmox1test pve-ha-lrm[25816]: service 'vm:100' not on this node
Dec 16 15:40:58 proxmox1test pve-ha-lrm[25836]: service 'vm:100' not on this node
Dec 16 15:41:08 proxmox1test pve-ha-lrm[25850]: service 'vm:100' not on this node
Dec 16 15:41:18 proxmox1test pve-ha-lrm[25864]: service 'vm:100' not on this node
Dec 16 15:41:19 proxmox1test pvedaemon[2472]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:41:22 proxmox1test pvedaemon[2472]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:41:23 proxmox1test pvedaemon[2473]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:41:28 proxmox1test pve-ha-lrm[25878]: service 'vm:100' not on this node
Dec 16 15:41:30 proxmox1test pvedaemon[2472]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:41:39 proxmox1test pve-ha-lrm[25892]: service 'vm:100' not on this node
Dec 16 15:41:48 proxmox1test pve-ha-lrm[25906]: service 'vm:100' not on this node
Dec 16 15:41:50 proxmox1test pvedaemon[2473]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:41:58 proxmox1test pve-ha-lrm[25926]: service 'vm:100' not on this node
Dec 16 15:42:08 proxmox1test pve-ha-lrm[25940]: service 'vm:100' not on this node
Dec 16 15:42:18 proxmox1test pve-ha-lrm[25954]: service 'vm:100' not on this node
Dec 16 15:42:28 proxmox1test pve-ha-lrm[25968]: service 'vm:100' not on this node
Dec 16 15:42:38 proxmox1test pve-ha-lrm[25982]: service 'vm:100' not on this node
Dec 16 15:42:48 proxmox1test pve-ha-lrm[25996]: service 'vm:100' not on this node
Dec 16 15:42:56 proxmox1test pvedaemon[2473]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:42:58 proxmox1test pve-ha-lrm[26016]: service 'vm:100' not on this node
Dec 16 15:43:08 proxmox1test pve-ha-lrm[26030]: service 'vm:100' not on this node
Dec 16 15:43:18 proxmox1test pve-ha-lrm[26044]: service 'vm:100' not on this node
Dec 16 15:43:28 proxmox1test pve-ha-lrm[26058]: service 'vm:100' not on this node
Dec 16 15:43:38 proxmox1test pve-ha-lrm[26072]: service 'vm:100' not on this node
Dec 16 15:43:48 proxmox1test pve-ha-lrm[26086]: service 'vm:100' not on this node
Dec 16 15:43:58 proxmox1test pve-ha-lrm[26106]: service 'vm:100' not on this node
Dec 16 15:44:08 proxmox1test pve-ha-lrm[26120]: service 'vm:100' not on this node
Dec 16 15:44:18 proxmox1test pve-ha-lrm[26134]: service 'vm:100' not on this node
Dec 16 15:44:28 proxmox1test pve-ha-lrm[26148]: service 'vm:100' not on this node
Dec 16 15:44:38 proxmox1test pve-ha-lrm[26162]: service 'vm:100' not on this node
Dec 16 15:44:48 proxmox1test pve-ha-lrm[26176]: service 'vm:100' not on this node
Dec 16 15:44:58 proxmox1test pve-ha-lrm[26196]: service 'vm:100' not on this node
Dec 16 15:45:01 proxmox1test CRON[26211]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 16 15:45:08 proxmox1test pve-ha-lrm[26213]: service 'vm:100' not on this node
Dec 16 15:45:18 proxmox1test pve-ha-lrm[26228]: service 'vm:100' not on this node
Dec 16 15:45:28 proxmox1test pve-ha-lrm[26242]: service 'vm:100' not on this node
Dec 16 15:45:38 proxmox1test pve-ha-lrm[26256]: service 'vm:100' not on this node
Dec 16 15:45:48 proxmox1test pve-ha-lrm[26270]: service 'vm:100' not on this node
Dec 16 15:45:58 proxmox1test pve-ha-lrm[26290]: service 'vm:100' not on this node
Dec 16 15:46:08 proxmox1test pve-ha-lrm[26304]: service 'vm:100' not on this node
Dec 16 15:46:09 proxmox1test pvedaemon[2473]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:46:18 proxmox1test pve-ha-lrm[26318]: service 'vm:100' not on this node
Dec 16 15:46:28 proxmox1test pve-ha-lrm[26332]: service 'vm:100' not on this node
Dec 16 15:46:38 proxmox1test pve-ha-lrm[26346]: service 'vm:100' not on this node
Dec 16 15:46:48 proxmox1test pve-ha-lrm[26360]: service 'vm:100' not on this node
Dec 16 15:46:58 proxmox1test pve-ha-lrm[26381]: service 'vm:100' not on this node
Dec 16 15:47:08 proxmox1test pve-ha-lrm[26395]: service 'vm:100' not on this node
Dec 16 15:47:18 proxmox1test pve-ha-lrm[26409]: service 'vm:100' not on this node
Dec 16 15:47:28 proxmox1test pve-ha-lrm[26423]: service 'vm:100' not on this node
Dec 16 15:47:38 proxmox1test pve-ha-lrm[26439]: service 'vm:100' not on this node
Dec 16 15:47:48 proxmox1test pve-ha-lrm[26453]: service 'vm:100' not on this node
Dec 16 15:47:58 proxmox1test pve-ha-lrm[26473]: service 'vm:100' not on this node
Dec 16 15:48:08 proxmox1test pve-ha-lrm[26487]: service 'vm:100' not on this node
Dec 16 15:48:18 proxmox1test pve-ha-lrm[26501]: service 'vm:100' not on this node
Dec 16 15:48:28 proxmox1test pve-ha-lrm[26515]: service 'vm:100' not on this node
Dec 16 15:48:38 proxmox1test pve-ha-lrm[26529]: service 'vm:100' not on this node
Dec 16 15:48:48 proxmox1test pve-ha-lrm[26544]: service 'vm:100' not on this node
Dec 16 15:48:58 proxmox1test pve-ha-lrm[26564]: service 'vm:100' not on this node
Dec 16 15:49:08 proxmox1test pve-ha-lrm[26578]: service 'vm:100' not on this node
Dec 16 15:49:18 proxmox1test pve-ha-lrm[26592]: service 'vm:100' not on this node
Dec 16 15:49:28 proxmox1test pve-ha-lrm[26606]: service 'vm:100' not on this node
Dec 16 15:49:38 proxmox1test pve-ha-lrm[26620]: service 'vm:100' not on this node
Dec 16 15:49:48 proxmox1test pve-ha-lrm[26634]: service 'vm:100' not on this node
Dec 16 15:49:58 proxmox1test pve-ha-lrm[26654]: service 'vm:100' not on this node
 
Yes, but the services are on proxmox1test and proxmox2test, NOT on the now-offline proxmox3test node.
So everything is as expected, I guess? The tree view on the left side is outdated and should update soon. If something is not correct, please post the logs from the proxmox1test node here; that is needed to help you (if the machine really did not relocate, but it seems like it did).

Sorry, I didn't see this post. Here is a picture which should explain everything.
This is the status from one minute ago, plus typing time.
Node 3 is down (power cables pulled) and VM 100 is down, yet at the same time the picture states that the VM is on node proxmox1, started. But
the console output shows that vm100 is not running. VM100 is a W2008R2 copy from production; if it were running, the console would show the Windows welcome screen, but it is black.
Node proxmox3 has been down since 15:40 CET.
What else do you need?
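(A CLI cross-check of what the GUI claims, run on proxmox1test; qm is the standard PVE VM tool, and VMID 100 is the one from this thread:)

# qm status 100
# qm list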
 

Attachments

  • All_at_same_picture.JPG
