Message from failed node

yena

Hello,
this morning I read this email from one of my nodes.
Everything is online now, but the VPSes flagged with HA have been migrated.
My node "cvs2" shows an uptime of 2 hours.
I tried migrating them back to cvs2 and that works well.
What happened?
----------------------------------------------------------------------------------------------------------
The node 'cvs2' failed and needs manual intervention.

The PVE HA manager tries to fence it and recover the
configured HA resources to a healthy node if possible.

Current fence status: SUCCEED
fencing: acknowledged - got agent lock for node 'cvs2'


Overall Cluster status:
-----------------------

{
"manager_status" : {
"master_node" : "cvs3",
"node_status" : {
"cvs1" : "online",
"cvs2" : "fence",
"cvs3" : "online",
"cvs4" : "online",
"cvs6" : "online",
"cvs7" : "online",
"cvs8" : "online"
},
"service_status" : {
"ct:100" : {
"failed_nodes" : null,
"node" : "cvs2",
"state" : "fence",
"uid" : "uu1Hu1M+rpI1b3WuygdFPg"
},
"ct:101" : {
"node" : "cvs1",
"running" : 1,
"state" : "started",
"uid" : "fMGswovyFFm1p1lkwBjMnw"
},
"ct:103" : {
"node" : "cvs1",
"running" : 1,
"state" : "started",
"uid" : "0GG2x7VewYfMm7agTV0J8A"
},
"ct:104" : {
"node" : "cvs1",
"running" : 1,
"state" : "started",
"uid" : "vXdWYueHPKMqNkmqvZzCzw"
},
"ct:110" : {
"node" : "cvs3",
"running" : 1,
"state" : "started",
"uid" : "XuhQeKNGpY4KqrS7rIGelg"
},
"ct:201" : {
"failed_nodes" : null,
"node" : "cvs2",
"state" : "fence",
"uid" : "iBGa39N6q1lNT38lbpmJuw"
},
"ct:503" : {
"node" : "cvs3",
"running" : 1,
"state" : "started",
"uid" : "C0yGS/LAvJFLiyLn0guL5w"
},
"ct:701" : {
"node" : "cvs7",
"running" : 1,
"state" : "started",
"uid" : "UwBZs2dQTwmSQJ+cjYsX9g"
},
"vm:105" : {
"failed_nodes" : null,
"node" : "cvs2",
"state" : "fence",
"uid" : "GwyV0DBy8O0hCbaKGfaeVw"
},
"vm:109" : {
"node" : "cvs6",
"running" : 1,
"state" : "started",
"uid" : "XY7LtAp6KP5sqOelO0IP1w"
},
"vm:112" : {
"node" : "cvs6",
"running" : 1,
"state" : "started",
"uid" : "m17YGLmTNNoqt/beKHLIJw"
},
"vm:305" : {
"node" : "cvs1",
"running" : 1,
"state" : "started",
"uid" : "NdD3FCyP04KPoDU1dDcr3w"
},
"vm:400" : {
"node" : "cvs4",
"running" : 1,
"state" : "started",
"uid" : "o1TYjJe1fVTiGUHA3df3Hg"
},
"vm:601" : {
"node" : "cvs6",
"running" : 1,
"state" : "started",
"uid" : "LE4FpCqEb7waHmtBn78LMQ"
},
"vm:997" : {
"node" : "cvs4",
"running" : 1,
"state" : "started",
"uid" : "Re6nmnOSU+e4rLHZ/TQkJQ"
}
},
"timestamp" : 1543209890
},
"node_status" : {
"cvs1" : "online",
"cvs2" : "unknown",
"cvs3" : "online",
"cvs4" : "online",
"cvs6" : "online",
"cvs7" : "online",
"cvs8" : "online"
}
}
----------------------------------------------------------------------------------------------------------

Thanks!
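
For anyone who wants to pull the affected services out of a status dump like the one in the mail, here is a minimal Python sketch. It assumes the JSON has been copied into a string; the excerpt is trimmed to three services from the mail, with the "uid" fields left out, and the helper name `services_in_state` is made up for this example.

```python
import json

# Trimmed excerpt of the status dump from the mail (three services, no "uid").
STATUS = """
{
  "manager_status": {
    "master_node": "cvs3",
    "node_status": {"cvs1": "online", "cvs2": "fence", "cvs3": "online"},
    "service_status": {
      "ct:100": {"failed_nodes": null, "node": "cvs2", "state": "fence"},
      "ct:101": {"node": "cvs1", "running": 1, "state": "started"},
      "vm:105": {"failed_nodes": null, "node": "cvs2", "state": "fence"}
    }
  }
}
"""


def services_in_state(status, state):
    """Return {service id: node} for every service whose state equals `state`.

    Hypothetical helper for this example, not part of any Proxmox tool.
    """
    services = status["manager_status"]["service_status"]
    return {
        sid: info["node"]
        for sid, info in services.items()
        if info.get("state") == state
    }


status = json.loads(STATUS)
fenced = services_in_state(status, "fence")
for sid, node in sorted(fenced.items()):
    print(f"{sid} was on {node} when it got fenced")
```

With the full dump from the mail, the same call would also list ct:201, since it carries the same "fence" state on cvs2.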
 
It seems something went wrong and your node cvs2 self-fenced, so the VMs were restarted on another node.

Check the syslog of cvs2 to see what exactly happened.
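
If the syslog is large, a small filter can help narrow it down to the cluster-related daemons. This is only a sketch: the list of daemon names is an assumption about what is worth looking at, the function name `cluster_events` is made up, and the three sample lines are copied from the cvs2 syslog posted in this thread, just to exercise the filter.

```python
import re

# Daemon names to keep. This list is an assumption, adjust it as needed.
INTERESTING = ("corosync", "pve-ha-crm", "pve-ha-lrm", "watchdog-mux")

# Matches classic syslog lines: "Nov 26 06:22:34 host daemon[pid]: message".
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s([\w.-]+)(?:\[\d+\])?:\s(.*)$"
)


def cluster_events(lines):
    """Yield (timestamp, daemon, message) for lines from the daemons above."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group(3) in INTERESTING:
            yield match.group(1), match.group(3), match.group(4)


# Sample lines copied from the syslog excerpt in this thread.
SAMPLE = [
    "Nov 26 06:22:32 cvs2 corosync[27677]: [TOTEM ] Retransmit List: eb8 eb9",
    "Nov 26 06:22:34 cvs2 pve-ha-crm[13970]: status change wait_for_quorum => slave",
    "Nov 26 06:24:21 cvs2 systemd-modules-load[1995]: Inserted module 'iscsi_tcp'",
]

events = list(cluster_events(SAMPLE))
for timestamp, daemon, message in events:
    print(timestamp, daemon, message)
```

On the node itself you would pass an open log file instead of `SAMPLE`, for example `open("/var/log/syslog")`, assuming that is where your syslog is written.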
 
I can see these messages:

Nov 24 06:52:31 cvs2 pve-ha-crm[13970]: service 'ct:405' without node o accedere ai dati cifrati nel disco senza la password corretta anche se esso segue i passi precedenti).
Nov 24 21:32:14 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df2e 47df30
Nov 24 21:32:14 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 47df2e 47df30
Nov 24 21:32:14 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df2e 47df30 47df37
Nov 24 21:32:17 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df65 47df66 47df67 47df68 47df69 47df6a 47df6b 47df6c 47df6d 47df6e 47df74 47df75 47df76
Nov 24 21:32:17 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 47df65 47df66 47df67 47df68 47df69 47df6a 47df6b 47df6c 47df6d 47df6e 47df74 47df75 47df76
Nov 24 21:32:17 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df65 47df66 47df67 47df68 47df69 47df6a 47df6b 47df6c 47df6d 47df6e 47df74 47df75 47df76
Nov 24 21:32:49 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 163
Nov 24 21:32:49 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 163
Nov 24 21:32:49 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 163
Nov 26 06:21:26 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:21:26 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:21:26 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:21:26 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:22:27 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:27 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:27 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:27 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:32 cvs2 corosync[27677]: [TOTEM ] Retransmit List: eb8 eb9 eba ebb ebc ebd ebe ebf ec0 ec1
Nov 26 06:22:32 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: eb8 eb9 eba ebb ebc ebd ebe ebf ec0 ec1
Nov 26 06:22:32 cvs2 corosync[27677]: [TOTEM ] Retransmit List: eb8 eb9 eba ebb ebc ebd ebe ebf ec0 ec1
Nov 26 06:22:32 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: eb8 eb9
Nov 26 06:22:32 cvs2 corosync[27677]: [TOTEM ] Retransmit List: eb8 eb9
Nov 26 06:22:34 cvs2 pve-ha-crm[13970]: status change wait_for_quorum => slave
Nov 26 06:22:34 cvs2 pve-ha-crm[13970]: service 'ct:405' without node
Nov 26 06:24:21 cvs2 systemd-modules-load[1995]: Inserted module 'iscsi_tcp'
Nov 26 06:24:21 cvs2 systemd-modules-load[1995]: Inserted module 'ib_iser'
---- HERE THE REBOOT ------------
Nov 26 06:24:21 cvs2 kernel: [ 0.000000] Linux version 4.15.17-3-pve (tlamprecht@evita) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PVE 4.15.17-14 (Wed, 27 Jun 2018 17:18:05 +0200) ()
Nov 26 06:24:21 cvs2 systemd-modules-load[1995]: Inserted module 'vhost_net'
Nov 26 06:24:21 cvs2 kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.15.17-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet rootdelay=10
Nov 26 06:24:21 cvs2 kernel: [ 0.000000] KERNEL supported cpus:

Thanks!
 
Buongiorno @yena

Nov 24 06:52:31 cvs2 pve-ha-crm[13970]: service 'ct:405' without node o accedere ai dati cifrati nel disco senza la password corretta anche se esso segue i passi precedenti).

My Italian comes only from movies, but as far as I can guess, your VM uses an encrypted virtual disk. When HA tries to restart this VM on another node, somebody has to enter the password that decrypts the disk!

Buona fortuna!
 
