Message from failed node

yena · Nov 26, 2018

Hello,
This morning I read this email from one of my nodes.
Now everything is online, but the VPSes flagged for HA have been migrated.
My node "cvs2" shows an uptime of 2 hours.
I tried migrating them back to cvs2 and it works fine.
What happened?
----------------------------------------------------------------------------------------------------------
The node 'cvs2' failed and needs manual intervention.

The PVE HA manager tries to fence it and recover the
configured HA resources to a healthy node if possible.

Current fence status: SUCCEED
fencing: acknowledged - got agent lock for node 'cvs2'

Overall Cluster status:
-----------------------

{
   "manager_status" : {
      "master_node" : "cvs3",
      "node_status" : {
         "cvs1" : "online",
         "cvs2" : "fence",
         "cvs3" : "online",
         "cvs4" : "online",
         "cvs6" : "online",
         "cvs7" : "online",
         "cvs8" : "online"
      },
      "service_status" : {
         "ct:100" : {
            "failed_nodes" : null,
            "node" : "cvs2",
            "state" : "fence",
            "uid" : "uu1Hu1M+rpI1b3WuygdFPg"
         },
         "ct:101" : {
            "node" : "cvs1",
            "running" : 1,
            "state" : "started",
            "uid" : "fMGswovyFFm1p1lkwBjMnw"
         },
         "ct:103" : {
            "node" : "cvs1",
            "running" : 1,
            "state" : "started",
            "uid" : "0GG2x7VewYfMm7agTV0J8A"
         },
         "ct:104" : {
            "node" : "cvs1",
            "running" : 1,
            "state" : "started",
            "uid" : "vXdWYueHPKMqNkmqvZzCzw"
         },
         "ct:110" : {
            "node" : "cvs3",
            "running" : 1,
            "state" : "started",
            "uid" : "XuhQeKNGpY4KqrS7rIGelg"
         },
         "ct:201" : {
            "failed_nodes" : null,
            "node" : "cvs2",
            "state" : "fence",
            "uid" : "iBGa39N6q1lNT38lbpmJuw"
         },
         "ct:503" : {
            "node" : "cvs3",
            "running" : 1,
            "state" : "started",
            "uid" : "C0yGS/LAvJFLiyLn0guL5w"
         },
         "ct:701" : {
            "node" : "cvs7",
            "running" : 1,
            "state" : "started",
            "uid" : "UwBZs2dQTwmSQJ+cjYsX9g"
         },
         "vm:105" : {
            "failed_nodes" : null,
            "node" : "cvs2",
            "state" : "fence",
            "uid" : "GwyV0DBy8O0hCbaKGfaeVw"
         },
         "vm:109" : {
            "node" : "cvs6",
            "running" : 1,
            "state" : "started",
            "uid" : "XY7LtAp6KP5sqOelO0IP1w"
         },
         "vm:112" : {
            "node" : "cvs6",
            "running" : 1,
            "state" : "started",
            "uid" : "m17YGLmTNNoqt/beKHLIJw"
         },
         "vm:305" : {
            "node" : "cvs1",
            "running" : 1,
            "state" : "started",
            "uid" : "NdD3FCyP04KPoDU1dDcr3w"
         },
         "vm:400" : {
            "node" : "cvs4",
            "running" : 1,
            "state" : "started",
            "uid" : "o1TYjJe1fVTiGUHA3df3Hg"
         },
         "vm:601" : {
            "node" : "cvs6",
            "running" : 1,
            "state" : "started",
            "uid" : "LE4FpCqEb7waHmtBn78LMQ"
         },
         "vm:997" : {
            "node" : "cvs4",
            "running" : 1,
            "state" : "started",
            "uid" : "Re6nmnOSU+e4rLHZ/TQkJQ"
         }
      },
      "timestamp" : 1543209890
   },
   "node_status" : {
      "cvs1" : "online",
      "cvs2" : "unknown",
      "cvs3" : "online",
      "cvs4" : "online",
      "cvs6" : "online",
      "cvs7" : "online",
      "cvs8" : "online"
   }
}
----------------------------------------------------------------------------------------------------------
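The status block above is plain JSON, so it can be scanned programmatically instead of by eye. A minimal Python sketch, assuming the JSON has been copied out of the notification mail (a trimmed copy with only some of the services is embedded below); the helper names fenced_services and nodes_in_state are hypothetical:

```python
import json

# Trimmed copy of the "Overall Cluster status" JSON from the HA mail above
# (assumption: only a subset of nodes and services is kept for brevity).
STATUS_JSON = """
{
  "manager_status": {
    "master_node": "cvs3",
    "node_status": {"cvs1": "online", "cvs2": "fence", "cvs3": "online"},
    "service_status": {
      "ct:100": {"failed_nodes": null, "node": "cvs2", "state": "fence"},
      "ct:101": {"node": "cvs1", "running": 1, "state": "started"},
      "ct:201": {"failed_nodes": null, "node": "cvs2", "state": "fence"},
      "vm:105": {"failed_nodes": null, "node": "cvs2", "state": "fence"},
      "vm:109": {"node": "cvs6", "running": 1, "state": "started"}
    },
    "timestamp": 1543209890
  },
  "node_status": {"cvs1": "online", "cvs2": "unknown", "cvs3": "online"}
}
"""


def fenced_services(status: dict) -> dict:
    """Group HA services whose state is 'fence' by the node they were on."""
    result: dict = {}
    services = status["manager_status"]["service_status"]
    for sid, info in sorted(services.items()):
        if info.get("state") == "fence":
            result.setdefault(info["node"], []).append(sid)
    return result


def nodes_in_state(status: dict, state: str) -> list:
    """Return the nodes the HA manager reports in the given state."""
    nodes = status["manager_status"]["node_status"]
    return sorted(name for name, s in nodes.items() if s == state)


if __name__ == "__main__":
    status = json.loads(STATUS_JSON)
    print("fenced nodes:", nodes_in_state(status, "fence"))
    print("services to recover:", fenced_services(status))
```

Run against the full block, it should list ct:100, ct:201 and vm:105 as the services that were on cvs2 when it was fenced.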

Thanks!

dcsapak · Nov 26, 2018

It seems something went wrong and your node cvs2 self-fenced, so its HA VMs were restarted on another node.
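To illustrate what "self-fenced" means: as we understand the PVE HA stack, a node running HA services keeps a watchdog armed and only keeps feeding it while it is part of the quorate cluster partition. If it loses quorum, it stops feeding the watchdog, the watchdog resets the node, and the remaining nodes can then recover its HA guests. The toy Python model below only sketches that idea; the Watchdog class, the self_fence_time helper and the 60 s timeout are hypothetical and not taken from pve-ha-manager:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Watchdog:
    """Toy watchdog timer: it fires if not refreshed within `timeout` seconds."""
    timeout: float
    last_refresh: float = 0.0

    def refresh(self, now: float) -> None:
        self.last_refresh = now

    def expired(self, now: float) -> bool:
        return now - self.last_refresh > self.timeout


def self_fence_time(quorate_until: float, timeout: float,
                    end: float, step: float = 1.0) -> Optional[float]:
    """Return when a node would reset itself, or None if it never does.

    Assumption: the node refreshes its watchdog only while it is quorate,
    i.e. for t <= quorate_until, and stops as soon as quorum is lost.
    """
    wd = Watchdog(timeout=timeout)
    t = 0.0
    while t <= end:
        if t <= quorate_until:
            wd.refresh(t)  # still in the quorate partition: keep the node alive
        elif wd.expired(t):
            return t  # watchdog fires: node resets, HA guests are recovered elsewhere
        t += step
    return None


if __name__ == "__main__":
    # Quorum lost at t=100 s with an assumed 60 s timeout: reset one step after 160 s.
    print(self_fence_time(quorate_until=100, timeout=60, end=300))
```

The point of the design is that a node cut off from the cluster cannot be told to stop its guests, so it has to take itself down before the others start the same guests a second time.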

check the syslog of cvs2 to see what exactly happened
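When the syslog is long, it can help to keep only the cluster and HA messages logged right before the node rebooted. A minimal Python sketch, assuming a classic text syslog where each new boot starts with the kernel's "Linux version" line (as in the excerpt below); the daemon selection and the last_messages_before_boot helper are hypothetical choices, not a Proxmox tool. On systems with a persistent journal, `journalctl -b -1` shows the previous boot directly.

```python
import re
import sys
from typing import Iterable, List

# Cluster/HA daemons whose messages usually matter after a self-fence
# (this selection is an assumption; extend it as needed).
DAEMON = re.compile(r"\s(corosync|pve-ha-crm|pve-ha-lrm|watchdog-mux)\[\d+\]:")
# First kernel message of a new boot, e.g. "kernel: [    0.000000] Linux version ..."
BOOT_MARKER = re.compile(r"kernel: \[\s*0\.000000\] Linux version")


def last_messages_before_boot(lines: Iterable[str], limit: int = 20) -> List[str]:
    """Return up to `limit` cluster/HA lines logged just before the latest boot.

    Returns an empty list if no boot marker is found.
    """
    snapshot: List[str] = []
    pending: List[str] = []
    for line in lines:
        if BOOT_MARKER.search(line):
            snapshot = pending[-limit:]  # keep what led up to this boot
            pending = []  # start collecting for the next boot
        elif DAEMON.search(line):
            pending.append(line.rstrip("\n"))
    return snapshot


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 before_boot.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for entry in last_messages_before_boot(log):
            print(entry)
```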

yena · Nov 26, 2018

I can see these messages:

Nov 24 06:52:31 cvs2 pve-ha-crm[13970]: service 'ct:405' without node or access the encrypted data on the disk without the correct password, even if they follow the previous steps).
Nov 24 21:32:14 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df2e 47df30
Nov 24 21:32:14 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 47df2e 47df30
Nov 24 21:32:14 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df2e 47df30 47df37
Nov 24 21:32:17 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df65 47df66 47df67 47df68 47df69 47df6a 47df6b 47df6c 47df6d 47df6e 47df74 47df75 47df76
Nov 24 21:32:17 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 47df65 47df66 47df67 47df68 47df69 47df6a 47df6b 47df6c 47df6d 47df6e 47df74 47df75 47df76
Nov 24 21:32:17 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 47df65 47df66 47df67 47df68 47df69 47df6a 47df6b 47df6c 47df6d 47df6e 47df74 47df75 47df76
Nov 24 21:32:49 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 163
Nov 24 21:32:49 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 163
Nov 24 21:32:49 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 163
Nov 26 06:21:26 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:21:26 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:21:26 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:21:26 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 218327 218328 218329 21832a 21832f
Nov 26 06:22:27 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:27 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:27 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:27 cvs2 corosync[27677]: [TOTEM ] Retransmit List: 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 79 7a
Nov 26 06:22:32 cvs2 corosync[27677]: [TOTEM ] Retransmit List: eb8 eb9 eba ebb ebc ebd ebe ebf ec0 ec1
Nov 26 06:22:32 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: eb8 eb9 eba ebb ebc ebd ebe ebf ec0 ec1
Nov 26 06:22:32 cvs2 corosync[27677]: [TOTEM ] Retransmit List: eb8 eb9 eba ebb ebc ebd ebe ebf ec0 ec1
Nov 26 06:22:32 cvs2 corosync[27677]: notice [TOTEM ] Retransmit List: eb8 eb9
Nov 26 06:22:32 cvs2 corosync[27677]: [TOTEM ] Retransmit List: eb8 eb9
Nov 26 06:22:34 cvs2 pve-ha-crm[13970]: status change wait_for_quorum => slave
Nov 26 06:22:34 cvs2 pve-ha-crm[13970]: service 'ct:405' without node
Nov 26 06:24:21 cvs2 systemd-modules-load[1995]: Inserted module 'iscsi_tcp'
Nov 26 06:24:21 cvs2 systemd-modules-load[1995]: Inserted module 'ib_iser'
---- THE REBOOT HAPPENS HERE ------------
Nov 26 06:24:21 cvs2 kernel: [ 0.000000] Linux version 4.15.17-3-pve (tlamprecht@evita) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PVE 4.15.17-14 (Wed, 27 Jun 2018 17:18:05 +0200) ()
Nov 26 06:24:21 cvs2 systemd-modules-load[1995]: Inserted module 'vhost_net'
Nov 26 06:24:21 cvs2 kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.15.17-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet rootdelay=10
Nov 26 06:24:21 cvs2 kernel: [ 0.000000] KERNEL supported cpus:

Thanks!

guletz · Nov 26, 2018

Good morning @yena

yena said:
Nov 24 06:52:31 cvs2 pve-ha-crm[13970]: service 'ct:405' without node or access the encrypted data on the disk without the correct password, even if they follow the previous steps).

Since my Italian comes only from movies, this is just a guess: your VM uses an encrypted vdisk. When HA tries to restart this VM on another node, somebody has to type in the password that decrypts the disk!

Good luck!
