PVE 5.2-5 killed VM without any logs

Hi,

we had a very strange issue. A VM just disappeared / stopped, and the only thing we found was this line in kern.log:

"vmbr101: port2(tap104i0) entered disabled state"

Nothing more, neither in the remote syslog nor in the local log files of the PVE 5.2-5 host or in the VM itself.
The VM is the 2nd member of a MariaDB (Galera) cluster ... and a few minutes earlier, we had stopped the 3rd member.

All other VMs had no problems ...

Code:
pveversion
pve-manager/5.2-5/eb24855a (running kernel: 4.15.18-1-pve)


pvecm status
Quorum information
------------------
Date:             Fri Nov  2 12:31:35 2018
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1/220
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.3.0.17
0x00000002          1 10.3.0.18 (local)

We have no clue what happened.
 
Is the Galera cluster capable of shutting down a member (e.g. fencing)?
 
Then this looks more like a normal shutdown. Maybe there is something to see on the other Galera nodes. Another idea: maybe an OOM on the PVE node?
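To rule out the OOM idea, one could scan the kernel log on the PVE node for OOM-killer traces. This is only a rough sketch: the function name find_oom_events is made up for illustration, the default path /var/log/kern.log is assumed from the kern.log mentioned above, and the patterns are the usual kernel OOM message fragments, so a miss does not prove that no OOM happened.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not a PVE tool):
# print lines from a kernel log that look like OOM-killer activity.
# $1: log file to scan, defaults to /var/log/kern.log (assumed path).
find_oom_events() {
    logfile="${1:-/var/log/kern.log}"
    # Typical kernel message fragments:
    #   "<proc> invoked oom-killer: ..."
    #   "Out of memory: Kill process <pid> (<name>) ..."
    #   "Killed process <pid> (<name>) ..."
    grep -i -E 'invoked oom-killer|out of memory|killed process' "$logfile"
}

# Usage on the PVE node (not run here):
#   find_oom_events /var/log/kern.log
```

If this prints a line naming a kvm process around the time the tap interface went down, the host killed the VM, which would explain why the guest logged nothing.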
 
Hi,

we thought so too, but again ... there was nothing in the logs. A normal shutdown would have sent data to syslog or to our sidecar agent (Graylog). It looks more like a "qm stop <id>", but needless to say ... there was no qm command in our snoopy.log either.
 
You might want to bump up the logging inside the VMs. For now, there seem to be many possibilities and no lead.
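Since a stop issued through the GUI or API would not appear as a qm command in snoopy.log, it may also be worth cross-checking the PVE task log for stop or shutdown tasks against the VM. A minimal sketch, assuming the task index lives at /var/log/pve/tasks/index and that task IDs follow the UPID:<node>:<pid>:<pstart>:<starttime>:<type>:<vmid>:<user>: layout; the function name find_vm_stop_tasks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not a PVE tool):
# list qmstop / qmshutdown task entries for one VMID.
# $1: VMID, e.g. 104
# $2: task index file, defaults to /var/log/pve/tasks/index (assumed path)
find_vm_stop_tasks() {
    vmid="$1"
    index="${2:-/var/log/pve/tasks/index}"
    # Match the task type and VMID fields of the UPID string.
    grep -E ":(qmstop|qmshutdown):${vmid}:" "$index"
}

# Usage on the PVE node (not run here):
#   find_vm_stop_tasks 104
```

An empty result for VM 104 would support the observation that nobody stopped the VM through PVE itself, leaving a crashed or killed kvm process as the more likely cause.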