Some VMs are randomly shutting down

Bradley Gebhardt

New Member
Mar 31, 2023
Hi,

I have quite a few Ubuntu 22 VMs running, and some of them will just randomly turn off.
The Proxmox host has 16 GB of RAM, an Intel(R) Xeon(R) E3-1230 v6 CPU, and a 1 TB hard drive.
Kernel version: Linux 5.15.60-1-pve #1 SMP PVE 5.15.60-1

The shutdowns aren't listed in the VMs' task history, which makes me believe they aren't being triggered through Proxmox itself.

The VMs that have shut down are: one hosting just a PostgreSQL DB, another hosting a syslog server that writes to that PostgreSQL DB, and another hosting Ansible AWX.
I've looked through journalctl on the syslog VM, checked the last boot before yesterday's shutdown, and don't see anything that would indicate a fatal error, just lots of [preauth] messages. Hoping someone can point me in the right direction.
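
For reference, this is roughly what I was checking inside the guest (the -b -1 and -p err options are just how I narrowed it down to the previous boot and error-level messages):

Code:
# journal from the boot before the shutdown, error level and worse only
journalctl -b -1 -p err
# tail of the full previous boot, in case the reason was logged below err level
journalctl -b -1 --no-pager | tail -n 200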

Thanks
 
Sorry for the very late reply; I've been dealing with moving infrastructure around, so I haven't gotten around to looking too deeply into this.

Running cat /var/log/syslog | grep oom only returned minikube messages related to memory issues when it was trying to start its service.
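
In case it helps, these are the OOM checks I ran (the journalctl lines are just alternative ways of searching the kernel log for OOM-killer activity, which I still plan to try):

Code:
# what I ran originally
cat /var/log/syslog | grep oom
# kernel messages from the current and previous boot, looking for the OOM killer
journalctl -k | grep -i -E 'out of memory|oom-killer|killed process'
journalctl -k -b -1 | grep -i -E 'out of memory|oom-killer|killed process'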

I just had another VM randomly turn off on a recently installed Proxmox server running version 7.4-3. The host has 32 GB of RAM and an 8-core Intel Xeon processor. The VM is running MikroTik RouterOS and was started earlier today; it has 1 core and 512 MB of RAM allocated. There is a similar VM running on the other server that hasn't turned off yet.

I have included some of the host syslog from around the time the VM turned off. I had opened the console in the browser, and a few minutes later the VM turned off. Hope that helps.

Code:
Apr 13 14:46:14 cptx01 pveproxy[856227]: worker exit
Apr 13 14:46:16 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:46:27 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:46:36 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:46:47 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:46:56 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:47:06 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:47:16 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:47:26 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:47:37 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:47:46 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:47:57 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:48:06 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:48:17 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:48:26 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:48:31 cptx01 pvedaemon[856830]: starting vnc proxy UPID:cptx01:000D12FE:011A50C5:6437FA1F:vncproxy:1000:root@pam:
Apr 13 14:48:31 cptx01 pvedaemon[824763]: <root@pam> starting task UPID:cptx01:000D12FE:011A50C5:6437FA1F:vncproxy:1000:root@pam:
Apr 13 14:48:37 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:48:46 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:48:56 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:49:06 cptx01 pveproxy[1094]: worker 850714 finished
Apr 13 14:49:06 cptx01 pveproxy[1094]: starting 1 worker(s)
Apr 13 14:49:06 cptx01 pveproxy[1094]: worker 857018 started
Apr 13 14:49:06 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:49:10 cptx01 pveproxy[857016]: got inotify poll request in wrong process - disabling inotify
Apr 13 14:49:16 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:49:27 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:49:35 cptx01 kernel: [185087.073496] fwbr1000i0: port 2(tap1000i0) entered disabled state
Apr 13 14:49:35 cptx01 kernel: [185087.120889] fwbr1000i0: port 1(fwln1000i0) entered disabled state
Apr 13 14:49:35 cptx01 kernel: [185087.121014] vmbr0: port 8(fwpr1000p0) entered disabled state
Apr 13 14:49:35 cptx01 kernel: [185087.123953] device fwln1000i0 left promiscuous mode
Apr 13 14:49:35 cptx01 kernel: [185087.123960] fwbr1000i0: port 1(fwln1000i0) entered disabled state
Apr 13 14:49:35 cptx01 kernel: [185087.158823] device fwpr1000p0 left promiscuous mode
Apr 13 14:49:35 cptx01 kernel: [185087.158830] vmbr0: port 8(fwpr1000p0) entered disabled state
Apr 13 14:49:36 cptx01 kernel: [185087.836300] fwbr1000i1: port 2(tap1000i1) entered disabled state
Apr 13 14:49:36 cptx01 kernel: [185087.867817] fwbr1000i1: port 1(fwln1000i1) entered disabled state
Apr 13 14:49:36 cptx01 kernel: [185087.867890] vmbr1: port 5(fwpr1000p1) entered disabled state
Apr 13 14:49:36 cptx01 kernel: [185087.868043] device fwln1000i1 left promiscuous mode
Apr 13 14:49:36 cptx01 kernel: [185087.868046] fwbr1000i1: port 1(fwln1000i1) entered disabled state
Apr 13 14:49:36 cptx01 kernel: [185087.902750] device fwpr1000p1 left promiscuous mode
Apr 13 14:49:36 cptx01 kernel: [185087.902757] vmbr1: port 5(fwpr1000p1) entered disabled state
Apr 13 14:49:36 cptx01 qmeventd[750]: read: Connection reset by peer
Apr 13 14:49:36 cptx01 pvedaemon[824763]: VM 1000 qmp command failed - unable to open monitor socket
Apr 13 14:49:36 cptx01 pvestatd[1058]: VM 1000 qmp command failed - VM 1000 not running
Apr 13 14:49:37 cptx01 pvedaemon[824763]: <root@pam> end task UPID:cptx01:000D12FE:011A50C5:6437FA1F:vncproxy:1000:root@pam: OK
Apr 13 14:49:37 cptx01 systemd[1]: 1000.scope: Succeeded.
Apr 13 14:49:37 cptx01 systemd[1]: 1000.scope: Consumed 28min 28.656s CPU time.
Apr 13 14:49:37 cptx01 pveproxy[857016]: worker exit
Apr 13 14:49:37 cptx01 pvestatd[1058]: no such logical volume pve/data
Apr 13 14:49:38 cptx01 qmeventd[857173]: Starting cleanup for 1000
Apr 13 14:49:38 cptx01 qmeventd[857173]: Finished cleanup for 1000
 
The VM is now stopping a few seconds after booting, even after doubling both the CPU cores and the RAM.
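
Next time I start it I'll keep the host journal open to see what gets logged the moment the VM exits, something along these lines (1000 is the VMID from the log above):

Code:
# in one shell on the host, follow the journal live
journalctl -f
# in another shell, start the VM and wait for it to stop again
qm start 1000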
 
- Did you modify the Proxmox host's disk layout in any way, for example deleting partitions to free up disk space?
(If so, maybe Proxmox expects a swap partition that no longer exists; see the commands below.)

- Have you tried changing the processor type from the default (kvm64) to "host" in the VM's hardware properties? (A CLI equivalent is included below.)
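
A few commands that might help narrow things down, just as a sketch and assuming VMID 1000 from the log above (adjust to your actual VMID):

Code:
# check whether the host has any swap configured at all
swapon --show
free -h
# list the LVM volumes and storages the host can see
# (your log shows "no such logical volume pve/data", which suggests the layout was changed)
lvs
pvesm status
# switch the VM's CPU type to "host" from the CLI instead of the GUI
qm set 1000 --cpu host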