[HELP] Proxmox Linux VM crashing randomly with no logs or way to diagnose. I've tried everything!

rursache

New Member
Jul 7, 2022
Romania
radu.ursache.ro
Hey guys,

I've been pulling my hair out for over a week trying to troubleshoot a very weird issue.

I'm running Proxmox on an Intel NUC 11 (NUC11ATKPE) with 32GB RAM and a 256GB SSD.

I have 2 Debian VMs:
  1. A light one running only AdGuardHome in Docker (1 CPU core, 1GB RAM, 15GB storage), which works without any issue (5 days+ uptime)
  2. Another one running 13 Docker containers, SMB, and 4 passthrough USB devices (2 HDDs shared over SMB, a Zigbee adapter and a USB audio card), with 4 CPU cores, 8GB RAM, 80GB storage.
The second VM crashes randomly anywhere between 2 hours and 2 days of uptime. The crash is actually a complete freeze/lockup of the machine: no ping, no mouse interaction from Proxmox console, no VNC access, no nothing. I must hard reset the VM for it to work again.

I suspected memory at first; however, the RAM usage stays under 40% at all times.

There are no relevant logs in either the Proxmox syslog or anywhere in the /var/log directory on the VM.

Here is the VM config and screenshots of the settings and hardware tabs of the VM with issues

What I tried that didn't fix it:
  • Multiple Linux distros from Ubuntu Server to normal Ubuntu, Linux Mint, Ubuntu Mate, Debian without DE, Debian with Cinnamon, Debian with LXDE. I'm currently running Ubuntu Mate.
  • Setting tdp_mmu as per Proxmox documentation
  • Updating the kernel in VM and in Proxmox to multiple versions from 5.4 to 5.18
  • Increasing swap size
  • Disabling ballooning in memory options
  • Switching machine mode from q35 to i440fx
  • Changing processor type from host to kvm64 and qemu64
  • A memtest at host level - all good
I currently have a cron job on the Proxmox host that pings the VM every minute and resets it if the ping fails. However, this breaks any SMB file transfer in progress, as well as other things like my HomeAssistant automations, various scripts and ongoing torrent downloads.
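For anyone wanting to reproduce the workaround, here is a minimal sketch of such a watchdog in Python, meant to be run from cron on the Proxmox host. It is not my exact script: the VM ID `101` and the address `192.168.1.50` are placeholders, and sending three pings instead of one is an assumption added to avoid resetting on a single dropped packet. It relies only on the Linux `ping` command and Proxmox's `qm reset`.

```python
import subprocess


def is_alive(host, run=subprocess.run):
    """Return True if the host answers at least one of three pings."""
    # -c 3: send three echo requests; -W 2: wait up to 2 s for each reply.
    # ping exits with 0 when at least one reply was received.
    result = run(
        ["ping", "-c", "3", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watchdog(host, vmid, run=subprocess.run):
    """Hard-reset the VM through `qm reset` if it no longer answers pings.

    Returns True when a reset was issued, False when the VM was reachable.
    """
    if is_alive(host, run):
        return False
    run(["qm", "reset", str(vmid)])
    return True


# Example call (placeholder values, adjust to your setup):
# watchdog("192.168.1.50", 101)
```

The `run` parameter only exists so the logic can be exercised without touching a real VM. Printing a timestamp whenever `watchdog` returns True would also give a record of when each freeze happened.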

EDIT: All the things running in the "crashing VM" were running for 1 year+ on a Raspberry Pi 4 without any issues. Same configs, containers and paths. I manually migrated them one by one.
EDIT 2: Here is a list of all the running docker containers
EDIT 3: Here is the `/var/log` directory of the VM

Any help or ideas?

Thanks!
 
I can't think of anything specific right now, but I would stop the containers and unplug the passthroughs, and see if the issue persists. If not, progressively enable them to find the culprit.
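To make that elimination process concrete, here is a rough sketch of how the test schedule could be planned. The component names and the 3-day soak time are assumptions chosen for illustration (the soak just needs to exceed the longest uptime seen before a crash, which was reported as 2 days); nothing here talks to Proxmox or Docker.

```python
from datetime import timedelta


def soak_plan(components, soak=timedelta(days=3)):
    """Yield (step, enabled_components, soak_duration) tuples.

    Step 0 runs the VM with everything disabled; each later step
    enables exactly one more component than the step before it.
    """
    enabled = []
    yield 0, list(enabled), soak
    for step, name in enumerate(components, start=1):
        enabled.append(name)
        yield step, list(enabled), soak


def suspect(components, crashed_at_step):
    """Name the component that was newly enabled at the crashing step."""
    if crashed_at_step == 0:
        # Crashed with everything disabled: the cause lies elsewhere.
        return None
    return components[crashed_at_step - 1]


# Hypothetical component list, for illustration only.
parts = ["usb-hdd-1", "usb-hdd-2", "zigbee-adapter", "usb-audio", "docker-containers"]
for step, enabled, soak in soak_plan(parts):
    print(f"step {step}: run for {soak.days} days with {enabled or 'nothing enabled'}")
```

With random crashes this only points at a suspect: a component added at the crashing step may simply be the one that was present when an unrelated freeze occurred, so it is worth repeating the step before drawing conclusions.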
 
I can't think of anything specific right now, but I would stop the containers and unplug the passthroughs, and see if the issue persists. If not, progressively enable them to find the culprit.
Thanks for your answer.

I rely 100% on those passthroughs; if they are the culprits, I might as well go bare-metal.

I'll try stopping the docker containers and see what's up.
 
I think it's a problem between the CPU model and the containers.

I have the same problem.

Before, I had an Intel Apollo Lake N3450 with a Debian VM running Docker for Bitwarden. Everything was working as expected.

I migrated to an N5105 (the same CPU family as yours) and the VM crashes. So far 2 times in two days, no logs.

None of my other VMs run Docker (proxy / mail / web server / mail gateway / PBS ...) and everything is fine on those.

I saw in another topic that a newer unofficial kernel may work better with this CPU.
 
