Hi all!
I'm at my wit's end trying to figure out an issue I've been troubleshooting for weeks. Posting here in the hopes that someone else has some fresh ideas I haven't tried yet.
I have an Ubuntu VM that becomes unresponsive after a seemingly random amount of time. After I start the VM, it may stay up for days, hours, or only minutes before something happens and I can no longer SSH into it or reach any of the services hosted on it.
The VM runs Ubuntu Server 22.04 with only Docker, NFS, and Samba installed. I've also tried the 22.04.1 update, which did not seem to change the behavior. Everything else on the VM runs inside Docker containers. Below I list all of the Docker containers and their versions, along with the hardware I'm running and other details.
I have an Uptime Robot monitor that hits my Portainer instance on the VM, so I know when the VM goes dark to within about 5 minutes. It's truly random and never seems to happen at the same time. The VM also stays responsive for wildly different lengths of time after each restart, from minutes to days.
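The "within 5 minutes" resolution comes from the polling interval: the freeze happened somewhere between the last successful check and the first failed one. A minimal Python sketch of that bracketing, assuming the check results are available as (timestamp, success) pairs (a hypothetical format, not Uptime Robot's actual export; `outage_window` is an illustrative name):

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical record format: (time of the check, did the service respond?)
Check = Tuple[datetime, bool]

def outage_window(checks: List[Check]) -> Optional[Tuple[datetime, datetime]]:
    """Return (last_success, first_failure) for the first outage, or None."""
    last_ok: Optional[datetime] = None
    for ts, ok in sorted(checks):
        if ok:
            last_ok = ts
        elif last_ok is not None:
            # First failure after a success: the freeze happened in between.
            return last_ok, ts
    return None

# Example with checks every 5 minutes (made-up data for illustration only).
t0 = datetime(2022, 9, 1, 12, 0)
step = timedelta(minutes=5)
checks = [(t0, True), (t0 + step, True), (t0 + 2 * step, False), (t0 + 3 * step, False)]
start, end = outage_window(checks)
print(start, end, end - start)  # 2022-09-01 12:05:00 2022-09-01 12:10:00 0:05:00
```

Lining that window up against the timestamp of the last syslog line would show whether logging stopped at the same moment the services did.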
When it goes down, I can no longer access the web UIs of the Docker containers running on it, like Portainer, and I cannot SSH into it. The Proxmox host on the same box remains responsive. Opening the VM's console in Proxmox does seem to log in, but the console itself is unresponsive and I cannot type anything into the terminal. I have 3 cores allocated to the VM, and once it freezes its CPU usage stays constant at 33%, so it looks like a rogue single-threaded process may be pegging one of the cores when the issue occurs. Syslog entries stop when the VM freezes, and there's nothing related to the issue in the logs.
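The single-core reading rests on simple arithmetic: if Proxmox reports a VM's CPU usage as a share of all of its allocated vCPUs (my assumption about how that figure is scaled), then 33% of 3 vCPUs is about one vCPU running flat out. A quick Python sanity check (`busy_vcpus` is just an illustrative name):

```python
def busy_vcpus(cpu_percent: float, vcpus: int) -> float:
    """Equivalent number of fully busy vCPUs.

    Assumes cpu_percent is measured against all of the VM's allocated
    vCPUs (an assumption about how the host reports it).
    """
    return cpu_percent / 100 * vcpus

print(round(busy_vcpus(33, 3), 2))  # 0.99, i.e. roughly one pegged core
```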
Until the VM stops responding, everything on the VM seems to work great.
I have rebuilt this VM numerous times over the last few weeks, and I've even tried a different machine altogether. I've also tried manually updating the Ubuntu kernel, to no effect. I've used both the Ubuntu cloud-init image and the vanilla installer image that I ran through the setup wizard myself.
I found a similar issue in this thread, but the users there are on Ubuntu 20.04:
https://forum.proxmox.com/threads/ubuntu-20-04-04-machine-freezes.112507
Short of switching to a different Linux distro or ditching Proxmox and running Ubuntu on bare metal, I'm not sure what else to try. Please let me know if you have any ideas; they're much appreciated.
Details and specs on the hardware/software:
Box 1:
- Mele QuieterQ3
- Intel Celeron N5105
- 8GB LPDDR4x RAM
- 500GB Western Digital SN570 M.2 SSD
Box 2:
- Beelink U59
- Intel Celeron N5105
- 16GB DDR4 RAM
- 500GB Western Digital SA510 SATA SSD
Proxmox VE v7.2-3
Ubuntu Server 22.04/22.04.1
Kernels Tried:
- 5.15.0-47-generic (Ubuntu 22.04/22.04.1)
- 5.19.3-051903-generic (Ubuntu 22.04.1)
Installed applications (apt):
- qemu-guest-agent
- aptitude
- apt-transport-https
- ca-certificates
- curl
- software-properties-common
- python3-pip
- virtualenv
- python3-setuptools
- docker-ce
- nfs-kernel-server
- nfs-common
- samba
Docker containers:
- traefik:latest
- portainer/portainer-ce:latest
- timothyjmiller/cloudflare-ddns:latest
- mariadb:10.1
- filerun/filerun:latest
- qmcgaw/gluetun:latest
- lscr.io/linuxserver/homeassistant:latest
- netdata/netdata:stable
- lscr.io/linuxserver/overseerr:latest
- lscr.io/linuxserver/prowlarr:develop
- lscr.io/linuxserver/qbittorrent:4.4.0
- lscr.io/linuxserver/radarr:latest
- lscr.io/linuxserver/sonarr:latest
- louislam/uptime-kuma:1.17.1
- lscr.io/linuxserver/wireguard:latest