New Proxmox system going down frequently

czonin

New Member
Jan 7, 2023
I recently started using Proxmox on an Intel NUC10i7FNKN (16 GB memory, 500 GB SSD). I have Home Assistant and Plex installed and running in VMs, and so far they're working great.

One major issue I've been experiencing is that my system goes offline and I need to reboot the hardware to get it back up. I accessed the syslog through
Code:
less /var/log/syslog
but I'm not sure how to find what I need to diagnose the issue. Any help would be appreciated!
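For anyone with the same question, here is a rough sketch of one way to narrow the syslog down to the minutes around an outage. The function names `syslog_window` and `syslog_suspects` are made up for illustration, and the keyword list is only a guess at what tends to matter when a host hangs.

```shell
#!/usr/bin/env bash
# Print syslog lines for one day between two times (inclusive).
# Usage: syslog_window FILE MONTH DAY START END
#   e.g. syslog_window /var/log/syslog Jan 9 07:00:00 07:30:00
# Assumes the classic "Mon DD HH:MM:SS host ..." line layout.
syslog_window() {
    local file=$1 month=$2 day=$3 start=$4 end=$5
    awk -v m="$month" -v d="$day" -v s="$start" -v e="$end" '
        $1 == m && ($2 + 0) == (d + 0) && $3 >= s && $3 <= e
    ' "$file"
}

# Keep only lines containing words that often accompany hangs or resets.
# Hypothetical keyword list; adjust it to taste.
syslog_suspects() {
    grep -Ei 'error|fail|timeout|panic|oom|thermal|mce|watchdog'
}
```

A typical call would be `syslog_window /var/log/syslog Jan 9 07:00:00 07:30:00 | syslog_suspects`. On systems that keep a persistent journal, `journalctl -b -1 -e` (end of the previous boot) or `journalctl --since ... --until ...` should give a similar view.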
 
Hi,

Can you provide the syslog from around the time the system went down?
You can narrow the syslog to a time range from the PVE web UI.
 
So I've learned a lot and made a lot of progress since my post, but I'm still experiencing some issues.

I realized that I was able to access the syslog from the UI, which made things a lot easier. I dug through it around the times the system went down and didn't really see anything that stood out. I thought it could be hardware related, so I started to monitor utilization and thermals. I installed Netdata and set up sensors.
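As a quick cross-check without Netdata, temperatures can also be read straight from the kernel's thermal zones. This is a minimal sketch: the function name `show_temps` and the `THERMAL_ROOT` override are made up for illustration, and which zones exist depends on the hardware and loaded drivers.

```shell
#!/usr/bin/env bash
# Print each thermal zone's type and temperature in whole degrees Celsius.
# The kernel reports millidegrees in /sys/class/thermal/thermal_zone*/temp.
# THERMAL_ROOT is a hypothetical override so the function can be tried on a
# scratch directory.
show_temps() {
    local root=${THERMAL_ROOT:-/sys/class/thermal} zone
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        printf '%s %d\n' "$(cat "$zone/type" 2>/dev/null || echo unknown)" \
            "$(( $(cat "$zone/temp") / 1000 ))"
    done
}
```

Running it in a loop, for example `while sleep 5; do show_temps; done`, gives a crude live view while the VMs are under load.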

CPU utilization was very high (consistently over 90%), especially on the Home Assistant VM, and I believe that was causing very high CPU temperatures (steadily over 90 °C). I found out how to change the governor from performance to powersave, and I repasted the NUC. That helped a lot and I started to see better uptime for Proxmox, but it still ended up going down. I eventually tried switching the CPU type in Proxmox for both Home Assistant and Plex from kvm64 to host, and that improved things a ton. Overall CPU utilization is now under 20% and thermals hover in the 60-85 °C range. Proxmox has now been up for 19 hours, where previously I wouldn't see half of that.
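For reference, the two changes described above can be sketched from the host shell roughly as follows. The function names and the `CPU_ROOT` override are made up for illustration. The sysfs write does not survive a reboot, and the CPU type change only applies once the VM has been fully stopped and started again.

```shell
#!/usr/bin/env bash
# Write a cpufreq governor to every CPU core. Needs root on a real host.
# CPU_ROOT is a hypothetical override so the function can be tried on a
# scratch directory.
set_governor() {
    local governor=$1 root=${CPU_ROOT:-/sys/devices/system/cpu} f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        printf '%s\n' "$governor" > "$f"
    done
}

# Show how many cores are using each governor.
show_governor() {
    local root=${CPU_ROOT:-/sys/devices/system/cpu}
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
}

# Switch a VM's CPU type to "host" using the Proxmox qm tool.
# Pass the VM ID, e.g. 100 (the ID that appears in the syslog excerpt).
set_cpu_host() {
    qm set "$1" --cpu host
}
```

Usage would be `set_governor powersave`, then `show_governor` to confirm, then `set_cpu_host 100` for each VM that should use the host CPU type.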

The main thing I'm still experiencing is that Home Assistant goes down while its VM is still running. It has happened a couple of times now, most recently overnight last night. The syslog from last night is attached. Another thing I noticed that could be related is that when I try to reboot the VM to get Home Assistant back up, the reboot fails and I see this in the syslog:

Code:
Jan 09 07:16:03 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:16:06 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:17:01 pve CRON[498327]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 09 07:17:01 pve CRON[498328]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 09 07:17:01 pve CRON[498327]: pam_unix(cron:session): session closed for user root
Jan 09 07:17:35 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:17:46 pve pvedaemon[498703]: requesting reboot of VM 100: UPID:pve:00079C0F:00670D52:63BC05EA:qmreboot:100:root@pam:
Jan 09 07:17:46 pve pvedaemon[1111]: <root@pam> starting task UPID:pve:00079C0F:00670D52:63BC05EA:qmreboot:100:root@pam:
Jan 09 07:17:55 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:18:15 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:18:35 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:18:54 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:19:13 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:27:12 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:27:46 pve pvedaemon[498703]: VM 100 qmp command failed - VM 100 qmp command 'guest-shutdown' failed - got timeout
Jan 09 07:27:46 pve pvedaemon[498703]: VM quit/powerdown failed
Jan 09 07:27:46 pve pvedaemon[1111]: <root@pam> end task UPID:pve:00079C0F:00670D52:63BC05EA:qmreboot:100:root@pam: VM quit/powerdown failed
Jan 09 07:30:20 pve pvedaemon[1109]: <root@pam> successful auth for user 'root@pam'
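
To make a log like this easier to read, the failed commands can be tallied per VM. This is a small sketch with a made-up function name, `qmp_failures`, and it assumes the message wording shown in the excerpt above.

```shell
#!/usr/bin/env bash
# Count failed QMP / guest-agent commands per VM and command name.
# Reads syslog text on stdin and prints "COUNT VMID COMMAND" lines.
qmp_failures() {
    sed -n "s/.*VM \([0-9][0-9]*\) qmp command '\([^']*\)' failed.*/\1 \2/p" |
        sort | uniq -c | awk '{print $1, $2, $3}'
}
```

Run over the excerpt above (for example `qmp_failures < /var/log/syslog`), it would report 9 `guest-ping` failures and 1 `guest-shutdown` failure for VM 100. Both commands are answered by the QEMU guest agent inside the VM, so this pattern suggests the guest had stopped responding, which would also explain why the reboot request timed out. If the agent never answers, `qm stop 100` followed by `qm start 100` is the usual hard power cycle, at the cost of an unclean shutdown for the guest.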
 

Attachments

  • syslog-1-9-2023.txt
    50.5 KB · Views: 1