New Proxmox system going down frequently

czonin · Jan 7, 2023

I recently started using Proxmox on an Intel NUC10i7FNKN (16 GB memory, 500 GB SSD). I got Home Assistant and Plex installed and running in VMs, and so far they're working great.

One major issue I've been experiencing is that my system goes offline and I need to reboot the hardware to get it back up. I accessed the syslog through

Code:

less /var/log/syslog

but I'm not sure how to find what I need to diagnose the issue. Any help would be appreciated!
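
For readers in the same spot, here is a minimal sketch of one way to narrow the log down. The helper name `scan_syslog` and the keyword list are assumptions made for illustration, not something Proxmox ships; the idea is simply to filter the file for messages that often show up around a hard hang (kernel panics, out-of-memory kills, machine-check or thermal events, disk I/O errors).

Code:

```shell
#!/bin/sh
# Hypothetical helper: print lines from a log file that match keywords
# commonly seen around crashes. The keyword list is an assumption;
# extend or trim it to suit the system.
scan_syslog() {
    logfile="${1:-/var/log/syslog}"
    grep -iE 'kernel panic|oops|out of memory|oom-killer|machine check|hardware error|thermal|i/o error' "$logfile"
}

# Example usage on the host (commented out so the snippet has no side effects):
# scan_syslog /var/log/syslog | less
```

If the systemd journal is persistent on the host, `journalctl -b -1 -e` jumps to the end of the previous boot's log, which is usually where the last messages before a hang would be. A hard lockup may leave nothing at all in the log, so an empty result does not rule out a hardware problem.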

Moayad · Jan 9, 2023

Hi,

Could you provide the syslog from around the time the system went down?
You can view and filter the syslog by time in the PVE web UI.

czonin · Jan 9, 2023

So I've learned a lot and made a lot of progress since my post, but I'm still experiencing some issues.

I realized that I was able to access the syslog from the UI, which made things a lot easier. I dug through it around the times that the system would go down and didn't really see anything that stood out. I thought it could be hardware related and started to monitor utilization and thermals. I installed Netdata and set up sensors.
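
As a rough illustration of the kind of thermal check described above, the sketch below reads temperatures straight from the kernel's thermal zones, without Netdata. The function name `read_temps` and its optional directory argument are hypothetical (the argument exists only so the function can be tried against a test directory), and which zones exist depends on the hardware.

Code:

```shell
#!/bin/sh
# Hypothetical helper: print each thermal zone's temperature in whole
# degrees Celsius. The kernel reports values in millidegrees Celsius
# under /sys/class/thermal/thermal_zone*/temp.
read_temps() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        # Skip zones without a readable temp file (or an unmatched glob).
        [ -r "$zone/temp" ] || continue
        millideg=$(cat "$zone/temp")
        echo "$(basename "$zone"): $((millideg / 1000))C"
    done
}

# Example usage on the host:
# read_temps
```

If the lm-sensors package is installed, its `sensors` command gives a more detailed per-core view of the same information.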

CPU utilization was very high (consistently over 90%), especially on the Home Assistant VM, and I believe that was causing very high CPU temperatures (steadily over 90°C). I found out how to change the governor from performance to powersave and repasted the NUC. That helped a lot and I started to see better uptime for Proxmox, but it still ended up going down. I eventually tried switching the CPU type in Proxmox for both Home Assistant and Plex from kvm64 to host, and that improved things a ton. Overall CPU utilization is now under 20% and thermals hover in the 60-85°C range. Proxmox has now been up for 19 hours, whereas previously I wouldn't see half of that.
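
For anyone wanting to try the same two changes from the command line, here is a hedged sketch. The function name `set_governor` and its optional directory argument are hypothetical; the sysfs path is the standard cpufreq location, and the `qm set` line assumes that VM 100 is the Home Assistant VM, as the log later in this post suggests.

Code:

```shell
#!/bin/sh
# Hypothetical helper: write a cpufreq governor to every CPU core.
# The optional second argument (an alternate root directory) exists
# only so the function can be tried without touching the real sysfs.
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip cores without a writable governor file (or an unmatched glob).
        [ -w "$f" ] || continue
        echo "$governor" > "$f"
    done
}

# On the host, as root (commented out so the snippet has no side effects):
# set_governor powersave
# qm set 100 --cpu host   # assumes VM 100 is Home Assistant
```

Writes to sysfs do not survive a reboot, so the governor would need to be set again at boot. A CPU type change made with `qm set` only applies once the VM has been fully stopped and started again.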

The main thing that I'm still experiencing is that Home Assistant goes down while its VM is still running. It has happened a couple of times now, most recently overnight last night. Syslog from last night is attached. Another thing I noticed that could be related is that when I try to reboot the VM to get Home Assistant back up, the reboot fails, and I see this in the syslog:

Code:

Jan 09 07:16:03 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:16:06 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:17:01 pve CRON[498327]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 09 07:17:01 pve CRON[498328]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 09 07:17:01 pve CRON[498327]: pam_unix(cron:session): session closed for user root
Jan 09 07:17:35 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:17:46 pve pvedaemon[498703]: requesting reboot of VM 100: UPID:pve:00079C0F:00670D52:63BC05EA:qmreboot:100:root@pam:
Jan 09 07:17:46 pve pvedaemon[1111]: <root@pam> starting task UPID:pve:00079C0F:00670D52:63BC05EA:qmreboot:100:root@pam:
Jan 09 07:17:55 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:18:15 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - got timeout
Jan 09 07:18:35 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:18:54 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:19:13 pve pvedaemon[1111]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:27:12 pve pvedaemon[1109]: VM 100 qmp command failed - VM 100 qmp command 'guest-ping' failed - unable to connect to VM 100 qga socket - timeout after 31 retries
Jan 09 07:27:46 pve pvedaemon[498703]: VM 100 qmp command failed - VM 100 qmp command 'guest-shutdown' failed - got timeout
Jan 09 07:27:46 pve pvedaemon[498703]: VM quit/powerdown failed
Jan 09 07:27:46 pve pvedaemon[1111]: <root@pam> end task UPID:pve:00079C0F:00670D52:63BC05EA:qmreboot:100:root@pam: VM quit/powerdown failed
Jan 09 07:30:20 pve pvedaemon[1109]: <root@pam> successful auth for user 'root@pam'
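
The log above shows repeated `guest-ping` timeouts followed by a failed `guest-shutdown`, which suggests the QEMU guest agent inside VM 100 had stopped answering. As a small sketch for spotting this pattern, the helper below counts failed guest agent commands per VM ID in a log file. The name `count_qga_failures` is hypothetical, and the pattern is based only on the message format visible in this log.

Code:

```shell
#!/bin/sh
# Hypothetical helper: count failed guest agent (qmp) commands per VM ID.
# Matches messages of the form:
#   VM 100 qmp command 'guest-ping' failed
# and prints one "<vmid> <count>" line per VM.
count_qga_failures() {
    logfile="${1:-/var/log/syslog}"
    grep -oE "VM [0-9]+ qmp command '[a-z-]+' failed" "$logfile" |
        awk '{count[$2]++} END {for (vm in count) print vm, count[vm]}'
}

# Example usage on the host:
# count_qga_failures /var/log/syslog
```

When the agent is unresponsive, a reboot that waits for a clean guest shutdown can time out as it did here. `qm stop 100` followed by `qm start 100` forces the VM off and on again instead, but that is the equivalent of pulling the power cord on the guest, so it is best kept as a last resort.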
