I'm experiencing sporadic but recurring issues with one of my VMs.
One VM has a super high load average. top works sometimes, htop completely hangs. SSH sometimes works.
top shows basically no CPU usage.
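On Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones, so high load with near-zero CPU usage would be consistent with blocked tasks. A minimal sketch for listing them inside the VM, assuming procps `ps` and `awk` are available; the function names `filter_blocked` and `list_blocked` are made up for illustration:

```shell
#!/bin/sh
# Keep the ps header line plus every task whose state starts with D
# (uninterruptible sleep). Reads ps output on stdin.
filter_blocked() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# List blocked tasks together with the kernel function they are
# sleeping in (wchan). Hypothetical helper, not part of any tool.
list_blocked() {
    ps -eo state,pid,wchan:32,comm | filter_blocked
}
```

Running `list_blocked` a few times while the load is climbing should show whether the same processes stay stuck, and the wchan column may hint at whether they are waiting on disk, network, or something else.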
This setup worked without issue for the last 3 months. This started about a week ago.
I did not change anything in the setup during the last 4 weeks. Just regular kernel updates etc.
I have attached a NanoKVM; however, I can't see how that would affect only a single VM.
This is the 2nd time in the last week.
The Proxmox automatic backup is scheduled for 2:30am, so it should not be the culprit.
VM Settings:
CPU: 8 Core (host)
Mem: 32GB (of 64GB in host)

I have quite a lot of Docker containers (~40) running in the VM, so troubleshooting individual software is going to be challenging.
To name a few:
- Authentik
- Harbor
- Swag
- Arr Stack + Plex + Jellyfin
- Grafana
- Netbox
- Unifi Controller
- Yourls
Samba is running natively.
CPU: i5-12600K
MB: Z690-P
Proxmox VE 8.2.7
Passthrough PCI devices:
- ConnectX-4 Lx Virtual Function
- LSI SAS 9207-8i
- 2x Intel P4511 4TB U.2 SSD
Code:
root@px1:~# uname -a
Linux px1 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64 GNU/Linux
Zabbix can still get metrics (up to a point):
View attachment 75911

This might be interesting:


Okay, while writing this it seems to have completely locked up... No more metrics, no more SSH.
The Proxmox web console is also not working.
Last few log lines from VM:

CPU usage in the web UI is not showing anything strange:
View attachment 75912
No dmesg messages to speak of around the time it happened (on the Proxmox host):
Code:
Oct 07 21:15:55 px1 sudo[855938]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855940]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855932]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855934]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855942]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Oct 07 21:15:55 px1 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Oct 07 21:15:55 px1 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Oct 07 21:15:55 px1 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Oct 07 21:15:55 px1 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Oct 07 21:15:55 px1 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Oct 07 21:15:55 px1 sudo[855944]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855931]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855936]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855933]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855935]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855941]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855943]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855945]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855937]: pam_unix(sudo:session): session closed for user root
Oct 07 21:15:55 px1 sudo[855939]: pam_unix(sudo:session): session closed for user root
Oct 07 21:17:01 px1 CRON[856491]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 07 21:17:01 px1 CRON[856492]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 07 21:17:01 px1 CRON[856491]: pam_unix(cron:session): session closed for user root
Oct 07 21:22:31 px1 smartd[20846]: Device: /dev/disk/by-id/ata-CT500MX500SSD1_2208E60D8197 [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 68 to 67
Oct 07 21:24:28 px1 pvedaemon[1996]: <root@pam> successful auth for user 'root@pam'
Oct 07 21:24:31 px1 pvedaemon[1995]: <root@pam> starting task UPID:px1:000D1F17:00A30BA0:6704356F:vncproxy:102:root@pam:
Oct 07 21:24:31 px1 pvedaemon[859927]: starting vnc proxy UPID:px1:000D1F17:00A30BA0:6704356F:vncproxy:102:root@pam:
Oct 07 21:38:40 px1 pvedaemon[1995]: <root@pam> end task UPID:px1:000D1F17:00A30BA0:6704356F:vncproxy:102:root@pam: OK
Oct 07 21:39:25 px1 pvedaemon[1997]: <root@pam> successful auth for user 'root@pam'
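Since nothing stands out in the excerpt above, one thing that could be checked is whether the kernel logged any hung-task warnings, which normally read "task ... blocked for more than N seconds". A rough sketch, assuming `journalctl` and `grep` are available; `filter_hung_tasks` is a made-up helper name, and the warnings only appear if the kernel's hung-task detector is enabled:

```shell
#!/bin/sh
# Keep only kernel log lines that look like hung-task warnings.
# Reads log lines on stdin; prints nothing if there are none.
filter_hung_tasks() {
    grep -E 'blocked for more than [0-9]+ seconds|hung_task'
}

# Example invocation (on the host, or inside the VM while SSH still works):
#   journalctl -k --since "1 hour ago" | filter_hung_tasks
```

An empty result would not rule out blocked I/O, it would only mean that no task exceeded the detector's timeout in that window.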
Zabbix screenshot from the other occurrence:
