[SOLVED] Proxmox VMs and everything else greyed out

nekome

[Attached screenshot: 1658269357437.png]

Has anyone had this happen before?

Linux pve 5.15.39-1-pve #1 SMP PVE 5.15.39-1 (Wed, 22 Jun 2022 17:22:00 +0200) x86_64

This happens after the host has been running for a while. I can SSH into the VMs and they work fine, but the PVE UI is greyed out.
CPU usage still changes, but I cannot view the VMs in the UI (only over SSH).

Restarting PVE fixes it for a while, but then it happens again.
 
I also have Portainer, pfSense, TrueNAS and Pi-hole running on Proxmox. Could those be causing issues, or could there be a port collision?
 
Can you check if there are errors in the log of pvestatd: journalctl -u pvestatd.

Anecdotally, I have heard that this can happen when one of your storages is not available, for example an NFS or CIFS/Samba share.
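Both checks can be done from a shell on the PVE host. This is a minimal sketch, not an official procedure: the helper name inactive_storages is made up for this example, and it assumes that `pvesm status` prints a header row followed by rows whose first column is the storage name and whose third column is the status.

```shell
# List storages that `pvesm status` does not report as "active".
# Reads the table from stdin, so it also works on saved output.
# Assumption: column 1 is the storage name, column 3 is the status.
inactive_storages() {
    awk 'NR > 1 && $3 != "active" { print $1 " (" $3 ")" }'
}

# On the PVE host (not run here):
#   journalctl -u pvestatd -b --no-pager | tail -n 50
#   pvesm status | inactive_storages
```

If a network storage is unreachable, `pvesm status` itself may take a while to return.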
 
Thanks for the suggestion.

I should mention that I have a Graphite metrics server running as a container on a Portainer server, and Proxmox sends its metrics to it.
There are a lot of lock errors:
[Attached screenshot: Untitled.png]
 
Jul 20 12:08:55 pve pvestatd[3120]: status update time (6.536 seconds)
Jul 20 12:08:58 pve pvestatd[3120]: storage 'truenas-pve' is not online
Jul 20 12:08:58 pve pvestatd[3120]: storage 'truenas-plain' is not online
Jul 20 12:41:18 pve pvestatd[3120]: node status update error: metrics send error 'portainer-graphite': failed to send metrics: Connection refused
Jul 20 12:41:18 pve pvestatd[3120]: qemu status update error: metrics send error 'portainer-graphite': failed to send metrics: Connection refused
Jul 20 12:41:28 pve pvestatd[3120]: node status update error: metrics send error 'portainer-graphite': failed to send metrics: Connection refused
Jul 20 12:41:28 pve pvestatd[3120]: qemu status update error: metrics send error 'portainer-graphite': failed to send metrics: Connection refused

Basically, 20 minutes after startup it starts logging "Connection refused" messages (the Graphite server may genuinely have been off at that point).

About 1h 20m after that, it starts logging "Too many open files":

Jul 20 14:05:08 pve pvestatd[3120]: command 'lxc-info -n 105 -p' failed: open3: pipe(GLOB(0x5570a439d478), GLOB(0x5570a43af678)) failed: Too many open files at /usr/share/p>
Jul 20 14:05:08 pve pvestatd[3120]: zfs error: open3: pipe(GLOB(0x5570a43aa2d8), GLOB(0x5570a43acad0)) failed: Too many open files at /usr/share/perl5/PVE/Tools.pm line 455.
Jul 20 14:05:08 pve pvestatd[3120]: storage 'truenas-plain' is not online
Jul 20 14:05:08 pve pvestatd[3120]: storage 'truenas-pve' is not online
Jul 20 14:05:08 pve pvestatd[3120]: storage 'synology-pve' is not online
Jul 20 14:05:08 pve pvestatd[3120]: command 'lxc-info -n 105 -p' failed: open3: pipe(GLOB(0x5570a436b548), GLOB(0x5570a43b92e8)) failed: Too many open files at /usr/share/p>
Jul 20 14:05:18 pve pvestatd[3120]: node status update error: metrics send error 'portainer-graphite': failed to send metrics: Connection refused
Jul 20 14:05:18 pve pvestatd[3120]: qemu status update error: metrics send error 'portainer-graphite': failed to send metrics: Connection refused
Jul 20 14:05:18 pve pvestatd[3120]: command 'lxc-info -n 105 -p' failed: open3: pipe(IO::File=GLOB(0x5570a43b2978), GLOB(0x5570a43a4998)) failed: Too many open files at /us>
Jul 20 14:05:18 pve pvestatd[3120]: storage 'truenas-plain' is not online
Jul 20 14:05:18 pve pvestatd[3120]: storage 'truenas-pve' is not online
Jul 20 14:05:18 pve pvestatd[3120]: zfs error: open3: pipe(IO::File=GLOB(0x5570a43a4998), GLOB(0x5570a43b0c68)) failed: Too many open files at /usr/share/perl5/PVE/Tools.pm>
Jul 20 14:05:18 pve pvestatd[3120]: storage 'synology-pve' is not online

Soon after that, it logs little else but "Too many open files" errors:

Jul 20 14:05:48 pve pvestatd[266841]: can't open '/proc/mounts' - Too many open files
Jul 20 14:05:48 pve pvestatd[3120]: can't open '/proc/mounts' - Too many open files
Jul 20 14:05:48 pve pvestatd[266841]: storage status update error: Unrecognised protocol udp at /usr/share/perl5/PVE/Status/Graphite.pm line 104.
Jul 20 14:05:48 pve pvestatd[3120]: storage status update error: Unrecognised protocol udp at /usr/share/perl5/PVE/Status/Graphite.pm line 104.
Jul 20 14:05:48 pve pvestatd[3120]: Use of uninitialized value $line in pattern match (m//) at /usr/share/perl5/PVE/ProcFSTools.pm line 322.
Jul 20 14:05:48 pve pvestatd[266841]: Use of uninitialized value $line in pattern match (m//) at /usr/share/perl5/PVE/ProcFSTools.pm line 322.
Jul 20 14:05:58 pve pvestatd[266356]: ipcc_send_rec[1] failed: Too many open files
Jul 20 14:05:58 pve pvestatd[266356]: ipcc_send_rec[2] failed: Too many open files
Jul 20 14:05:58 pve pvestatd[266356]: ipcc_send_rec[3] failed: Too many open files
Jul 20 14:05:58 pve pvestatd[266841]: ipcc_send_rec[1] failed: Too many open files
Jul 20 14:05:58 pve pvestatd[266356]: ipcc_send_rec[4] failed: Too many open files
Jul 20 14:05:58 pve pvestatd[266841]: ipcc_send_rec[2] failed: Too many open files
Jul 20 14:05:58 pve pvestatd[3120]: can't lock file '/var/log/pve/tasks/.active.lock' - can't open file - Too many open files
Jul 20 14:05:58 pve pvestatd[3120]: Use of uninitialized value $line in pattern match (m//) at /usr/share/perl5/PVE/ProcFSTools.pm line 128.
Jul 20 14:05:58 pve pvestatd[3120]: Use of uninitialized value in subtraction (-) at /usr/share/perl5/PVE/ProcFSTools.pm line 215.
Jul 20 14:05:58 pve pvestatd[266356]: status update error: Too many open files
Jul 20 14:05:58 pve pvestatd[266841]: ipcc_send_rec[3] failed: Too many open files
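To check whether pvestatd itself is running out of file descriptors, its open descriptor count can be compared with its limit. This is a minimal sketch, assuming a Linux /proc filesystem and a root shell; the function names count_fds and fd_soft_limit are made up for this example.

```shell
# Count the open file descriptors of one process by listing /proc/<pid>/fd.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Read the soft "Max open files" limit of one process from /proc/<pid>/limits.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# On the PVE host (not run here):
#   pid=$(systemctl show -p MainPID --value pvestatd)
#   echo "$(count_fds "$pid") of $(fd_soft_limit "$pid") descriptors in use"
```

A count that keeps climbing toward the limit while the metrics server refuses connections would be consistent with a descriptor leak, but that is an inference from the log above, not a confirmed diagnosis.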
 
Looks a bit like pvestatd is crashing because it can't get a file lock. This would explain the question marks. Since when are you seeing the question marks?
 
About 1 hour 20 minutes after startup. During that time I get lots of "qemu status update error: metrics send error 'portainer-graphite': failed to send metrics: Connection refused" messages.

I think it might be some kind of connection cleanup issue when the Graphite metrics server is unavailable. I'll try removing Graphite monitoring to see whether it makes a difference.

I've put another message with log details but it's waiting on mod approval.
 
So the issue seems to be the Graphite server not being reachable.
I switched the Graphite server to TCP and let it run, and everything worked fine.
Some time after I turned the server off, the Proxmox UI got greyed out.
After I removed the Graphite server from Datacenter / Metric Server, the UI came back to life.
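For anyone who wants to check reachability before or after a change like this, a quick TCP probe is one option. This is a minimal sketch: the function name tcp_reachable is made up for this example, it only makes sense when the metric server is configured for TCP (as above), and the host and port in the usage comment are placeholders.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp redirection, so bash must be installed.
tcp_reachable() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# On the PVE host (not run here; host and port are placeholders):
#   tcp_reachable graphite.example.lan 2003 || echo "metric server not reachable"
```

The probe only shows whether the port accepts connections. It does not confirm that Graphite is actually ingesting the metrics.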
 
Nekome, let me tell you that your thread helped me a lot. I was looking at the metric server during the week, and afterwards one of my nodes started having problems. After a weekend of restarting pvestatd manually, and often the VMs too, your thread let me find and identify the cause: the metric server could not communicate with its host. I think a metric server should not cause us a problem like this in PVE, and it is something for Proxmox to fix. But I'm not upset, I still wouldn't trade it for anything, haha.
 
Are there any plans to fix this bug?

I'm having the same problem when one of my backup NAS units is offline. That is its normal state, because this NAS is only used for weekly backups and is not intended to run all the time. It has a power plan that shuts it down and wakes it up automatically once a week.

Since we migrated from VMware to Proxmox, this NAS has to run all the time, even when it is not in use, just because of this problem.
That is annoying, and it uses a lot of electricity.