Status of VMs and LXCs disappearing

deltamikealpha · Jun 27, 2023

Since updating to PVE 8, I've been having an issue that appears to be vzdump related, or an issue that's stopping vzdump executing. The more I think about it, the more I suspect it's the latter.

Between 12:00 and 12:30 on each of the last couple of days, the statuses of all my VMs and containers have just disappeared. I restarted pvestatd when I noticed today - which brought the VM status back, but not that of the containers. Whilst checking things out for this post, it's lost the status of everything again.

All VMs are running. Yesterday the containers seemed to lose network on the back of it - my rProxy certainly couldn't forward traffic to them. I assume the same happened today (didn't check), but having shut them down and restarted them, they appear to be OK.

Both days there's been a vzdump that's failed on the same container when I've noticed the behaviour.
The backup job in question runs every 6 hours to snapshot more frequently used containers - but this issue isn't appearing at other times, so I think something's a bit wonky elsewhere.

While writing this post, I restarted pvestatd a second time, which yet again brought the VM status back but has now left me with everything showing as unknown again. /etc/pve/.rrd is now blank.

If I check the status of the pvestatd service, I get the following - container 912 is the problematic container for backup.

Code:

root@proxmox:~# systemctl status pvestatd
● pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2023-06-27 12:57:17 BST; 10s ago
    Process: 1626430 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
   Main PID: 1626432 (pvestatd)
      Tasks: 2 (limit: 134945)
     Memory: 86.5M
        CPU: 688ms
     CGroup: /system.slice/pvestatd.service
             ├─1626432 pvestatd
             └─1642002 lxc-info -n 912 -p

Not really sure where to go from here, any idea of what to check next?

deltamikealpha · Jun 27, 2023

I now can't log into the web interface - everything's still working in VMs/CTs, but something's really upset.

Dunuin · Jun 27, 2023

Stuff like not being able to login can be caused by pvestatd getting stuck. See here: https://forum.proxmox.com/threads/unavailable-storage-will-make-the-webui-unusable.119107/
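One way to check whether pvestatd is stuck behind a blocked child (such as the `lxc-info -n 912 -p` in the status output above) is to look for processes in uninterruptible sleep, state `D`, which is what a process usually shows while waiting on storage that doesn't answer. This is only a rough sketch, assuming the procps version of `ps`; the helper name `filter_dstate` is made up for illustration and is not a Proxmox tool.

```shell
# filter_dstate (hypothetical helper): reads `ps` output on stdin,
# skips the header line and keeps rows whose STAT column starts with D
# (uninterruptible sleep, typically a process waiting on I/O).
filter_dstate() {
  awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Columns: PID, state, kernel wait channel, full command line.
ps -eo pid,stat,wchan:32,args | filter_dstate
```

If `lxc-info` or `pvestatd` itself shows up here, that would be consistent with the status daemon hanging on unavailable storage, though it doesn't prove which storage is at fault.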

deltamikealpha · Jun 27, 2023

Dunuin said:
Stuff like not being able to login can be caused by pvestatd getting stuck. See here: https://forum.proxmox.com/threads/unavailable-storage-will-make-the-webui-unusable.119107/

I stumbled across this a couple of days ago when it first happened. At first look it doesn't appear to be the issue (files are still browsable etc.), but checking syslog, it does show the NAS is supposedly not responding.

At some point in screwing round with all of this I managed to stop container 912 (I don't know how, it failed many times). Syslog is still being spammed by messages saying NFS isn't available, even though I've removed the only NFS entry on the host box.

I'll give it a reboot after working hours. It stands to reason that the container could still be in some dodgy state, as it uses an NFS share itself.
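Before rebooting, a quick way to see whether any NFS mount is still present on the host and actually responding is to probe each mount point with a time limit, so the check itself can't hang. A minimal sketch, assuming `findmnt` (util-linux) and `timeout` (coreutils) are available and that mount points contain no whitespace; `probe_path` is a hypothetical helper name, and the 5 second limit is arbitrary.

```shell
# probe_path (hypothetical helper): succeeds if `stat` on the path
# returns within the given number of seconds (default 5).
probe_path() {
  path=$1
  secs=${2:-5}
  if timeout -s KILL "$secs" stat -t "$path" >/dev/null 2>&1; then
    echo "ok $path"
  else
    # Either the path is missing or stat did not return in time.
    echo "unresponsive-or-missing $path"
    return 1
  fi
}

# List every NFS mount point the kernel still knows about and probe it.
findmnt -rn -t nfs,nfs4 -o TARGET | while read -r target; do
  probe_path "$target" 5
done
```

No output from the loop means no NFS mounts are listed on the host, which would suggest the syslog messages come from somewhere else, for example a mount held inside a container.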
