Status of VMs and LXCs disappearing

deltamikealpha

New Member
May 19, 2023
Since updating to PVE 8, I've been having an issue that appears to be vzdump-related, or caused by something that's stopping vzdump from executing. The more I've thought about it, the more I suspect it's the latter.

Between 12:00 and 12:30 on each of the last couple of days, the status of all my VMs and containers has just disappeared. I restarted pvestatd when I noticed it today, which brought the VM status back, but not that of the containers. Whilst checking things out for this post, it's lost the status of everything again.

All VMs are running. Yesterday the containers seemed to lose network connectivity as a result; my rProxy certainly couldn't forward traffic to them. I assume the same happened today (I didn't check), but after shutting them down and restarting them, they appear to be OK.

On both days, when I noticed the behaviour, a vzdump had failed on the same container.
The backup job in question runs every 6 hours to snapshot the more frequently used containers, but the issue doesn't appear at the other run times, so I think something's a bit wonky elsewhere.

While writing the post, I restarted pvestatd a second time, which yet again brought the VM status back, but has now left me with everything showing as unknown again. /etc/pve/.rrd is now blank.

If I check the status of the pvestatd service, I get the following (container 912 is the one whose backup keeps failing):

Code:
root@proxmox:~# systemctl status pvestatd
● pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2023-06-27 12:57:17 BST; 10s ago
    Process: 1626430 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
   Main PID: 1626432 (pvestatd)
      Tasks: 2 (limit: 134945)
     Memory: 86.5M
        CPU: 688ms
     CGroup: /system.slice/pvestatd.service
             ├─1626432 pvestatd
             └─1642002 lxc-info -n 912 -p

Not really sure where to go from here; any ideas on what to check next?
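One thing that may be worth checking next, offered as a hedged suggestion rather than a known fix: the CGroup listing above shows pvestatd with a child `lxc-info -n 912 -p`, which hints that pvestatd could be waiting on that container. Processes blocked on an unresponsive NFS server usually sit in uninterruptible sleep, shown as state `D` by `ps`. The function below is a minimal sketch assuming procps-style `ps` output; `list_d_state` is a hypothetical helper name, not a Proxmox tool.

```shell
#!/usr/bin/env bash
# Filter ps output down to processes in uninterruptible sleep (STAT starting with "D").
# Processes stuck waiting on a dead NFS server typically end up in this state.
# Expects input like `ps -eo pid,stat,wchan:32,args` on stdin; the first line is the header.
list_d_state() {
  # Skip the header row, then keep rows whose second field (STAT) begins with D.
  awk 'NR > 1 && $2 ~ /^D/'
}
```

Run it as `ps -eo pid,stat,wchan:32,args | list_d_state`. If `lxc-info` or other processes show up with a wait channel mentioning nfs or rpc, that would point at the storage rather than at vzdump itself.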
 
Stuff like not being able to log in can be caused by pvestatd getting stuck. See here: https://forum.proxmox.com/threads/unavailable-storage-will-make-the-webui-unusable.119107/
I stumbled across this a couple of days ago when it first happened, and at first glance it doesn't appear to be the issue (files are still browsable, etc.). Checking syslog, though, it does show the NAS as supposedly not responding.

At some point while messing around with all of this, I managed to stop container 912 (I don't know how; it had failed many times). Syslog is still being spammed with messages saying NFS isn't available, even though I've removed the only NFS entry on the host box.
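If the NAS really is intermittently unresponsive, an NFS mount can still be present in the kernel after its storage entry is removed (Proxmox mounts NFS storage under `/mnt/pve/<storage>`), which might explain the continued syslog spam. The sketch below assumes GNU coreutils `timeout` and `stat`: it lists NFS mounts from `/proc/mounts` and probes each with a bounded `statfs` call so a dead server can't hang the shell. `nfs_mounts` and `probe_nfs_mounts` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# List NFS mount points from a mounts table (defaults to the kernel's /proc/mounts).
# Note: mount points containing spaces appear octal-escaped (\040) and are not decoded here.
nfs_mounts() {
  local table="${1:-/proc/mounts}"
  # Field 3 is the filesystem type (nfs or nfs4); field 2 is the mount point.
  awk '$3 ~ /^nfs4?$/ { print $2 }' "$table"
}

# Probe each NFS mount with statfs (stat -f), which needs a reply from the server.
# timeout sends TERM after 5 s and KILL 2 s later, since a blocked NFS call may ignore TERM.
probe_nfs_mounts() {
  local mnt
  while IFS= read -r mnt; do
    if timeout -k 2 5 stat -f -- "$mnt" >/dev/null 2>&1; then
      echo "OK   $mnt"
    else
      echo "FAIL $mnt"   # timed out or errored
    fi
  done < <(nfs_mounts "$@")
}
```

Running `probe_nfs_mounts` with no argument checks the live mount table; any mount reported as `FAIL` would be a likely candidate for the hung `lxc-info` and pvestatd calls.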

I'll give it a reboot after working hours. It stands to reason that it could still be some dodgy state that container is in, as it uses an NFS share itself.
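Since container 912 uses an NFS share itself, it may also help to confirm exactly which of its mounts depend on the NAS before rebooting. Proxmox keeps container configs at `/etc/pve/lxc/<vmid>.conf`, with mount points as `mpN:` keys and raw bind mounts as `lxc.mount.entry:` lines. The sketch below assumes that layout: it prints those lines plus `rootfs:` from the current configuration and stops at the first snapshot section (`[name]`). `ct_mounts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print storage-related lines from a Proxmox container config, e.g. /etc/pve/lxc/912.conf.
# Only the current configuration is read; snapshot sections begin at the first "[name]" line.
ct_mounts() {
  local conf="$1"
  # Stop at the first snapshot header; print rootfs, mpN and lxc.mount.entry lines.
  awk '/^\[/ { exit } /^(rootfs|mp[0-9]+|lxc\.mount\.entry):/' "$conf"
}
```

For example, `ct_mounts /etc/pve/lxc/912.conf` would show whether the share is a Proxmox-managed mount point or a raw bind mount from a host path.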
 
