Periodically VMs Show ? as status

ahwelp · Feb 2, 2023

Hello fellow ProxMoxers!

I am running a home server with a machine with lower hardware.

Code:

4 x Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
1 X 8Gb Ram DDR3

It is working... fine, but sometimes the web interface simply get me nuts!

All VMs show a unkonwn status and hide their IDs and names. With a reboot everything comes back fine. But the idea is 24/7 availability.
The Web interface is as the file attachment

How to revert this on a running instance and prevent it from happening again?

Thanks!

ahwelp · Feb 2, 2023

Also...

When trying to create or clone a VM, the node lists as offline, even the computer on and the VMs running

Dunuin · Feb 2, 2023

I would try to log in using SSH and then run systemctl status pve*.service to check if all PVE services are up and running without any errors/warnings in the journal. I would guess one of them like "pveproxy" or "pve-cluster" got a problem.

Neobin · Feb 2, 2023

In addition:
When the problem occurs, what is the full output in code-tags of: pvesm status?

This behavior can also often be seen, in cases where there is a problem with a storage and/or its reachability.

ahwelp · Feb 2, 2023

Looking at the pve*.service there were multiple services with faults.

I restarted the pvestatd and a few seconds latter, the GUI worked again

The log for the pvestatd was:

Code:

● pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Thu 2023-02-02 15:43:15 -03; 1h 35min ago
    Process: 1049 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
   Main PID: 1060 (code=killed, signal=SEGV)
        CPU: 3h 20min 10.978s

Feb 02 11:57:24 <hostname> pvestatd[1060]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - got timeout
Feb 02 11:57:24 <hostname> pvestatd[1060]: status update time (9.300 seconds)
Feb 02 11:57:33 <hostname> pvestatd[1060]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - got timeout
Feb 02 11:57:33 <hostname> pvestatd[1060]: status update time (8.192 seconds)
Feb 02 11:57:43 <hostname> pvestatd[1060]: VM 111 qmp command failed - VM 111 qmp command 'query-proxmox-support' failed - unable to
connect to VM 111 qmp socket - timeout after 51 retries
Feb 02 11:57:44 <hostname> pvestatd[1060]: status update time (8.215 seconds)
Feb 02 15:37:56 <hostname> pvestatd[1060]: command '/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --config 'report/time_format="%s"' --options vg_name,lv_name,lv_size,lv_attr,pool_lv,data_percent,metadata_percent,snap_percent,uuid,tags,metadata_size,time' failed: got signal 11
Feb 02 15:43:15 <hostname> systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Feb 02 15:43:15 <hostname> systemd[1]: pvestatd.service: Failed with result 'signal'.
Feb 02 15:43:15 <hostname> systemd[1]: pvestatd.service: Consumed 3h 20min 10.978s CPU time.

ahwelp · Feb 2, 2023

Neobin said:
In addition:
When the problem occurs, what is the full output in code-tags of: pvesm status?

This behavior can also often be seen, in cases where there is a problem with a storage and/or its reachability.

Code:

root@<hostname>:~# pvesm status
Name              Type     Status           Total            Used       Available        %
hard_drive         lvm     active       976756736       100663296       876093440   10.31%
local              dir     active        57225328        11734528        42551512   20.51%
local-lvm      lvmthin     active       148299776        31647172       116652603   21.34%

Neobin · Feb 2, 2023

ahwelp said:

Code:

root@<hostname>:~# pvesm status
Name              Type     Status           Total            Used       Available        %
hard_drive         lvm     active       976756736       100663296       876093440   10.31%
local              dir     active        57225328        11734528        42551512   20.51%
local-lvm      lvmthin     active       148299776        31647172       116652603   21.34%

Is this from before or after you:

ahwelp said:
I restarted the pvestatd

?

ahwelp said:

Code:

Feb 02 15:37:56 <hostname> pvestatd[1060]: command '/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --config 'report/time_format="%s"' --options vg_name,lv_name,lv_size,lv_attr,pool_lv,data_percent,metadata_percent,snap_percent,uuid,tags,metadata_size,time' failed: got signal 11
Feb 02 15:43:15 <hostname> systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV

Would check the syslog around this time.

ahwelp · Feb 2, 2023

Neobin said:
Is this from before or after you:

?

Would check the syslog around this time.

The pvesm status was taken after the restart. I did not loged it before restarting the service.

The syslog was taken before the restart. I saved the log on a file to have a reference.

The VM 111 is ruining Ubuntu 18.04 with PHP, Apache and Postgres. Also, is common to the VM overwrite the console with kernel panics. This is common on all VMs. This is a point I would open a new question, but the logs on the VMs are on the attachment, maybe we can see something over here.

It seems that a timeout while checking a VM status killed the pvestatd service. Now, why it happens?

bbgeek17 · Feb 2, 2023

ahwelp said:
Main PID: 1060 (code=killed, signal=SEGV)

smells like bad memory or similar hardware fault, could also be subtle file system corruption due to, for example, power loss.

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

Search

Search

Periodically VMs Show ? as status

ahwelp

New Member

Attachments

ahwelp

New Member

Attachments

Dunuin

Distinguished Member

Neobin

Distinguished Member

ahwelp

New Member

ahwelp

New Member

Neobin

Distinguished Member

ahwelp

New Member

Attachments

bbgeek17

Distinguished Member

We value your privacy