pvestatd failure and question marks on node, containers, vms and each storage

Jarvar

Hello.
After a few hours, my node will display grey question marks on the node, containers, virtual machines and storage.
If I start pvestatd in system services it will start working again. I found this out from somebody who did something similar.
However, it keeps going back to the question mark symbol.
How do I find out what is causing it?
Thank you.

pveversion pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-15-pve)
 
The best way to find out what happened is to note the time when the node transitioned to the question-mark state, then log in to each node and examine the log. For example, if it happened within the last 15 minutes:
journalctl --since "15 min ago"
Study and compare the logs; there should be something about the transition event. Good luck.
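If the journal output is long, a small filter can narrow it down. The following is only a sketch: the keyword list and the function names `read_journal` and `suspect_lines` are assumptions for illustration, not an official list of what pvestatd logs, so adjust them to whatever your node actually prints.

```python
import re
import subprocess

# Hypothetical keywords; extend or replace them for your own setup.
SUSPECT = re.compile(r"pvestatd|zfs error|no such pool|timeout", re.IGNORECASE)


def read_journal(since="15 min ago"):
    """Return journalctl output for the given window as one string."""
    result = subprocess.run(
        ["journalctl", "--since", since, "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def suspect_lines(journal_text, pattern=SUSPECT):
    """Return the journal lines that match the pattern, in original order."""
    return [line for line in journal_text.splitlines() if pattern.search(line)]
```

On the node itself you would call `suspect_lines(read_journal("1 hour ago"))` and print the result; the filter only reduces noise, so the surrounding lines still need to be read in context.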


 
Thank you @bbgeek17
I ran journalctl --since "1 hour ago"

What kind of error am I looking for?
I did have ZFS storage defined in the storage configuration that wasn't connected or mounted. Would that cause a crash?
Thank you so much.

Would it be something like this?
Nov 01 14:30:21 pve004 pvedaemon[2520]: zfs error: cannot open 'crucial': no such pool
Nov 01 14:30:21 pve004 kernel: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
Nov 01 14:30:21 pve004 kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 0000000113cab0f0 trb-start 0000000113cab100 trb-end 0000000113cab2b0 seg-start >
Nov 01 14:30:21 pve004 kernel: xhci_hcd 0000:00:14.0: WARN Successful completion on short TX
Nov 01 14:30:21 pve004 kernel: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
Nov 01 14:30:21 pve004 kernel: xhci_hcd 0000:00:14.0: Looking for event-dma 0000000113caea40 trb-start 0000000113caea50 trb-end 0000000113caec00 seg-start
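To see which component is complaining, it can help to count error lines per source. This is a rough sketch that assumes the default short journalctl line format shown in the excerpt above (timestamp, host, source, optional PID, message); the function name `errors_by_source` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# "Nov 01 14:30:21 pve004 pvedaemon[2520]: zfs error: ..."
LINE = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ ([^\s\[:]+)(?:\[\d+\])?: (.*)$")


def errors_by_source(lines):
    """Count lines whose message contains 'error', grouped by source name."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and "error" in match.group(2).lower():
            counts[match.group(1)] += 1
    return counts
```

Run against an excerpt like the one above, this separates the `pvedaemon` ZFS message from the `kernel` xhci_hcd (USB) messages, which suggests two things to look at: the storage definition and the USB connection.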

This is the closest I found to where I started pvestatd.

Nov 01 15:37:32 pve004 systemd[195241]: Startup finished in 171ms.
Nov 01 15:37:32 pve004 systemd[1]: Started user@0.service - User Manager for UID 0.
Nov 01 15:37:32 pve004 systemd[1]: Started session-67.scope - Session 67 of User root.
Nov 01 15:37:32 pve004 login[195260]: ROOT LOGIN on '/dev/pts/0'
Nov 01 15:37:36 pve004 systemd[1]: session-67.scope: Deactivated successfully.
Nov 01 15:37:36 pve004 systemd-logind[2012]: Session 67 logged out. Waiting for processes to exit.
Nov 01 15:37:36 pve004 systemd-logind[2012]: Removed session 67.
Nov 01 15:37:36 pve004 pvedaemon[2521]: <root@pam> end task UPID:pve004:0002FA9E:007D6935:6542A8FB:vncshell::root@pam: OK
Nov 01 15:37:46 pve004 systemd[1]: Stopping user@0.service - User Manager for UID 0...
Nov 01 15:37:46 pve004 systemd[195241]: Activating special unit exit.target...
Nov 01 15:37:46 pve004 systemd[195241]: Stopped target default.target - Main User Target.
Nov 01 15:37:46 pve004 systemd[195241]: Stopped target basic.target - Basic System.
Nov 01 15:37:46 pve004 systemd[195241]: Stopped target paths.target - Paths.
Nov 01 15:37:46 pve004 systemd[195241]: Stopped target sockets.target - Sockets.
Nov 01 15:37:46 pve004 systemd[195241]: Stopped target timers.target - Timers.
Nov 01 15:37:46 pve004 systemd[195241]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Nov 01 15:37:46 pve004 systemd[195241]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 01 15:37:46 pve004 systemd[195241]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Nov 01 15:37:46 pve004 systemd[195241]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Nov 01 15:37:46 pve004 systemd[195241]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Nov 01 15:37:46 pve004 systemd[195241]: Removed slice app.slice - User Application Slice.
Nov 01 15:37:46 pve004 systemd[195241]: Reached target shutdown.target - Shutdown.
Nov 01 15:37:46 pve004 systemd[195241]: Finished systemd-exit.service - Exit the Session.
Nov 01 15:37:46 pve004 systemd[195241]: Reached target exit.target - Exit the Session.
Nov 01 15:37:46 pve004 systemd[1]: user@0.service: Deactivated successfully.
Nov 01 15:37:46 pve004 systemd[1]: Stopped user@0.service - User Manager for UID 0.
Nov 01 15:37:46 pve004 systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Nov 01 15:37:46 pve004 systemd[1]: run-user-0.mount: Deactivated successfully.
Nov 01 15:37:46 pve004 systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Nov 01 15:37:46 pve004 systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Nov 01 15:37:46 pve004 systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
Nov 01 15:37:52 pve004 pvedaemon[2520]: <root@pam> starting task UPID:pve004:0002FC7D:007D7136:6542A910:srvstart:pvestatd:root@pam:
Nov 01 15:37:52 pve004 pvedaemon[195709]: starting service pvestatd: UPID:pve004:0002FC7D:007D7136:6542A910:srvstart:pvestatd:root@pam:
Nov 01 15:37:52 pve004 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Nov 01 15:37:53 pve004 pvestatd[195731]: starting server
Nov 01 15:37:53 pve004 systemd[1]: Started pvestatd.service - PVE Status Daemon.
 
I've removed the storage from storage.cfg, at least. The USB-connected drive is still on, but it is not constantly trying to connect like before. It's been about 4 hours since the node failed, so I will test overnight.
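For anyone checking the same thing, here is a sketch that lists the `zfspool` entries in a storage.cfg and flags those whose pool is not imported. It assumes the usual layout of that file (a `type: id` header line followed by indented `key value` lines); the function names and the sample storage ID in the comment are hypothetical.

```python
def zfs_pools_in_storage_cfg(cfg_text):
    """Map storage ID -> 'pool' value for every zfspool section."""
    pools = {}
    current = None  # ID of the zfspool section being read, if any
    for raw in cfg_text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        if not raw[0].isspace():
            # Section header, e.g. "zfspool: crucial" (hypothetical ID)
            stype, _, sid = raw.partition(":")
            current = sid.strip() if stype.strip() == "zfspool" else None
        elif current is not None:
            parts = raw.split(None, 1)
            if len(parts) == 2 and parts[0] == "pool":
                pools[current] = parts[1].strip()
    return pools


def missing_pools(cfg_text, existing_pools):
    """Return storage IDs whose top-level pool is not in existing_pools."""
    found = zfs_pools_in_storage_cfg(cfg_text)
    return sorted(
        sid for sid, pool in found.items()
        if pool.split("/")[0] not in existing_pools
    )
```

The set of existing pools can be taken from the output of `zpool list -H -o name`, and the config text from /etc/pve/storage.cfg. Any ID returned by `missing_pools` is a candidate for the "no such pool" message.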
 
