PVE Cluster Node Time-Outs

TechBrain64

Member
Oct 5, 2022
San Antonio, TX
Hi All,

Yesterday I noticed that my 7.3-2 nodes had an upgrade to 7.3-6 available, so I performed the upgrades. Since then, one of the nodes has been displaying the following error on the status widget, and other pages do not display their information until 20-30 minutes later.

Node-Time-Out.png

Does anyone have an idea how I can solve this?
 
Hi,
what is the output of pveversion -v and systemctl status pve-cluster.service pveproxy.service pvedaemon.service? Anything interesting in /var/log/syslog?
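
In case it helps to post everything in one go, here is a minimal sketch that writes the output of the commands above into a single file. The helper names `run_section` and `collect_diag` and the path `/tmp/pve-diag.txt` are only illustrative assumptions, not part of Proxmox VE; the commands themselves are the ones asked for above, with `--no-pager` added so `systemctl` does not wait for input.

```shell
#!/bin/sh
# Sketch only: run_section and collect_diag are hypothetical helper names.

# Append a header and the output of one command to a file.
# $1 = output file, remaining arguments = command to run.
run_section() {
    out=$1; shift
    printf '===== %s =====\n' "$*" >> "$out"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$out" 2>&1
    else
        printf '(%s not found on this host)\n' "$1" >> "$out"
    fi
    return 0
}

# Collect the three pieces of information requested in the post.
collect_diag() {
    out=${1:-/tmp/pve-diag.txt}   # assumed default path
    : > "$out"
    run_section "$out" pveversion -v
    run_section "$out" systemctl --no-pager status \
        pve-cluster.service pveproxy.service pvedaemon.service
    run_section "$out" tail -n 200 /var/log/syslog
}

# Usage (as root on the affected node):
#   collect_diag /tmp/pve-diag.txt
```

The resulting file can then be attached to a reply instead of pasting each output separately.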
 
Hi,

I got this message on the cluster master:

pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-03-13 20:06:36 CET; 17h ago
    Process: 1141 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
    Process: 1143 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
    Process: 87948 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
   Main PID: 1145 (pveproxy)
      Tasks: 4 (limit: 37997)
     Memory: 255.6M
        CPU: 19.318s
     CGroup: /system.slice/pveproxy.service
             ├─ 1145 pveproxy
             ├─ 87974 pveproxy worker
             ├─ 87975 pveproxy worker
             └─395824 pveproxy worker

Mar 14 07:57:35 Alphanode pveproxy[87976]: proxy detected vanished client connection
Mar 14 08:12:45 Alphanode pveproxy[87976]: proxy detected vanished client connection
Mar 14 08:12:47 Alphanode pveproxy[87976]: proxy detected vanished client connection
Mar 14 08:12:48 Alphanode pveproxy[87976]: proxy detected vanished client connection
Mar 14 13:29:49 Alphanode pveproxy[1145]: worker 87976 finished
Mar 14 13:29:49 Alphanode pveproxy[1145]: starting 1 worker(s)
Mar 14 13:29:49 Alphanode pveproxy[1145]: worker 395824 started
Mar 14 13:29:49 Alphanode pveproxy[395823]: worker exit
Mar 14 13:30:26 Alphanode pveproxy[87975]: proxy detected vanished client connection
Mar 14 13:30:28 Alphanode pveproxy[395824]: proxy detected vanished client connection
 
The status of the services seems fine to me. What does pvecm status show? Can you ping or ssh from/to the problematic node? Does it make a difference if you access the web interface with the IP of the node itself or from a different node?
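
A rough sketch of how the ping/ssh check could be run against each node is below. The function names and the node names in the usage comment are placeholders I made up, and logging in as `root` over ssh is an assumption about your setup; adjust both as needed. `BatchMode=yes` makes ssh fail instead of prompting for a password, so a hanging or unreachable node shows up as FAIL rather than blocking the loop.

```shell
#!/bin/sh
# Sketch only: probe_ping, probe_ssh and check_node are hypothetical helpers.

# Three ICMP echo requests, waiting at most 2 seconds for each reply.
probe_ping() { ping -c 3 -W 2 "$1" >/dev/null 2>&1; }

# Non-interactive ssh login that runs "true"; assumes root key-based access.
probe_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true >/dev/null 2>&1
}

# Print one summary line per node.
check_node() {
    node=$1
    if probe_ping "$node"; then p=ok; else p=FAIL; fi
    if probe_ssh "$node"; then s=ok; else s=FAIL; fi
    printf '%s ping=%s ssh=%s\n' "$node" "$p" "$s"
}

# Usage (node names are placeholders):
#   pvecm status
#   for n in node1 node2 node3; do check_node "$n"; done
```

Running it once from the problematic node and once from a healthy one shows whether the connectivity problem is one-directional.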

If you open the developer tools in your browser (usually Ctrl+Shift+C), go to the Network tab and then reload the page, what requests show as failing and with what status code? Clearing the browser cache might also be worth a shot.

Only grepping for fail/warn/error will miss a lot of information ;) The full log from around the time the issue happens or since the last boot might be more telling.
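
One possible way to pull the unfiltered log for a time window, sketched under the assumption that `/var/log/syslog` uses the traditional `Mon DD HH:MM:SS host ...` line format. The function name `syslog_window` is made up for this example, and the times in the usage comment are just taken from the output posted above.

```shell
#!/bin/sh
# Sketch only: syslog_window is a hypothetical helper name.

# Print every line of a syslog-style file that falls on a given day
# and between two times (inclusive), without filtering by keyword.
# $1 = log file, $2 = day such as "Mar 14", $3 = from HH:MM:SS, $4 = to HH:MM:SS
syslog_window() {
    awk -v day="$2" -v from="$3" -v to="$4" '
        # Fields 1 and 2 are month and day, field 3 is the time.
        # Fixed-width HH:MM:SS strings compare correctly as text.
        ($1 " " $2) == day && $3 >= from && $3 <= to
    ' "$1"
}

# Usage:
#   syslog_window /var/log/syslog "Mar 14" 13:25:00 13:35:00
#
# Alternatives using the systemd journal:
#   journalctl -b                      # everything since the last boot
#   journalctl --since "2023-03-14 13:25" --until "2023-03-14 13:35"
```

Unlike a grep for fail/warn/error, this keeps the surrounding informational lines, which often show what happened right before the time-out.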