invalid PVE ticket (401) on 8-node PVE 6.4 cluster / chrony?

Hi,

After 292 days of uptime without issue on our 8-node PVE 6.4 cluster (with subscription, kept up to date), we started to randomly get "invalid PVE ticket (401)" messages in the web UI.

There are multiple threads about this particular error message on the forum; in some cases it seems to have been solved by replacing systemd-timesyncd with chrony.

The wiki mentions chrony but only in the context of PVE 7:

https://pve.proxmox.com/wiki/Time_Synchronization

Is it safe to "apt-get install chrony" on an 8-node PVE 6.4 cluster? I tried it on a standalone PVE 6.4 host and it simply stopped systemd-timesyncd, but the unit does not seem to be masked (status below, followed by a sketch of how we would mask it by hand):

Code:
# systemctl status systemd-timesyncd
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/systemd-timesyncd.service.d
           └─disable-with-time-daemon.conf
   Active: inactive (dead) since Thu 2022-04-21 14:48:52 CEST; 1min 39s ago
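
To keep systemd-timesyncd from coming back behind chrony's back, this is roughly what we would run on the test host. It is just a sketch of the standard systemctl workflow, not an official PVE procedure:

Code:
# installing chrony stopped systemd-timesyncd for us, but left the unit unmasked
apt-get install chrony
# make sure the old service stays down and cannot be started again
systemctl stop systemd-timesyncd
systemctl disable systemd-timesyncd
systemctl mask systemd-timesyncd
# check that chrony is actually synchronising
chronyc tracking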

We also looked at various logs, including following "journalctl -f" on multiple nodes, but nothing is logged at the moment the web UI shows the "invalid PVE ticket" error. Is there a way to debug what happens?
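
As far as we understand, ticket verification is time based, so clock drift between the nodes is an obvious suspect. For the next occurrence we plan to compare the clocks across the nodes and watch only the relevant daemons, roughly like this (the hostnames are placeholders for our nodes, just a sketch):

Code:
# compare each node's clock against this one (hostnames are placeholders)
for n in node1 node2 node3; do
    echo -n "$n: "
    ssh root@$n date +%s.%N
done
# follow only the authentication-related daemons instead of the whole journal
journalctl -f -u pveproxy -u pvedaemon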

Thanks for your help!
 
Code:
# pveversion --verbose
proxmox-ve: 6.4-1 (running kernel: 5.4.119-1-pve)
pve-manager: 6.4-14 (running version: 6.4-14/15e2bf61)
pve-kernel-5.4: 6.4-15
pve-kernel-helper: 6.4-15
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-2
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.7-pve1
 
We still get random 401 errors. Looking more closely at some PVE processes:

Code:
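# process-tree excerpt (ps auxf or similar), trimmed to the pveproxy processes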
www-data    2900  0.0  0.0 354016 140092 ?       Ss    2021  14:28 pveproxy
www-data 1774629  0.3  0.0 364960 136240 ?       S    08:56   0:10  \_ pveproxy worker
www-data 1796770  0.4  0.0 365224 136336 ?       S    09:09   0:10  \_ pveproxy worker
www-data 1840035  0.6  0.0 364496 135312 ?       S    09:35   0:04  \_ pveproxy worker

root@r640b:~# systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2021-07-03 13:02:25 CEST; 9 months 18 days ago
  Process: 868855 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
 Main PID: 2900 (pveproxy)
    Tasks: 4 (limit: 618170)
   Memory: 203.3M
   CGroup: /system.slice/pveproxy.service
           ├─   2900 pveproxy
           ├─1774629 pveproxy worker
           ├─1796770 pveproxy worker
           └─1840035 pveproxy worker

Apr 22 09:09:34 r640b pveproxy[2900]: worker 1796770 started
Apr 22 09:09:39 r640b pveproxy[1796770]: Clearing outdated entries from certificate cache
Apr 22 09:20:41 r640b pveproxy[1713846]: Clearing outdated entries from certificate cache
Apr 22 09:26:52 r640b pveproxy[1774629]: Clearing outdated entries from certificate cache
Apr 22 09:35:24 r640b pveproxy[1713846]: worker exit
Apr 22 09:35:24 r640b pveproxy[2900]: worker 1713846 finished
Apr 22 09:35:24 r640b pveproxy[2900]: starting 1 worker(s)
Apr 22 09:35:24 r640b pveproxy[2900]: worker 1840035 started
Apr 22 09:36:20 r640b pveproxy[1840035]: Clearing outdated entries from certificate cache
Apr 22 09:40:04 r640b pveproxy[1796770]: Clearing outdated entries from certificate cache

It looks like the pveproxy main process (PID 2900) hasn't been restarted since its initial launch 9 months ago. The same is true for most other PVE service processes, e.g.:

Code:
# systemctl status pvestatd
● pvestatd.service - PVE Status Daemon
   Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2021-07-03 13:02:23 CEST; 9 months 18 days ago
 Main PID: 2864 (pvestatd)
    Tasks: 1 (limit: 618170)
   Memory: 214.9M
   CGroup: /system.slice/pvestatd.service
           └─2864 pvestatd

Is this normal? Should we restart these services through the PVE web UI or with systemctl?
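
If restarting is the way to go, we would probably do it node by node with systemctl rather than through the web UI, along these lines (just our guess at the relevant units; the pveproxy status above shows an ExecReload, so a reload should at least cycle the workers):

Code:
# on one node at a time, cycle the API daemons and the status daemon
systemctl reload-or-restart pvedaemon pveproxy
systemctl restart pvestatd
# confirm everything came back up
systemctl status pvedaemon pveproxy pvestatd --no-pager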
 
