Auth fail on main PVE and cannot see containers via cluster login

MasterCATZ

Member
Sep 8, 2022
[Screenshot from 2025-03-18 22-47-55.png]

I can log in via SSH.

I can still access pveHA via the web UI, but it only partially sees pveMain: the CPU status etc. updates live, yet it does not see the containers.

All the containers on pveMain are still running.

Quorum is working too, so I am unsure why I cannot log into pveMain.

The last error from journalctl was from months ago.
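For anyone retracing this, a minimal sketch of how those two points can be verified (standard PVE commands, nothing node-specific assumed):

Code:
# Confirm the node still has quorum
pvecm status
# Pull recent warnings/errors from the daemons that handle web UI logins
journalctl -u pvedaemon -u pveproxy --since "1 hour ago" -p warning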

So far, all the services I have checked say they are OK:
systemctl status pve-cluster corosync pvedaemon pve-firewall pve-ha-crm pve-ha-lrm pvestatd

apart from pvestatd:
(pvemain pvestatd[1444]: got timeout)
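A repeated "got timeout" from pvestatd usually means one of its status queries (a storage backend or a guest) has stopped answering; the stuck lxc-info -n 110 helpers visible in the service output below are one hint. A minimal sketch for narrowing that down (treating the exact culprit as an assumption to confirm):

Code:
# Check whether any configured storage backend is hanging or inactive
pvesm status
# Look for status helpers that never returned, e.g. stuck lxc-info calls
ps aux | grep -E 'lxc-info|pvestatd' | grep -v grep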



What can I do to correct it without rebooting? I'd rather not take the containers offline.



Code:
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-03-18 22:19:18 AEST; 52min ago
   Main PID: 3307622 (pmxcfs)
      Tasks: 6 (limit: 37970)
     Memory: 54.9M
        CPU: 6.401s
     CGroup: /system.slice/pve-cluster.service
             └─3307622 /usr/bin/pmxcfs

Mar 18 22:27:58 pvemain pmxcfs[3307622]: [dcdb] notice: leader is 1/3307622
Mar 18 22:27:58 pvemain pmxcfs[3307622]: [dcdb] notice: synced members: 1/3307622
Mar 18 22:27:58 pvemain pmxcfs[3307622]: [dcdb] notice: start sending inode updates
Mar 18 22:27:58 pvemain pmxcfs[3307622]: [dcdb] notice: sent all (6) updates
Mar 18 22:27:58 pvemain pmxcfs[3307622]: [dcdb] notice: all data is up to date
Mar 18 22:27:58 pvemain pmxcfs[3307622]: [status] notice: received all states
Mar 18 22:27:58 pvemain pmxcfs[3307622]: [status] notice: all data is up to date
Mar 18 22:37:12 pvemain pmxcfs[3307622]: [status] notice: received log
Mar 18 22:46:00 pvemain pmxcfs[3307622]: [status] notice: received log
Mar 18 23:01:00 pvemain pmxcfs[3307622]: [status] notice: received log

● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-09-06 20:29:10 AEST; 6 months 10 days ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1375 (corosync)
      Tasks: 9 (limit: 37970)
     Memory: 140.0M
        CPU: 1d 19h 14min 24.760s
     CGroup: /system.slice/corosync.service
             └─1375 /usr/sbin/corosync -f

Mar 18 22:27:51 pvemain corosync[1375]:   [MAIN  ] Completed service synchronization, ready to provide service.
Mar 18 22:27:57 pvemain corosync[1375]:   [KNET  ] rx: host: 2 link: 0 is up
Mar 18 22:27:57 pvemain corosync[1375]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Mar 18 22:27:57 pvemain corosync[1375]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Mar 18 22:27:57 pvemain corosync[1375]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Mar 18 22:27:58 pvemain corosync[1375]:   [QUORUM] Sync members[2]: 1 2
Mar 18 22:27:58 pvemain corosync[1375]:   [QUORUM] Sync joined[1]: 2
Mar 18 22:27:58 pvemain corosync[1375]:   [TOTEM ] A new membership (1.cbb3) was formed. Members joined: 2
Mar 18 22:27:58 pvemain corosync[1375]:   [QUORUM] Members[2]: 1 2
Mar 18 22:27:58 pvemain corosync[1375]:   [MAIN  ] Completed service synchronization, ready to provide service.

● pvedaemon.service - PVE API Daemon
     Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-09-06 20:29:11 AEST; 6 months 10 days ago
    Process: 3308037 ExecReload=/usr/bin/pvedaemon restart (code=exited, status=0/SUCCESS)
   Main PID: 1455 (pvedaemon)
      Tasks: 31 (limit: 37970)
     Memory: 360.3M
        CPU: 5h 21min 32.733s
     CGroup: /system.slice/pvedaemon.service
             ├─   1455 pvedaemon
             ├─ 774335 /usr/bin/dtach -A /var/run/dtach/vzctlconsole118 -r winch -z lxc-console -n 118 -e -1
             ├─ 774336 lxc-console -n 118 -e -1
             ├─ 774469 /usr/bin/dtach -A /var/run/dtach/vzctlconsole125 -r winch -z lxc-console -n 125 -e -1
             ├─ 774470 lxc-console -n 125 -e -1
             ├─1817428 /usr/bin/dtach -A /var/run/dtach/vzctlconsole127 -r winch -z lxc-console -n 127 -e -1
             ├─1817429 lxc-console -n 127 -e -1
             ├─2132523 /usr/bin/dtach -A /var/run/dtach/vzctlconsole103 -r winch -z lxc-console -n 103 -e -1
             ├─2132524 lxc-console -n 103 -e -1
             ├─2219253 /usr/bin/dtach -A /var/run/dtach/vzctlconsole129 -r winch -z lxc-console -n 129 -e -1
             ├─2219254 lxc-console -n 129 -e -1
             ├─3214215 /usr/bin/dtach -A /var/run/dtach/vzctlconsole110 -r winch -z lxc-console -n 110 -e -1
             ├─3214216 lxc-console -n 110 -e -1
             ├─3236038 "pvedaemon worker"
             ├─3238250 "pvedaemon worker"
             ├─3238683 "pvedaemon worker"
             ├─3245390 lxc-info -n 110 -p
             ├─3245690 lxc-info -n 110 -p
             ├─3245973 lxc-info -n 110 -p
             ├─3308045 "pvedaemon worker"
             ├─3308046 "pvedaemon worker"
             ├─3308047 "pvedaemon worker"
             ├─3308054 lxc-info -n 110 -p
             ├─3308055 lxc-info -n 110 -p
             ├─3308056 lxc-info -n 110 -p
             ├─3419697 /usr/bin/dtach -A /var/run/dtach/vzctlconsole128 -r winch -z lxc-console -n 128 -e -1
             ├─3419698 lxc-console -n 128 -e -1
             ├─3659515 "task UPID:pvemain:0037D6FB:391FA33E:676D15EE:vncshell::root@pam:"
             ├─3659516 /usr/bin/termproxy 5900 --path /nodes/pvemain --perm Sys.Console -- /bin/login -f root
             ├─3961656 /usr/bin/dtach -A /var/run/dtach/vzctlconsole132 -r winch -z lxc-console -n 132 -e -1
             └─3961657 lxc-console -n 132 -e -1

Mar 18 22:19:41 pvemain pvedaemon[1455]: received signal HUP
Mar 18 22:19:41 pvemain pvedaemon[1455]: server closing
Mar 18 22:19:41 pvemain pvedaemon[1455]: server shutdown (restart)
Mar 18 22:19:41 pvemain systemd[1]: Reloaded pvedaemon.service - PVE API Daemon.
Mar 18 22:19:42 pvemain pvedaemon[1455]: restarting server
Mar 18 22:19:42 pvemain pvedaemon[1455]: starting 3 worker(s)
Mar 18 22:19:42 pvemain pvedaemon[1455]: worker 3308045 started
Mar 18 22:19:42 pvemain pvedaemon[1455]: worker 3308046 started
Mar 18 22:19:42 pvemain pvedaemon[1455]: worker 3308047 started
Mar 18 22:19:42 pvemain pvedaemon[3308046]: <root@pam> successful auth for user 'root@pam'

● pve-firewall.service - Proxmox VE firewall
     Loaded: loaded (/lib/systemd/system/pve-firewall.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-09-06 20:29:10 AEST; 6 months 10 days ago
   Main PID: 1428 (pve-firewall)
      Tasks: 1 (limit: 37970)
     Memory: 100.7M
        CPU: 16h 28min 13.975s
     CGroup: /system.slice/pve-firewall.service
             └─1428 pve-firewall

Sep 06 20:29:10 pvemain systemd[1]: Starting pve-firewall.service - Proxmox VE firewall...
Sep 06 20:29:10 pvemain pve-firewall[1428]: starting server
Sep 06 20:29:10 pvemain systemd[1]: Started pve-firewall.service - Proxmox VE firewall.
Mar 18 22:19:36 pvemain systemd[1]: Reloading pve-firewall.service - Proxmox VE firewall...
Mar 18 22:19:36 pvemain pve-firewall[3307874]: send HUP to 1428
Mar 18 22:19:36 pvemain pve-firewall[1428]: received signal HUP
Mar 18 22:19:36 pvemain pve-firewall[1428]: server shutdown (restart)
Mar 18 22:19:36 pvemain systemd[1]: Reloaded pve-firewall.service - Proxmox VE firewall.
Mar 18 22:19:37 pvemain pve-firewall[1428]: restarting server

● pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon
     Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-03-18 22:20:31 AEST; 51min ago
    Process: 3335263 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
   Main PID: 3335269 (pve-ha-crm)
      Tasks: 1 (limit: 37970)
     Memory: 112.4M
        CPU: 1.052s
     CGroup: /system.slice/pve-ha-crm.service
             └─3335269 pve-ha-crm

Mar 18 22:20:31 pvemain systemd[1]: Starting pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon...
Mar 18 22:20:31 pvemain pve-ha-crm[3335269]: starting server
Mar 18 22:20:31 pvemain pve-ha-crm[3335269]: status change startup => wait_for_quorum
Mar 18 22:20:31 pvemain systemd[1]: Started pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon.
Mar 18 22:20:36 pvemain pve-ha-crm[3335269]: status change wait_for_quorum => slave
Mar 18 22:24:56 pvemain pve-ha-crm[3335269]: successfully acquired lock 'ha_manager_lock'
Mar 18 22:24:56 pvemain pve-ha-crm[3335269]: watchdog active
Mar 18 22:24:56 pvemain pve-ha-crm[3335269]: status change slave => master

● pve-ha-lrm.service - PVE Local HA Resource Manager Daemon
     Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-03-18 22:20:27 AEST; 51min ago
    Process: 3335231 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
   Main PID: 3335249 (pve-ha-lrm)
      Tasks: 1 (limit: 37970)
     Memory: 111.8M
        CPU: 16.817s
     CGroup: /system.slice/pve-ha-lrm.service
             └─3335249 pve-ha-lrm

Mar 18 22:20:26 pvemain systemd[1]: Starting pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
Mar 18 22:20:27 pvemain pve-ha-lrm[3335249]: starting server
Mar 18 22:20:27 pvemain pve-ha-lrm[3335249]: status change startup => wait_for_agent_lock
Mar 18 22:20:27 pvemain systemd[1]: Started pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
Mar 18 22:20:37 pvemain pve-ha-lrm[3335249]: successfully acquired lock 'ha_agent_pvemain_lock'
Mar 18 22:20:37 pvemain pve-ha-lrm[3335249]: watchdog active
Mar 18 22:20:37 pvemain pve-ha-lrm[3335249]: status change wait_for_agent_lock => active

● pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-09-06 20:29:10 AEST; 6 months 10 days ago
    Process: 3308064 ExecReload=/usr/bin/pvestatd restart (code=exited, status=0/SUCCESS)
   Main PID: 1444 (pvestatd)
      Tasks: 2 (limit: 37970)
     Memory: 353.8M
        CPU: 3d 14h 15min 9.321s
     CGroup: /system.slice/pvestatd.service
             ├─   1444 pvestatd
             └─3245436 lxc-info -n 110 -p

Mar 18 20:29:59 pvemain pvestatd[1444]: got timeout
Mar 18 20:29:59 pvemain pvestatd[1444]: status update time (20.470 seconds)
Mar 18 20:30:04 pvemain pvestatd[1444]: got timeout
Mar 18 20:30:09 pvemain pvestatd[1444]: got timeout
Mar 18 20:30:14 pvemain pvestatd[1444]: got timeout
Mar 18 20:30:19 pvemain pvestatd[1444]: got timeout
Mar 18 22:19:42 pvemain systemd[1]: Reloading pvestatd.service - PVE Status Daemon...
Mar 18 22:19:43 pvemain pvestatd[3308064]: send HUP to 1444
Mar 18 22:19:43 pvemain pvestatd[1444]: received signal HUP
Mar 18 22:19:43 pvemain systemd[1]: Reloaded pvestatd.service - PVE Status Daemon.
 
pveversion -v
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.3.5 (running version: 8.3.5/dac3aa88bac3f300)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 19.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.1
libpve-rs-perl: 0.9.2
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.3-1
proxmox-backup-file-restore: 3.3.3-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.6
pve-cluster: 8.0.10
pve-container: 5.2.4
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.4.0
pve-qemu-kvm: 9.2.0-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.8
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
I have already tried
service pveproxy restart
and it made no change.
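For completeness, the stateless PVE daemons can normally be restarted without touching running guests, so a slightly wider restart is low-risk (a sketch; this deliberately leaves out corosync and the HA services):

Code:
# These only serve the API/web UI and status polling; restarting them
# does not stop containers or VMs
systemctl restart pveproxy pvedaemon pvestatd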


As someone said this command would let the containers and VMs keep running, I ran it:

systemctl restart corosync.service pvedaemon.service pve-firewall.service pve-ha-crm.service pve-ha-lrm.service pvestatd.service

and it made all the PVEs reboot, so not happy...

Thankfully MariaDB recovered from the power-off event this time around.
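In hindsight, the reboots look like HA watchdog fencing: with pve-ha-lrm active the node holds the watchdog, and restarting corosync drops quorum long enough for it to expire. A sketch of the ordering generally recommended for corosync maintenance on an HA-enabled node (stop the HA services first so the watchdog is released; treat this as an assumption to verify against the current docs before relying on it):

Code:
# Release the watchdog before touching the cluster stack
systemctl stop pve-ha-lrm pve-ha-crm
# The cluster stack can now be restarted without the node fencing itself
systemctl restart corosync pve-cluster
# Re-enable HA afterwards
systemctl start pve-ha-crm pve-ha-lrm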
 