[SOLVED] Can't log in and services time out when restarting

tsulhc

Mar 17, 2022
Hey guys!

After adding a node to the cluster, I can no longer log in to the GUI, and I can't restart pvedaemon or pveproxy or run pvecm updatecerts --force.

pveversion -v

Bash:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-9
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

The cluster is quorate without problems (pvecm status):

Bash:
Cluster information
-------------------
Name:             Netpass
Config Version:   32
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Mar 17 17:58:49 2022
Quorum provider:  corosync_votequorum
Nodes:            8
Node ID:          0x00000008
Ring ID:          1.8a741
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   8
Highest expected: 8
Total votes:      8
Quorum:           5
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.15.70
0x00000002          1 192.168.15.21
0x00000003          1 192.168.15.80
0x00000004          1 192.168.15.50
0x00000005          1 192.168.15.90
0x00000006          1 192.168.15.40
0x00000007          1 192.168.15.60
0x00000008          1 192.168.15.30 (local)
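To run the same check from a script instead of by eye, here is a minimal sketch. `check_quorum` is a hypothetical helper (not part of Proxmox VE) that parses the text of `pvecm status` and assumes the field labels look exactly like the output above. It also recomputes the default votequorum majority, floor(expected votes / 2) + 1, which gives 5 for the 8 expected votes shown here.

```shell
# check_quorum: hypothetical helper, not part of Proxmox VE.
# Reads `pvecm status` output on stdin, prints the key vote counts and
# returns 0 only if the node is quorate and every expected vote is present.
check_quorum() {
  awk -F': *' '
    /^Quorate:/        { quorate = $2 }
    /^Expected votes:/ { expected = $2 + 0 }
    /^Total votes:/    { total = $2 + 0 }
    /^Quorum:/         { quorum = $2 + 0 }
    END {
      # Default votequorum rule: a strict majority of the expected votes.
      majority = int(expected / 2) + 1
      printf "quorate=%s total=%d expected=%d quorum=%d majority=%d\n", quorate, total, expected, quorum, majority
      if (quorate == "Yes" && total == expected) exit 0
      exit 1
    }'
}

# Usage on a cluster node:
#   pvecm status | check_quorum
```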

systemctl status pveproxy

Bash:
● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-03-16 20:20:57 CET; 21h ago
    Process: 1923 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
    Process: 1926 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
   Main PID: 1929 (pveproxy)
      Tasks: 4 (limit: 77016)
     Memory: 228.1M
        CPU: 10.015s
     CGroup: /system.slice/pveproxy.service
             ├─  1929 pveproxy
             ├─  1930 pveproxy worker
             ├─507590 pveproxy worker
             └─517540 pveproxy worker

Mar 17 13:02:13 mikasa pveproxy[1931]: proxy detected vanished client connection
Mar 17 13:02:13 mikasa pveproxy[1930]: proxy detected vanished client connection
Mar 17 13:02:37 mikasa pveproxy[1931]: worker exit
Mar 17 13:02:37 mikasa pveproxy[1929]: worker 1931 finished
Mar 17 13:02:37 mikasa pveproxy[1929]: starting 1 worker(s)
Mar 17 13:02:37 mikasa pveproxy[1929]: worker 517540 started
Mar 17 13:03:14 mikasa pveproxy[517540]: Clearing outdated entries from certificate cache
Mar 17 13:06:16 mikasa pveproxy[507590]: proxy detected vanished client connection
Mar 17 17:36:58 mikasa pveproxy[517540]: proxy detected vanished client connection
Mar 17 18:10:52 mikasa pveproxy[1930]: proxy detected vanished client connection #appears every time I try to log in
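Since each failed login leaves a "proxy detected vanished client connection" line, counting those per worker PID can help match login attempts to log entries. A minimal sketch, assuming the journal line format shown above; `count_vanished` is a hypothetical helper name.

```shell
# count_vanished: hypothetical helper. Reads journal text on stdin and prints
# "PID COUNT" for each pveproxy worker that logged a vanished client connection.
count_vanished() {
  grep -o 'pveproxy\[[0-9]*]: proxy detected vanished client connection' |
    sed 's/^pveproxy\[\([0-9]*\)].*/\1/' |
    sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Usage on the affected node:
#   journalctl -u pveproxy --since today | count_vanished
```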

systemctl restart pveproxy

Bash:
Job for pveproxy.service failed because a timeout was exceeded.
See "systemctl status pveproxy.service" and "journalctl -xe" for details.

systemctl status pveproxy.service

Bash:
● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: deactivating (final-sigterm) (Result: timeout) since Thu 2022-03-17 18:41:01 CET; 22min ago
    Process: 696475 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=killed, signal=KILL)
      Tasks: 3 (limit: 77016)
     Memory: 156.0M
        CPU: 182ms
     CGroup: /system.slice/pveproxy.service
             ├─687710 /usr/bin/perl -T /usr/bin/pveproxy stop
             ├─693698 /usr/bin/perl /usr/bin/pvecm updatecerts --silent
             └─696476 /usr/bin/perl /usr/bin/pvecm updatecerts --silent

Mar 17 18:59:04 mikasa systemd[1]: Starting PVE API Proxy Server...
Mar 17 18:59:35 mikasa pvecm[696475]: got timeout
Mar 17 19:00:35 mikasa systemd[1]: pveproxy.service: start-pre operation timed out. Terminating.
Mar 17 19:02:05 mikasa systemd[1]: pveproxy.service: State 'stop-sigterm' timed out. Killing.
Mar 17 19:02:05 mikasa systemd[1]: pveproxy.service: Killing process 696475 (pvecm) with signal SIGKILL.
Mar 17 19:02:05 mikasa systemd[1]: pveproxy.service: Killing process 687710 (pveproxy) with signal SIGKILL.
Mar 17 19:02:05 mikasa systemd[1]: pveproxy.service: Killing process 693698 (pvecm) with signal SIGKILL.
Mar 17 19:02:05 mikasa systemd[1]: pveproxy.service: Killing process 696476 (pvecm) with signal SIGKILL.
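The status output shows `pvecm updatecerts --silent` runs (the unit's ExecStartPre) still attached to the service while systemd times out and kills them. A rough way to list what is left in a unit's cgroup is to parse the tree printed by `systemctl status`. `unit_pids` is a hypothetical helper and assumes the box-drawing tree layout above.

```shell
# unit_pids: hypothetical helper. Reads `systemctl status` output on stdin and
# prints "PID COMMAND" for each process in the unit's CGroup tree, optionally
# filtered by a grep pattern given as $1.
unit_pids() {
  # Only the tree lines contain the box-drawing dash; strip the tree prefix
  # so each remaining line starts with the PID.
  grep -- '─' |
    sed -n 's/^[^0-9]*\([0-9][0-9]*\) \(.*\)$/\1 \2/p' |
    grep -- "${1:-.}"
}

# Usage on the affected node:
#   systemctl status pveproxy --full --no-pager | unit_pids 'pvecm updatecerts'
# A simpler alternative that does not depend on the status layout:
#   pgrep -af 'pvecm updatecerts'
```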

service pvedaemon restart

Bash:
Job for pvedaemon.service failed because a timeout was exceeded.
See "systemctl status pvedaemon.service" and "journalctl -xe" for details.

systemctl status pvedaemon.service

Bash:
● pvedaemon.service - PVE API Daemon
     Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
     Active: activating (start) since Thu 2022-03-17 18:32:11 CET; 17s ago
Cntrl PID: 683742 (pvedaemon)
      Tasks: 6 (limit: 77016)
     Memory: 355.1M
        CPU: 283ms
     CGroup: /system.slice/pvedaemon.service
             ├─134389 pvedaemon worker
             ├─134390 pvedaemon worker
             ├─134391 pvedaemon worker
             ├─677909 /usr/bin/perl -T /usr/bin/pvedaemon stop
             ├─680793 /usr/bin/perl -T /usr/bin/pvedaemon start
             └─683742 /usr/bin/perl -T /usr/bin/pvedaemon start

pvecm updatecerts --force

Bash:
(re)generate node files
generate new node certificate
got timeout
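`pvecm updatecerts` writes the node certificate files under /etc/pve, the Proxmox cluster filesystem (pmxcfs). One plausible reading of "got timeout", which this thread does not confirm, is that a write or lock there is hanging. A minimal sketch to test whether a directory accepts writes without blocking the shell indefinitely; `check_writable` is a hypothetical helper and its 10-second default is an arbitrary choice.

```shell
# check_writable: hypothetical helper. Tries to create and remove a probe file
# in DIR, giving up after SECS seconds (default 10, an arbitrary choice).
check_writable() {
  local dir=$1
  local secs=${2:-10}
  local probe="$dir/.write-probe.$$"
  # `timeout` (GNU coreutils) stops the probe if the filesystem hangs.
  if timeout "$secs" sh -c 'touch "$1" && rm -f "$1"' sh "$probe" 2>/dev/null; then
    echo "writable: $dir"
  else
    echo "not writable or timed out: $dir"
    return 1
  fi
}

# Usage on the affected node:
#   check_writable /etc/pve
```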

NTP is synced on all nodes.
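To put a number on "synced", here is a rough sketch that compares each node's clock in whole seconds. `max_skew` is a hypothetical helper; the commented loop assumes root SSH access between nodes, which a Proxmox VE cluster normally has.

```shell
# max_skew: hypothetical helper. Reads "NODE EPOCH_SECONDS" lines on stdin and
# prints the largest clock difference between any two nodes, in seconds.
max_skew() {
  awk 'NR == 1 { min = $2; max = $2 }
       { if ($2 < min) min = $2; if ($2 > max) max = $2 }
       END { print max - min }'
}

# Usage from one node (IPs taken from the membership list above). The ssh
# calls run one after another, so the result is only accurate to about the
# time each connection takes.
#   for ip in 192.168.15.70 192.168.15.21 192.168.15.80 192.168.15.50 \
#             192.168.15.90 192.168.15.40 192.168.15.60 192.168.15.30; do
#     printf '%s %s\n' "$ip" "$(ssh root@"$ip" date +%s)"
#   done | max_skew
```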

The only way I can access the GUI or restart the services is by killing the cluster and lowering the expected votes with pvecm expected 1.

EDIT: Probably I was just impatient; the issue resolved itself overnight.
 