[SOLVED] Can't login to webui: Network error or Proxmox VE services not running?

filedotexe

New Member
May 28, 2021
Hi, after rebooting my main node I've been unable to log in to the web UI; I get "Network error or Proxmox VE services not running?". I've checked other threads here and followed them to no avail. My setup consists of two nodes. I can log in to the secondary node, but nothing shows and I get "Connection error 401: permission denied - invalid PVE ticket". Following other threads I've tried `systemctl restart pveproxy pvedaemon`, which did nothing, and `pvecm updatecerts`, which returns "got timeout".
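For reference, this is roughly what I've run so far (the status output from the last two commands is pasted further down):
Code:
systemctl restart pveproxy pvedaemon   # completes, but nothing changes
pvecm updatecerts                      # returns "got timeout"
systemctl status pvedaemon             # output below
pvecm status                           # output below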

My pvedaemon status looks fine:
Code:
● pvedaemon.service - PVE API Daemon
   Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-05-28 02:49:06 CDT; 3min 18s ago
  Process: 5227 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
 Main PID: 5229 (pvedaemon)
    Tasks: 4 (limit: 4915)
   Memory: 135.4M
   CGroup: /system.slice/pvedaemon.service
           ├─5229 pvedaemon
           ├─5230 pvedaemon worker
           ├─5231 pvedaemon worker
           └─5232 pvedaemon worker

May 28 02:49:06 nexus systemd[1]: Starting PVE API Daemon...
May 28 02:49:06 nexus pvedaemon[5229]: starting server
May 28 02:49:06 nexus pvedaemon[5229]: starting 3 worker(s)
May 28 02:49:06 nexus pvedaemon[5229]: worker 5230 started
May 28 02:49:06 nexus pvedaemon[5229]: worker 5231 started
May 28 02:49:06 nexus pvedaemon[5229]: worker 5232 started
May 28 02:49:06 nexus systemd[1]: Started PVE API Daemon.

pvecm status looks good as far as I can tell:
Code:
Cluster information
-------------------
Name:             Home
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri May 28 02:53:10 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.3694
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.0.135 (local)
0x00000002          1 10.0.0.134


I've checked journalctl and I do see an error that might be causing the issue, but I honestly have no clue what to do about it.
Code:
May 28 02:35:41 nexus kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 28 02:35:41 nexus kernel:       Tainted: P           O      5.4.114-1-pve #1
May 28 02:35:41 nexus kernel: INFO: task pvedaemon:2246 blocked for more than 120 seconds.
May 28 02:35:41 nexus kernel: R13: 000055de6de61230 R14: 000055de6ddfd388 R15: 00000000000001ff
May 28 02:35:41 nexus kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055de6b0cea98
May 28 02:35:41 nexus kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000006
May 28 02:35:41 nexus kernel: RDX: 000055de684bd3d4 RSI: 00000000000001ff RDI: 000055de6de61230
May 28 02:35:41 nexus kernel: RAX: ffffffffffffffda RBX: 000055de69c85260 RCX: 00007f76782740d7
May 28 02:35:41 nexus kernel: RSP: 002b:00007ffdba2c1288 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
May 28 02:35:41 nexus kernel: Code: Bad RIP value.
May 28 02:35:41 nexus kernel: RIP: 0033:0x7f76782740d7
May 28 02:35:41 nexus kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 28 02:35:41 nexus kernel:  do_syscall_64+0x57/0x190
May 28 02:35:41 nexus kernel:  __x64_sys_mkdir+0x1b/0x20
May 28 02:35:41 nexus kernel:  do_mkdirat+0x59/0x110
May 28 02:35:41 nexus kernel:  filename_create+0x8e/0x180
May 28 02:35:41 nexus kernel:  down_write+0x3d/0x40
May 28 02:35:41 nexus kernel:  rwsem_down_write_slowpath+0x2ed/0x4a0
May 28 02:35:41 nexus kernel:  schedule+0x33/0xa0
May 28 02:35:41 nexus kernel:  ? filename_parentat.isra.55.part.56+0xf7/0x180
May 28 02:35:41 nexus kernel:  __schedule+0x2e6/0x700
May 28 02:35:41 nexus kernel: Call Trace:
May 28 02:35:41 nexus kernel: pvesr           D    0  2007      1 0x00000000

I'm currently running pve-manager/6.4-6/be2fa32c (running kernel: 5.4.114-1-pve)

Thanks in advance for any help; I'm unsure what else to do.
 
* Make sure that the system time is in sync between both nodes (e.g. by installing chrony)
* let `journalctl -f` run in one shell, then
* try restarting corosync and pve-cluster (`systemctl restart corosync pve-cluster`)
* afterwards restart pvedaemon and pveproxy
* then check the journal output (this should give a more complete picture; see the combined sequence below)
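Roughly, as one sequence (a sketch; adjust the time-sync part to whatever setup you prefer):
Code:
# shell 1: keep the journal running while you restart things
journalctl -f

# shell 2: sync the clocks, then restart the cluster stack and the API services
apt install chrony
systemctl restart corosync pve-cluster
systemctl restart pvedaemon pveproxy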

I hope this helps!
 
Hi, unfortunately syncing the system time didn't work. `journalctl -f` shows some errors, which I attached below, but eventually it seems to sort itself out and everything starts. I did find some "no quorum" errors, but right after them it would say "[status] notice: node has quorum". I did, however, notice while checking the logs that pvesr had failed to start. systemd shows it started, but I'm not sure if this is what it's supposed to look like:

Code:
root@nexus ~# systemctl status pvesr
● pvesr.service - Proxmox VE replication runner
   Loaded: loaded (/lib/systemd/system/pvesr.service; static; vendor preset: enabled)
   Active: activating (start) since Fri 2021-05-28 12:48:23 CDT; 9min ago
 Main PID: 73510 (pvesr)
    Tasks: 1 (limit: 4915)
   Memory: 77.7M
   CGroup: /system.slice/pvesr.service
           └─73510 /usr/bin/perl -T /usr/bin/pvesr run --mail 1

May 28 12:48:23 nexus systemd[1]: Starting Proxmox VE replication runner...
May 28 12:48:23 nexus pvesr[73510]: trying to acquire cfs lock 'file-replication_cfg' ...
May 28 12:48:24 nexus pvesr[73510]: trying to acquire cfs lock 'file-replication_cfg' ...
May 28 12:48:25 nexus pvesr[73510]: trying to acquire cfs lock 'file-replication_cfg' ...
May 28 12:48:26 nexus pvesr[73510]: trying to acquire cfs lock 'file-replication_cfg' ...
May 28 12:48:27 nexus pvesr[73510]: trying to acquire cfs lock 'file-replication_cfg' ...




`journalctl -f` output:
Code:
May 28 12:48:18 nexus systemd[1]: Stopping Corosync Cluster Engine...
May 28 12:48:18 nexus corosync-cfgtool[73474]: Shutting down corosync
May 28 12:48:18 nexus corosync[72132]:   [MAIN  ] Node was shut down by a signal
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Unloading all Corosync service engines.
May 28 12:48:18 nexus corosync[72132]:   [QB    ] withdrawing server sockets
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
May 28 12:48:18 nexus pmxcfs[72123]: [confdb] crit: cmap_dispatch failed: 2
May 28 12:48:18 nexus corosync[72132]:   [QB    ] withdrawing server sockets
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync configuration map access
May 28 12:48:18 nexus corosync[72132]:   [QB    ] withdrawing server sockets
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync configuration service
May 28 12:48:18 nexus pmxcfs[72123]: [status] crit: cpg_dispatch failed: 2
May 28 12:48:18 nexus pmxcfs[72123]: [status] crit: cpg_leave failed: 2
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_dispatch failed: 2
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_leave failed: 2
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pmxcfs[72123]: [dcdb] crit: cpg_send_message failed: 9
May 28 12:48:18 nexus pve-ha-lrm[1638]: unable to write lrm status file - unable to open file '/etc/pve/nodes/nexus/lrm_status.tmp.1638' - Device or resource busy
May 28 12:48:18 nexus corosync[72132]:   [QB    ] withdrawing server sockets
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
May 28 12:48:18 nexus pmxcfs[72123]: [quorum] crit: quorum_dispatch failed: 2
May 28 12:48:18 nexus pmxcfs[72123]: [status] notice: node lost quorum
May 28 12:48:18 nexus corosync[72132]:   [QB    ] withdrawing server sockets
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync profile loading service
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync resource monitoring service
May 28 12:48:18 nexus corosync[72132]:   [SERV  ] Service engine unloaded: corosync watchdog service
May 28 12:48:19 nexus pmxcfs[72123]: [quorum] crit: quorum_initialize failed: 2
May 28 12:48:19 nexus pmxcfs[72123]: [quorum] crit: can't initialize service
May 28 12:48:19 nexus pmxcfs[72123]: [confdb] crit: cmap_initialize failed: 2
May 28 12:48:19 nexus pmxcfs[72123]: [confdb] crit: can't initialize service
May 28 12:48:19 nexus pmxcfs[72123]: [dcdb] notice: start cluster connection
May 28 12:48:19 nexus pmxcfs[72123]: [dcdb] crit: cpg_initialize failed: 2
May 28 12:48:19 nexus pmxcfs[72123]: [dcdb] crit: can't initialize service
May 28 12:48:19 nexus pmxcfs[72123]: [status] notice: start cluster connection
May 28 12:48:19 nexus pmxcfs[72123]: [status] crit: cpg_initialize failed: 2
May 28 12:48:19 nexus pmxcfs[72123]: [status] crit: can't initialize service
May 28 12:48:19 nexus corosync[72132]:   [MAIN  ] Corosync Cluster Engine exiting normally
May 28 12:48:19 nexus systemd[1]: corosync.service: Succeeded.
May 28 12:48:19 nexus systemd[1]: Stopped Corosync Cluster Engine.
May 28 12:48:19 nexus systemd[1]: Stopping The Proxmox VE cluster filesystem...
May 28 12:48:19 nexus pmxcfs[72123]: [main] notice: teardown filesystem
May 28 12:48:20 nexus pvesr[72154]: trying to acquire cfs lock 'file-replication_cfg' ...
May 28 12:48:20 nexus systemd[1865]: etc-pve.mount: Succeeded.
May 28 12:48:20 nexus systemd[1]: etc-pve.mount: Succeeded.
May 28 12:48:21 nexus pmxcfs[72123]: [quorum] crit: quorum_finalize failed: 9
May 28 12:48:21 nexus pmxcfs[72123]: [confdb] crit: cmap_track_delete nodelist failed: 9
May 28 12:48:21 nexus pmxcfs[72123]: [confdb] crit: cmap_track_delete version failed: 9
May 28 12:48:21 nexus pmxcfs[72123]: [confdb] crit: cmap_finalize failed: 9
May 28 12:48:21 nexus pmxcfs[72123]: [main] notice: exit proxmox configuration filesystem (0)
May 28 12:48:21 nexus systemd[1]: pve-cluster.service: Succeeded.
May 28 12:48:21 nexus systemd[1]: Stopped The Proxmox VE cluster filesystem.
May 28 12:48:21 nexus systemd[1]: Starting The Proxmox VE cluster filesystem...
May 28 12:48:21 nexus pmxcfs[73492]: [quorum] crit: quorum_initialize failed: 2
May 28 12:48:21 nexus pmxcfs[73492]: [quorum] crit: can't initialize service
May 28 12:48:21 nexus pmxcfs[73492]: [confdb] crit: cmap_initialize failed: 2
May 28 12:48:21 nexus pmxcfs[73492]: [confdb] crit: can't initialize service
May 28 12:48:21 nexus pmxcfs[73492]: [dcdb] crit: cpg_initialize failed: 2
May 28 12:48:21 nexus pmxcfs[73492]: [dcdb] crit: can't initialize service
May 28 12:48:21 nexus pmxcfs[73492]: [status] crit: cpg_initialize failed: 2
May 28 12:48:21 nexus pmxcfs[73492]: [status] crit: can't initialize service
May 28 12:48:22 nexus pvesr[72154]: trying to acquire cfs lock 'file-replication_cfg' ...
May 28 12:48:22 nexus systemd[1]: Started The Proxmox VE cluster filesystem.
May 28 12:48:22 nexus systemd[1]: Starting Corosync Cluster Engine...
 
I rebooted both nodes and my router and now I can log in! But I get kicked out immediately with "Connection error 401: permission denied - invalid PVE ticket". Some progress at last!
edit: I was previously unable to use the qm command to manually start my VMs, but it's working again. I've checked both nodes and their times are in sync; at least I can start my VMs now.
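In case it's useful to anyone hitting the same thing, this is roughly how I compared the clocks and started a guest from the CLI (100 is just a placeholder VMID):
Code:
# on each node: check the local clock and NTP sync state
timedatectl
chronyc tracking      # only if chrony is installed

# start a VM manually from the shell
qm start 100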
 
