Hi all,
I have just started the Proxmox VE (6.2) adventure on a new SoYouStart host, and since day one I've been having GUI hangs... at least when I try to open it from my own computer. On localhost the page is served:
Bash:
→ curl -s -k https://localhost:8006 | grep title
<title>dom01 - Proxmox Virtual Environment</title>
→
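Since the proxy still answers on localhost, my next step when it hangs again will be to bounce just the web services over the surviving SSH session and watch their journal (assuming the problem sits in pveproxy/pvedaemon rather than the network path):
Bash:
→ systemctl restart pveproxy pvedaemon
→ journalctl -u pveproxy -n 20 --no-pager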
I would really appreciate any ideas. A hard reset brings the GUI back, but only for a while... if I leave my computer for a few hours with the GUI tab open (it doesn't matter whether I am logged in or not), I just cannot open it anymore (server timeout error). Currently the GUI doesn't work, but I do have SSH open (it was opened before the hang) and working. I once had a situation where even SSH wasn't working, but SoYouStart's IPMI still connected me to the server. This is what I saw when connecting through IPMI:
Firewall is off:
Bash:
→ iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
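The listing above only shows the filter table; to be thorough, I can also dump every table plus the nftables ruleset to make sure no rule is hiding elsewhere (nft may not even be in use here, it's just a sanity check):
Bash:
→ iptables-save
→ nft list ruleset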
Port 8006 is open:
Bash:
→ netstat -an | grep 8006
tcp 0 0 0.0.0.0:8006 0.0.0.0:* LISTEN
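As a quick sanity check that it is still pveproxy holding the socket after a hang, ss can also show the owning process:
Bash:
→ ss -tlnp | grep 8006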
Opening the page in a browser (https://my.url:8006) results in a connection timeout (tried with the latest Firefox, Chrome, and Safari), and the access log shows no entries after the hang happened:
Bash:
→ tail -f /var/log/pveproxy/access.log
my.home.ip.addr - root@pam [09/11/2020:16:44:55 +0100] "GET /api2/json/cluster/resources HTTP/1.1" 200 995
my.home.ip.addr - root@pam [09/11/2020:16:44:57 +0100] "GET /api2/json/cluster/tasks HTTP/1.1" 200 898
my.home.ip.addr - - [09/11/2020:19:19:56 +0100] "GET /api2/json/nodes/dom01/status HTTP/1.1" 401 -
my.home.ip.addr - - [09/11/2020:19:20:14 +0100] "GET /api2/json/nodes/dom01/status HTTP/1.1" 401 -
my.home.ip.addr - - [09/11/2020:19:20:32 +0100] "GET /api2/json/nodes/dom01/status HTTP/1.1" 401 -
my.home.ip.addr - - [09/11/2020:19:20:50 +0100] "GET /api2/json/nodes/dom01/status HTTP/1.1" 401 -
my.home.ip.addr - - [09/11/2020:19:21:08 +0100] "GET /api2/json/nodes/dom01/status HTTP/1.1" 401 -
my.home.ip.addr - - [09/11/2020:19:21:26 +0100] "GET /api2/json/nodes/dom01/status HTTP/1.1" 401 -
my.home.ip.addr - - [09/11/2020:19:21:31 +0100] "GET /api2/json/access/domains HTTP/1.1" 200 159
127.0.0.1 - - [09/11/2020:23:01:47 +0100] "GET / HTTP/1.1" 200 2161
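The 401 entries look like my open GUI tab polling with an expired ticket, so I don't think they matter by themselves. When it next hangs I'll also check the proxy's own journal around that time (the timestamp below is just a placeholder):
Bash:
→ journalctl -u pveproxy -u pvedaemon --since "2020-11-09 19:00" --no-pager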
tcpdump also doesn't seem to see anything:
Bash:
→ tcpdump -i vmbr0 port 8006
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel
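Since nothing even reaches vmbr0, next time I'll also capture on the physical bridge port to see whether the SYNs arrive at the NIC at all (the interface name below is a placeholder, not my actual device):
Bash:
→ tcpdump -ni eno1 port 8006   # eno1 = physical uplink enslaved to vmbr0, adjust to your host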
I have two VMs; both have been running without any issue the entire time:
Bash:
→ qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
100 mail running 8196 600.00 1523
101 drive running 16384 600.00 1586
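Just to rule out the host itself getting starved when the GUI dies, over the SSH session that survives the hang I can also watch load and recent kernel messages (e.g. OOM kills):
Bash:
→ uptime
→ dmesg -T | tail -n 20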
The node itself is listed normally with pvesh, but pvecm reports an error regarding corosync:
Bash:
→ pvesh get /nodes
┌───────┬────────┬───────┬───────┬────────┬───────────┬───────────┬─────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────┐
│ node │ status │ cpu │ level │ maxcpu │ maxmem │ mem │ ssl_fingerprint │ uptime │
╞═══════╪════════╪═══════╪═══════╪════════╪═══════════╪═══════════╪═════════════════════════════════════════════════════════════════════════════════════════════════╪═════════════╡
│ dom01 │ online │ 1.55% │ │ 16 │ 62.46 GiB │ 10.53 GiB │ 05:21:69:BA:F6:1E:83:27:E0:50:73:EA:3B:C8:2E:62:FB:3D:C4:5F:02:2B:12:2D:68:F9:24:C0:11:C5:5A:31 │ 19h 10m 51s │
└───────┴────────┴───────┴───────┴────────┴───────────┴───────────┴─────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
Bash:
→ pvecm nodes
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
By the way, I don't have a cluster and have never created one; it's just a single node, so as far as I understand that error is expected.
The corosync service is also down:
Bash:
→ service corosync status
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Mon 2020-11-09 23:04:20 CET; 37min ago
└─ ConditionPathExists=/etc/corosync/corosync.conf was not met
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Nov 09 04:24:18 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 09 20:47:20 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 09 20:49:56 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 09 21:15:37 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 09 21:22:55 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 09 21:26:11 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 09 21:31:16 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Nov 09 23:04:20 dom01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
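Since /etc/pve is provided by pmxcfs even on a standalone node, the service that actually matters here should be pve-cluster, not corosync — so I'll verify that it is running as well:
Bash:
→ systemctl status pve-cluster --no-pager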
continued in the next post due to length limit...