[SOLVED] Proxmox node is offline but everything else is running

fcarucci

May 13, 2023
Hello, I have a 5-node Proxmox cluster, 4 of which also run Ceph.
The web UI of one of the nodes cannot be accessed (it shows as offline), but I can SSH into the node, all VMs are running, its Ceph monitor is up, and everything else seems to be working fine.
I can ping the IP the UI is supposed to be running on. I disabled the cluster firewall. Ceph is running fine with no errors.

pveproxy seems to be running:
Code:
  22582 ?        S      0:00 pveproxy worker
  22583 ?        S      0:00 pveproxy worker
  22587 ?        S      0:00 pveproxy worker
The cluster status looks OK:
Code:
Cluster information
-------------------
Name:             Slapdash
Config Version:   30
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Dec 14 13:28:49 2024
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000001
Ring ID:          1.5520
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.0.1 (local)
0x00000002          1 10.10.0.2
0x00000004          1 10.10.0.4
0x00000005          1 10.10.0.5
0x00000006          1 10.10.0.6

The only substantial difference between this node and the others is that I updated this node to the latest kernel version this morning (6.8.12-5).
What else can I try? Thanks!
 
Does the web UI work again after "systemctl restart pveproxy"? If not, migrate the VMs and reboot the host.
 
I tried rebooting the host.

I just found this big clue:
Code:
Dec 14 13:44:10 pve pveproxy[38402]: unable to open log file '/var/log/pveproxy/access.log' ->
Dec 14 13:44:10 pve pveproxy[38403]: unable to open log file '/var/log/pveproxy/access.log'
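To confirm this kind of diagnosis, a small filter can pull the quoted paths out of messages like the two above and report which parent folders no longer exist. This is a minimal Python sketch that assumes the message text matches the lines shown here; `missing_log_dirs` is a hypothetical helper name.

```python
import os
import re

# Matches the pveproxy error message and captures the quoted file path.
LOG_ERROR = re.compile(r"unable to open log file '([^']+)'")


def missing_log_dirs(lines):
    """Return the parent folders of failing log files that do not exist, in order."""
    missing = []
    for line in lines:
        match = LOG_ERROR.search(line)
        if match is None:
            continue  # not one of the "unable to open log file" messages
        parent = os.path.dirname(match.group(1))
        if not os.path.isdir(parent) and parent not in missing:
            missing.append(parent)
    return missing
```

For example, pipe the output of `journalctl -u pveproxy` into a script and call `missing_log_dirs(sys.stdin)`.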
 
OK, I fixed it. If anyone has the same problem, here's what happened.

I was running out of space and mindlessly ran rm -rf on the contents of /var/log, which is probably not a smart thing to do in general.
In this case, it looks like pveproxy fails if it can't find its own folder under /var/log.
I fixed it by recreating the folder and setting the right permissions.
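The repair described above can be sketched in Python as follows. This is a minimal sketch, not the exact commands used in this thread: the owner `www-data:www-data` and mode `0755` are assumptions, so compare them with `ls -ld /var/log/pveproxy` on a healthy node first. `restore_log_dir` is a hypothetical helper name.

```python
import os
import shutil


def restore_log_dir(path, owner=None, group=None, mode=0o755):
    """Recreate a deleted log folder and set its mode and, optionally, ownership."""
    os.makedirs(path, exist_ok=True)
    # makedirs() applies the umask to its mode argument, so set the mode explicitly.
    os.chmod(path, mode)
    if owner is not None or group is not None:
        # Giving the folder to another user normally requires root.
        shutil.chown(path, user=owner, group=group)
```

Run as root with the assumed values, e.g. `restore_log_dir("/var/log/pveproxy", "www-data", "www-data")`, and then restart the service with `systemctl restart pveproxy`.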

I would suggest adding some code to create the log folder if it doesn't exist.
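The suggested behavior amounts to creating the parent folder right before opening the log file. pveproxy itself is written in Perl, so this is only a Python illustration of the idea, not a patch for pveproxy; `open_log` is a hypothetical name.

```python
import os


def open_log(path):
    """Open a log file for appending, recreating its parent folder if it is missing."""
    parent = os.path.dirname(path)
    if parent:  # a bare file name has no folder to create
        os.makedirs(parent, exist_ok=True)
    return open(path, "a", encoding="utf-8")
```

A folder recreated this way gets the running process's user and umask, so a real fix would still need to ensure the ownership the service expects.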

Thanks for your help!
 
So either your "/" is too small or you have too much other data on it... "/" should never run full :)
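A simple guard against "/" filling up again is to check its usage periodically instead of waiting for services to break. This Python sketch uses `shutil.disk_usage`; the 90% threshold and the function names are assumptions chosen for illustration.

```python
import shutil


def usage_percent(path="/"):
    """Return how full the filesystem holding `path` is, in percent of its total size."""
    usage = shutil.disk_usage(path)
    # df computes used / (used + free) and excludes reserved blocks,
    # so it may report a slightly higher percentage than this.
    return 100.0 * usage.used / usage.total


def check_root(threshold=90.0, path="/"):
    """Return a warning string if the filesystem is at or above `threshold` percent, else None."""
    percent = usage_percent(path)
    if percent >= threshold:
        return f"{path} is {percent:.1f}% full"
    return None
```

Calling `check_root()` from a cron job or monitoring script gives an early warning before anything under "/" has to be deleted in a hurry.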