While adding a node to the cluster: "pve cluster filesystem not online"

ralph42

Again, I tried to add a 4th node to my 3-node cluster: fresh hardware, fresh Proxmox installation, updated to the current version.

When joining the cluster, I get the same result as a few days ago: the node appears in the tree, but the "Join Cluster" task fails. Then the cluster crashes until I shut down the new node and run "pvecm delnode nodename".

On the new node, I had a look at journalctl. After some complaints like "[quorum] crit: quorum_initialize failed" I found:
"replication: cfs_lock 'file-replication_cfg' error: pve cluster filesystem not online"
There are further lines about that, but then comes the line I was talking about a few days ago:
"unable to create '/etc/pve/nodes' - Permission denied" - which is also the message shown in the web frontend.

If I understand that correctly, "Permission denied" is a bit misleading. If the filesystem is not available, nothing can be written and joining the cluster fails.

After rebooting, the filesystem is of course available again, and I can see that /etc/pve/nodes is empty on the new node.
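
For what it's worth, whether the cluster filesystem (pmxcfs) is actually up and mounted can be checked on the new node with something like:

# is the pve-cluster service running and is /etc/pve mounted?
systemctl status pve-cluster
findmnt /etc/pve
# recent log lines of the cluster filesystem
journalctl -b -u pve-cluster --no-pager | tail -n 50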

OK, any ideas?
 
This probably is not about the Datacenter Manager, right?
You have issues adding a node to a PVE cluster, so this is not the appropriate topic to post it in.

Anyway...

So the current state is: you added the node, it failed, and then you removed it again while it was off, yes?

Remove any leftovers of the old node from the working 3-node cluster's filesystem.
(These should be under /etc/pve/nodes/NODENAME; see the sketch below.)
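
For example, something along these lines (only a sketch; NODENAME is a placeholder for the removed node's name, run on one of the remaining cluster nodes):

# remove the leftover directory of the removed node from the cluster filesystem
rm -r /etc/pve/nodes/NODENAME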

Reset the fourth node back to a standalone node (make sure every network reaching the other nodes is disconnected for now). Follow this guide:
https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
under "Separate a Node Without Reinstalling".
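
Roughly, that section boils down to something like the following (only a sketch from memory, run on the node being separated; please follow the wiki for the authoritative steps):

# stop the cluster services
systemctl stop pve-cluster corosync
# start the cluster filesystem in local mode so it can be edited
pmxcfs -l
# remove the corosync configuration
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
# stop the local-mode instance and start pve-cluster normally again
killall pmxcfs
systemctl start pve-cluster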

Before trying to re-add the node to the cluster, reconnect the networks and check that every cluster network is reachable between the three nodes and the new node.
Do you have any unusual MTU or VLAN configurations?
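
A quick way to check reachability and spot MTU problems could look like this (a sketch; 192.168.1.11 stands in for one of the other nodes' cluster IPs):

# plain reachability
ping -c 3 192.168.1.11
# full-size packet with the don't-fragment flag set (1472 bytes payload + 28 bytes headers = 1500 MTU)
ping -c 3 -M do -s 1472 192.168.1.11
# corosync's own view of its links (on a node that is already in the cluster)
corosync-cfgtool -s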


Also be aware that if you want to use HA with four nodes, you probably need a Q-Device:
https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
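
Setting that up is roughly the following (a sketch; 192.168.1.50 is an assumed external host that will provide the extra vote, see the linked wiki section for details):

# on the external host
apt install corosync-qnetd
# on every cluster node
apt install corosync-qdevice
# on one cluster node
pvecm qdevice setup 192.168.1.50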
 
Good Morning.

Not exactly.

While adding the node to the cluster, I got an error message in the GUI for this task. It said that some permissions on the filesystem were denied.
After that message appeared, the cluster became unstable in under a minute: VMs were not reachable anymore, the status of the nodes switched to red, and so on.
When I shut down the new node, the cluster recovers.
In /etc/pve/nodes of the existing nodes I can find the new folder for the new node.

Then I restart the new node and try to find out what went wrong. During that investigation I found the message I posted about: "pve cluster filesystem not online".
/etc/pve/nodes of this node was empty.

I repeated that procedure several times, with different hardware. Always the same result.
(I am meanwhile aware of "Separate a Node Without Reinstalling", but I also reinstalled several times.)

My network config is as simple as it can be: one network device on each node, no VLANs, everything at defaults.

About a year ago I was able to build up the existing cluster without any issues.

My plan was to add 2 more (quite dumb) nodes to keep an odd number.
 
Could you please share the network config of every node? Also that of the new node which has not been added yet.

cat /etc/network/interfaces

You may also share a system report of every node, if you are comfortable with that.
Check the system report first for any private information that you do not want to share.
You can create it under Node > Subscription > System Report (repeat for every node).


My plan was to add 2 more (quite dumb) nodes to keep an odd number.
If you add a fourth real node, you only need one additional node to have an odd number again.
 
Is there a different way to create a system report for the failed node? The web GUI does not come up.
 

Attachments

So far the config of the three nodes is looking fine.
I can't see any issues.

Is there a different way to create a system report for the failed node? The web GUI does not come up.

Sure, you can also do it via SSH: "pvereport > /root/pvereport.txt"

You need to get the file via scp. On Windows in PowerShell, you can do:

scp root@<PVEIP>:/root/pvereport.txt pvereport.txt

This copies pvereport.txt into your current working directory (wherever your PowerShell session is).
But it is weird that the GUI does not come up.

What does "systemctl status pveproxy" say?
Have you modified /etc/hosts or /etc/hostname? Do they match the hostnames?
Maybe you have missed settings the hostnames there?

Also is the time properly synchronized between all nodes? (timedatectl)
=> System Clock synchronized: Yes?
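
A quick check sequence on the new node could look like this (a sketch; node05 is used as an example name, substitute your own):

systemctl status pveproxy --no-pager
hostname
cat /etc/hostname
getent hosts node05          # should resolve to the node's cluster IP
timedatectl | grep -i synchronized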
 
Here comes the pvereport of node05.

The content of /etc/hosts can be found in my previous post, in "pve nodes network.txt".

root@node05:~# systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; preset: enabled)
Active: active (running) since Thu 2025-11-13 13:13:53 CET; 11min ago
Process: 893 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 906 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 909 (pveproxy)
Tasks: 4 (limit: 9341)
Memory: 158.7M
CPU: 4.590s
CGroup: /system.slice/pveproxy.service
├─909 pveproxy
├─910 "pveproxy worker"
├─911 "pveproxy worker"
└─912 "pveproxy worker"

Nov 13 13:13:48 node05 systemd[1]: Starting pveproxy.service - PVE API Proxy Server...
Nov 13 13:13:52 node05 pvecm[894]: got inotify poll request in wrong process - disabling inotify
Nov 13 13:13:53 node05 pveproxy[909]: starting server
Nov 13 13:13:53 node05 pveproxy[909]: starting 3 worker(s)
Nov 13 13:13:53 node05 pveproxy[909]: worker 910 started
Nov 13 13:13:53 node05 pveproxy[909]: worker 911 started
Nov 13 13:13:53 node05 pveproxy[909]: worker 912 started
Nov 13 13:13:53 node05 systemd[1]: Started pveproxy.service - PVE API Proxy Server.
 

Attachments

Also is the time properly synchronized between all nodes? (timedatectl)
=> System Clock synchronized: Yes?

"But that the GUI does not come up is weird."
After the 2nd reboot it worked, without changes.
Not normal behavior.

Do you maybe have a device in your network with the same IP? That could explain why the GUI is sometimes available, sometimes not.
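
One way to test for a duplicate IP is to watch the ARP replies while the new node is up, e.g. (a sketch; vmbr0 and 192.168.1.15 are assumptions, substitute your interface and the node's IP):

# run from another machine in the same subnet; replies from more than one MAC address indicate a duplicate IP
arping -c 5 -I vmbr0 192.168.1.15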

The config looks fine to me. As long as /etc/hostname is also correct (node04 or node05) and /etc/hosts is correct (which I can confirm), it should work.

Did you clean the previous three nodes properly? Also check that no orphaned key is still in /etc/pve/priv/authorized_keys.
Also:

After removal of the node, its SSH fingerprint will still reside in the known_hosts of the other nodes. If you receive an SSH error after rejoining a node with the same IP or hostname, run pvecm updatecerts once on the re-added node to update its fingerprint cluster wide.
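
A quick way to look for stale entries and to refresh the fingerprints could be (a sketch; node05 is the name used in this thread):

# on an existing cluster node: look for leftover entries of the removed node
grep node05 /etc/pve/priv/authorized_keys /etc/pve/priv/known_hosts
# on the re-added node, after joining, as quoted above:
pvecm updatecerts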

After that I am sadly out of ideas.
 
System clock synchronized: yes
NTP service: active


/etc/hostname also good

Nothing answers when the new host is down, so there is no duplicate IP.

In /etc/pve/priv/authorized_keys there is no new entry for the new host on the existing cluster nodes; apparently the process breaks down before it gets written.

And yes, I cleaned up /etc/pve/nodes.

Also out of ideas.
 
So I took the new host to play around a bit and updated it to Proxmox 9.
No issues with that.

Then I tried to add it to the cluster again. No, it did not work, but it behaved differently.

- The cluster did not fail as fast; I could run "pvecm delnode node05" on a cluster member without a quorum error while node05 was still up and running.
- There was no "pve cluster filesystem not online" in journalctl.
- /etc/pve/nodes does not exist -> a lot of errors in journalctl that writing was not possible.
- /etc/pve/priv is almost empty, only a folder named "lock" is there.

After going through "Separate a Node Without Reinstalling" and a reboot, there is an /etc/pve/nodes folder, but it only contains the folder node05. <--- this is really strange behavior!
All other nodes have /etc/pve/nodes/node05 (and themselves, of course).
But there is nothing about node05 in /etc/pve/priv/authorized_keys.


However: without any change to any configuration, the process behaved differently, only by upgrading node05 from 8.1.14 to 9.0.11.

There is a version dependency.