Ex-cluster node has frequent writes to pve-cluster and pve-replication

chill · New Member · Sep 2, 2023
Hi,

Running 8.0.4 on a new install. This was joined to a cluster, had VMs migrated over onto it, and was then removed from that cluster by following "Separate a Node Without Reinstalling". So far so good.
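For anyone finding this later, the separation procedure I followed was roughly the following (check the current admin guide before running it yourself):

Code:
# systemctl stop pve-cluster corosync
# pmxcfs -l
# rm /etc/pve/corosync.conf
# rm -r /etc/corosync/*
# killall pmxcfs
# systemctl start pve-cluster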

The issue I see is that the following files are written to / re-created every 60 seconds. Taking two ZFS snapshots a minute apart and diffing them (snapshot commands below the diff), I see:

Code:
 # zfs diff rpool/ROOT/pve-1@snap1 rpool/ROOT/pve-1@snap2
M       /var/lib/pve-manager
M       /var/lib/pve-cluster/config.db-wal
M       /var/lib/pve-cluster/config.db-shm
+       /var/lib/pve-manager/pve-replication-state.json
-       /var/lib/pve-manager/pve-replication-state.json
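
For anyone reproducing this, the two snapshots were just taken about a minute apart:

Code:
# zfs snapshot rpool/ROOT/pve-1@snap1
# sleep 60
# zfs snapshot rpool/ROOT/pve-1@snap2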

The file /var/lib/pve-manager/pve-replication-state.json only contains {}. pvesr status also shows nothing, and in the pvescheduler log I see the following after startup:

Code:
pvescheduler[101516]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
pvescheduler[101515]: replication: cfs-lock 'file-replication_cfg' error: no quorum!

Is there a timer or something I need to stop here?
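
For reference, these are the standard systemd commands I've been using to look for one (nothing PVE-specific assumed):

Code:
# systemctl list-timers --all | grep -i pve
# systemctl status pvescheduler.service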

Thanks!
 
Hi there, thanks for the reply. Each of those windows in the GUI is empty and has a popup saying "Replication needs at least two nodes".
 
Hi,

Has anyone found a solution? I've just discovered I have a similar problem: the following files are written to every minute, which seems excessive. This node is not part of a cluster and has no replication jobs at all.

Code:
/var/lib/pve-manager
/var/lib/pve-manager/pve-replication-state.json
/var/lib/pve-cluster/config.db-shm
/var/lib/pve-cluster/config.db-wal

I've also turned off HA and corosync to minimize disk writes. Are there any other services that should be stopped?
Code:
systemctl disable --now pve-ha-crm.service
systemctl disable --now pve-ha-lrm.service
systemctl disable --now corosync.service
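
To double-check what is still writing after that, I'm planning to watch write events with fatrace (assuming a reasonably recent version that supports the -f event filter):

Code:
# apt install fatrace
# fatrace -f W -t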
 
Those files are updated by the pve-cluster service, which is needed even if you don't have a cluster, as it provides the Proxmox Cluster Filesystem (pmxcfs) that backs the /etc/pve directory where the configuration resides [1]. It shouldn't be a problem at all.

[1] https://pve.proxmox.com/wiki/Service_daemons#pve-cluster
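
You can see this for yourself: /etc/pve is a FUSE mount served by pmxcfs, whose backing store is the config.db in /var/lib/pve-cluster (output trimmed, mount options will vary):

Code:
# findmnt /etc/pve
TARGET   SOURCE    FSTYPE OPTIONS
/etc/pve /dev/fuse fuse   rw,nosuid,nodev,...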
Thank you for your reply.

Since this node is not part of a cluster, would it be ok to mount /var/lib/pve-manager and /var/lib/pve-cluster with the ZFS property sync=disabled? That would disable sync writes and hopefully reduce the constant disk writes from pve-cluster. This ZFS pool uses a special vdev on solid-state disks, so the constant disk writes are not good.
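
Concretely, since ZFS properties apply per dataset, I understand that would mean carving those paths out into their own datasets first, something like this (dataset names here are just examples, and the existing contents would have to be moved over):

Code:
# zfs create -o mountpoint=/var/lib/pve-cluster rpool/pve-cluster
# zfs set sync=disabled rpool/pve-cluster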
 
I wouldn't mess with that at all. If the few bytes per minute written by those processes have any impact on your drives, replace them with proper hardware. Anything else will write more than that: logs, backups, tasks, the VMs themselves, etc.

Well, as I find more and more of these threads, I notice the same pattern: everyone is advised to use PLP SSDs rather than to look into the implementation reasons for this very unusual behaviour:

https://forum.proxmox.com/threads/etc-pve-500k-600m-amplification.154074/#post-701223

If anyone caught by this (@chill @tkittich) is willing to try an alternative pmxcfs, please let me know. In turn, it would help find out which processes are actually writing beyond one block size and which are just constantly writing, which would be the next candidate for optimisation.
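
As a starting point, the kernel's per-process I/O accounting already shows how much each writer contributes (standard Linux /proc interface; sample it twice, a minute apart, to get bytes written per minute):

Code:
# pid=$(pidof pmxcfs)
# grep -E 'write_bytes' /proc/$pid/io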
 
