"pmxcfs" writing to disk all the time...

Rhinox
I noticed this: when I start PVE 5.0, the process "pmxcfs" keeps writing something to disk all the time.

[Attachment: proxmox_writing.jpg]


Every 3-4 seconds there is a spike in write ops. This goes on and on forever.

According to wiki, "The Proxmox Cluster file system (“pmxcfs”) is a database-driven file system for storing configuration files, replicated in real time to all cluster nodes using corosync. We use this to store all PVE related configuration files".

This is a solo host with just local storage (no cluster). No VM or LXC container is running. No config file is being edited. The whole host is basically idle, except for the steady stream of writes caused by the pmxcfs process. Why???
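For anyone who wants to reproduce this without a graphing tool, a minimal sketch (using the standard procfs counters) that watches how much the pmxcfs process has written:

Bash:
# watch the cumulative I/O counters of the pmxcfs process;
# write_bytes should keep climbing if pmxcfs really is the one writing
pid=$(pidof pmxcfs)
watch -n 3 cat /proc/$pid/io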

This is probably one reason why consumer SSDs get eaten up so quickly, and if installed on a USB stick, it gets destroyed within a few days (this happened to me some time ago even with an industrial-grade SLC-based USB stick!)...
 
even on an empty standalone node, the pve-ha-lrm daemon periodically writes its status into /etc/pve (this is not optimal and is on our improvement list)
but it is a rather small write

This is probably one reason why consumer SSDs get eaten up so quickly, and if installed on a USB stick, it gets destroyed within a few days (this happened to me some time ago even with an industrial-grade SLC-based USB stick!)...
your observation was right (pmxcfs writes periodically), but your conclusion is wrong
on a host with running VMs, there will be more written by collecting the RRD stats, by the guest file systems (when on the same disk), by the logs from systemd, the kernel etc., so this is not the fault of pmxcfs
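if you want to watch that periodic status write yourself, something along these lines should show the file's timestamp ticking (I am assuming the status lands in /etc/pve/nodes/<nodename>/lrm_status; the exact file may differ):

Bash:
# print mtime, size and name of the assumed LRM status file once per second
watch -n 1 stat -c '%y %s %n' /etc/pve/nodes/$(hostname)/lrm_status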
 
There is no problem with writing status (if it is necessary), but doing it every 3-4 seconds seems to me to be overkill, especially if the host is configured as standalone (I suppose that "ha" has something to do with high availability; can I disable pve-ha-lrm/pve-ha-crm on a standalone host?).

Concerning my conclusion: I tried starting a single VM, waited for it to boot and "settle" a little, and let it run idle. And guess what? Surprisingly, most of the writes were again those of "pmxcfs" (but I'm collecting logs on a different box). As you confirmed, those are small writes, but they come at short intervals.

This is actually worse for an SSD than writing big data, because even when writing a few bytes the whole "page" (its size can be anything between 2 kB and 16 kB) is marked as used, and the next write must go to the next free page. And what's even worse, "pages" cannot be modified individually, only in "blocks" (128 or 256 "pages"). So once every "page" has been used at least once, all further writes must be done as a "read-modify-write" of a whole block. "pmxcfs" might send just a few bytes, but ultimately a few MB must be written. That's a terribly high write amplification factor! And because ZFS (on Linux) does not support "trim", it gets to this scenario very quickly.
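To put a purely hypothetical worst-case number on it (the figures below are assumptions, not measurements, and a real controller caches and coalesces writes, so the true factor is lower):

Bash:
# illustrative worst case: every small status write triggers a read-modify-write of one erase block
page_kb=16                               # assumed NAND page size
pages_per_block=256                      # assumed pages per erase block
interval_s=3                             # one small write every ~3 seconds
block_kb=$((page_kb * pages_per_block))  # 4096 kB rewritten per update
updates_per_day=$((86400 / interval_s))  # 28800 updates per day
echo "worst case: $((block_kb * updates_per_day / 1024 / 1024)) GB of NAND writes per day"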

I still think exactly this (a lot of small writes) is the reason why the TBW counter on consumer SSDs spins so quickly. Previously I used an 850 Pro, and ~25% of its TBW was expended in just a few months (btw, I did not see this with other hypervisors I tested). I'm using an industrial-grade SSD now so it does not bother me so much, but I still think this issue is worth investigating. When you check this forum, there are a few topics with users complaining that the TBW of their SSDs is used up very quickly (especially with PVE 5.x)...
 
There is no problem with writing status (if it is necessary), but doing it every 3-4 seconds seems to me to be overkill, especially if the host is configured as standalone (I suppose that "ha" has something to do with high availability; can I disable pve-ha-lrm/pve-ha-crm on a standalone host?).
yes, if you do not enable HA you can disable pve-ha-lrm/pve-ha-crm

This is actually worse for an SSD than writing big data, because even when writing a few bytes the whole "page" (its size can be anything between 2 kB and 16 kB) is marked as used, and the next write must go to the next free page. And what's even worse, "pages" cannot be modified individually, only in "blocks" (128 or 256 "pages"). So once every "page" has been used at least once, all further writes must be done as a "read-modify-write" of a whole block. "pmxcfs" might send just a few bytes, but ultimately a few MB must be written. That's a terribly high write amplification factor! And because ZFS (on Linux) does not support "trim", it gets to this scenario very quickly.

I still think exactly this (a lot of small writes) is the reason why the TBW counter on consumer SSDs spins so quickly. Previously I used an 850 Pro, and ~25% of its TBW was expended in just a few months (btw, I did not see this with other hypervisors I tested). I'm using an industrial-grade SSD now so it does not bother me so much, but I still think this issue is worth investigating. When you check this forum, there are a few topics with users complaining that the TBW of their SSDs is used up very quickly (especially with PVE 5.x)...
this depends very much on the SSD, and even then, every other part of the system (RRD logging, logs from the kernel etc.) does the same thing, especially on ZFS
i have here, for example, a Crucial MX200 as the root disk, and after 1.5 years the "percent lifetime used" (Crucial's wearout indicator) stands at 1%
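for comparison on your own hardware, the wearout counters can be read with smartctl (assuming smartmontools is installed and the disk is /dev/sda; the attribute names vary by vendor):

Bash:
smartctl -A /dev/sda | grep -iE 'wear|lifetime|lbas_written'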
 
Yes, I can agree. It just seems in this case that those other parts of the system do it less frequently. And a log server can be configured not to write every message separately, but to fill a predefined buffer first (accepting the risk of losing some logs in case of a sudden system failure)...

I'll try to disable pve-ha-lrm/pve-ha-crm and see if it changes anything. Thanks for the reply.
 
A few more observations:

1. It is not possible to disable a service from the web interface, only to stop it (or start/restart it). That means after a reboot it is running again. Could a "disable" action be added to the "system" tab (in addition to "start", "stop", "restart" that are already there)?

2. I logged in to the console and used "systemctl" to stop & disable pve-ha-lrm/pve-ha-crm. Now the process "pmxcfs" is no longer writing to disk all the time. The idle PVE host is really idle; there is no I/O activity.
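A quick way to double-check that the disk really is idle (assuming the sysstat package is installed):

Bash:
# per-device transfer statistics, sampled every 5 seconds, three samples
iostat -d 5 3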

I tried to stop "pve-cluster" too, but that is probably not good idea even on stand-alone host, as immediatelly syslog started to write error-messages with the same speed as "pmxcfs" did previously...
 
I tried to stop "pve-cluster" too, but that is probably not good idea even on stand-alone host, as immediatelly syslog started to write error-messages with the same speed as "pmxcfs" did previously...
yes, you should not do that, pve-cluster is the service which starts pmxcfs, and we need it for many things
 
even on an empty standalone node, the pve-ha-lrm daemon periodically writes its status into /etc/pve (this is not optimal and is on our improvement list)
but it is a rather small write



This is a mistake (in my own opinion): /etc must contain only static configuration files, not status info. The place for that is /tmp or /var/tmp. pmxcfs is the first case I have seen doing this (from what I see/know/remember) in more than 10 years on Linux.

Maybe a symlink could fix this? Or a RAM tmpfs?
 
This is a mistake (in my own opinion): /etc must contain only static configuration files, not status info. The place for that is /tmp or /var/tmp. pmxcfs is the first case I have seen doing this (from what I see/know/remember) in more than 10 years on Linux.

Maybe a symlink could fix this? Or a RAM tmpfs?

/etc/pve is not a real directory stored on disk, it is a FUSE filesystem backed by an sqlite DB stored in /var/lib/pve-cluster
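This is easy to verify on any node, for example:

Bash:
# the FUSE mount on /etc/pve and its on-disk backing store
mount | grep /etc/pve
ls -l /var/lib/pve-cluster/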
 
/etc/pve is not a real directory stored on disk, it is a FUSE filesystem backed by an sqlite DB stored in /var/lib/pve-cluster
Be that as it may, it does not correspond to the FHS nor the LSB. Status is a kind of log message, so it should be written somewhere to /var/log. So that DB-based FUSE should be mounted somewhere in /var/log...
 
Be that as it may, it does not correspond to the FHS nor the LSB. Status is a kind of log message, so it should be written somewhere to /var/log. So that DB-based FUSE should be mounted somewhere in /var/log...

no. status != log, and status in this case is akin to state anyway. pmxcfs is primarily used for synchronized configuration and state storage. which means that /var/lib is the right place to store the (non-user accessible) DB:

5.8. /var/lib : Variable state information

State information is generally used to preserve the condition of an application (or a group of inter-related applications) between invocations and between different instances of the same application. State information should generally remain valid after a reboot, should not be logging output, and should not be spooled data.

An application (or a group of inter-related applications) must use a subdirectory of /var/lib for its data. There is one required subdirectory, /var/lib/misc, which is intended for state files that don't need a subdirectory; the other subdirectories should only be present if the application in question is included in the distribution.

/var/lib/<name> is the location that must be used for all distribution packaging support. Different distributions may use different names, of course.

and /etc/pve is the right place to expose it to the user, because the primary interaction with the mounted pmxcfs (by the user) is for configuration purposes. there is some extra information exposed via the mounted FUSE file system that is actually not backed by the DB, but transparently generated, but overall the existing scheme is by far the best fit in the standard file system hierarchy.
 
So. In 5.2 this problem is still there. Is there any solution other than disabling pve-ha-lrm/pve-ha-crm?
 
Still no solution?
My standalone Proxmox installation writes 20G every day when idle.

Also, is the path to the rrdcached directory hardcoded?
I moved /var/lib/rrdcached to another disk and changed the paths in /etc/default/rrdcached accordingly, but got errors:
Jan 23 21:19:42 pve pmxcfs[2853]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/113: opening '/var/lib/rrdcached/db/pve2-vm/113': No such file or directory
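If the path really is compiled in, a possible (untested) workaround would be to move the data but leave a symlink at the expected location; /mnt/otherdisk below is just a placeholder for the other disk:

Bash:
systemctl stop rrdcached
mv /var/lib/rrdcached /mnt/otherdisk/rrdcached
ln -s /mnt/otherdisk/rrdcached /var/lib/rrdcached
systemctl start rrdcached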
 
Still no solution?
My standalone Proxmox installation writes 20G every day when idle.

Also, is the path to the rrdcached directory hardcoded?
I moved /var/lib/rrdcached to another disk and changed the paths in /etc/default/rrdcached accordingly, but got errors:
Jan 23 21:19:42 pve pmxcfs[2853]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/113: opening '/var/lib/rrdcached/db/pve2-vm/113': No such file or directory
I just posted this - maybe it will be useful:
https://forum.proxmox.com/threads/reducing-rrdcached-writes.64473/

20G/day seems like a lot for just rrd data!
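As far as I understand that thread, the idea is to raise rrdcached's write timers; the relevant daemon flags are -w (seconds between writes of dirty RRDs) and -f (forced flush interval). Roughly like this, with illustrative values; where exactly the flags go depends on how rrdcached is started (/etc/default/rrdcached on Debian):

Bash:
# write dirty RRD files at most once per hour, force a full flush every two hours (example values)
rrdcached -w 3600 -f 7200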
 
Is there any solution to exclude these two services from being started? Otherwise one has to stop them every time the system starts. The two services don't seem to be started via init.d.
 
They're handled by systemd. You disable them by typing:
Bash:
systemctl stop pve-ha-lrm.service
systemctl stop pve-ha-crm.service
systemctl disable pve-ha-lrm.service
systemctl disable pve-ha-crm.service
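On current systemd versions the stop and disable steps can also be combined:

Bash:
systemctl disable --now pve-ha-lrm.service pve-ha-crm.service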
 
Still no solution?
My standalone Proxmox installation writes 20G every day when idle.

Also, is the path to the rrdcached directory hardcoded?
I moved /var/lib/rrdcached to another disk and changed the paths in /etc/default/rrdcached accordingly, but got errors:
Jan 23 21:19:42 pve pmxcfs[2853]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/113: opening '/var/lib/rrdcached/db/pve2-vm/113': No such file or directory
Did you ever succeed in redirecting rrdcached? Did you restart the service/reboot before you received the error message?
 
Did you ever succeed in redirecting rrdcached? Did you restart the service/reboot before you received the error message?
Yes, I did. I don't know what I did exactly, but it worked. But in the end I didn't want to fight windmills and decided to install Proxmox on a dedicated SSD.
 
