PVE Logging

Gabgobie · Mar 30, 2024

Good evening everybody,

first of all I would like to thank you for this awesome project.

Since I have just experienced a boot drive failure caused by the write load, I have been looking into ways to reduce the load on my drives. While doing so, I've seen that a lot of the write load is produced by the logs. Therefore I am looking into disabling logging to disk and using a dedicated log aggregation VM with its own drive for that. Maybe we can turn this thread into a tutorial in the long term. For now my intention is to document the path I am taking in this regard.

So far, there are some unknowns for me here. By the way, I am on PVE 8.1.10.

- How do I disable all logging to disk in PVE?
Taking a look at the wiki, the logging service should be rsyslogd, but there is only one file in /etc/rsyslog.d/ (postfix.conf) and everything points to it actually being journald. After setting Storage=none in /etc/systemd/journald.conf, no logs show up in the GUI, which is expected, and calling journalctl tells me that there are no logs available.

Bash:

root@pve1:~# journalctl
No journal files were found.
-- No entries --

So far so good.
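
The same setting can also live in a drop-in file instead of the main journald.conf, which leaves the packaged file untouched. A minimal sketch, assuming the standard /etc/systemd/journald.conf.d/ drop-in directory; the function name and the file name 10-no-disk.conf are made up for illustration:

```shell
# Sketch: write a journald drop-in instead of editing journald.conf directly.
# The file name 10-no-disk.conf is a hypothetical choice.
write_journald_dropin() {
    dir="$1"               # e.g. /etc/systemd/journald.conf.d
    storage="${2:-none}"   # "none" drops all entries; "volatile" keeps them in RAM under /run/log/journal
    mkdir -p "$dir" || return 1
    printf '[Journal]\nStorage=%s\n' "$storage" > "$dir/10-no-disk.conf"
}

# Usage (as root), followed by a restart of journald:
#   write_journald_dropin /etc/systemd/journald.conf.d none
#   systemctl restart systemd-journald
```

With Storage=volatile instead of none, journalctl and the GUI would still have something to show until the next reboot, at the cost of some memory.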

But looking at the ls /var/log output, files are still being generated there. If I delete the folder, I lose access to the WebUI because pveproxy fails when its access.log file doesn't exist, so it seems to bypass journald. After restoring the /var/log/pveproxy directory and the access.log file in it (including permissions), it works fine again.

Bash:

root@pve1:~# ls /var/log
alternatives.log  btmp  chrony  ifupdown2  lastlog  private  pve  pve-firewall.log  pveproxy  README  wtmp

Am I right in suspecting that these services bypass journald, and how would I get them to stop logging to disk and use journald instead, so I can send those logs to the aggregation system?

- What options are there for me to use for log aggregation?
From the PVE GUI I can see that InfluxDB and Graphite are supported for metrics out of the box, but does that include logs? Would they still work with journald's Storage set to none, or would I have to configure the server to forward logs to in journald.conf?

Best,
Gab

Gabgobie · Mar 31, 2024

Further reading has brought me to this thread which has an answer to the pveproxy log issue although it hasn't gotten much attention.

It suggests symlinking /var/log/pveproxy/access.log to /dev/null which seems to work to get rid of it writing to disk although I can't say I am a big fan of the solution as it would also prevent me from getting these logs to my log aggregation system. I would much prefer if pveproxy would log to journald so there is a unified way of getting logs.

One could also move it to RAM by using the following /etc/fstab entry: tmpfs /var/log/pveproxy/ tmpfs defaults,uid=33,gid=33,size=1024m 0 0, however this would still defeat the purpose of a unified logging system.
I also expect the above-mentioned method to cause issues, as pveproxy can log quite a lot and I don't know how it handles running out of space. Going by the thread I linked in the first line, I would expect the space to run out within the first 24h.
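
Since it is unclear what happens once the tmpfs is full, it may help to at least notice when it gets close. A minimal sketch of such a check, which could be run from cron; both function names are hypothetical and the threshold is up to you:

```shell
# Sketch: print how full the filesystem holding a path is, as a whole-number percentage.
usage_percent() {
    # -P forces the POSIX one-line-per-filesystem format; column 5 is the capacity field.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH LIMIT: complain on stderr once usage reaches LIMIT percent.
warn_if_full() {
    used="$(usage_percent "$1")" || return 2
    if [ "$used" -ge "$2" ]; then
        echo "warning: $1 is ${used}% full" >&2
        return 1
    fi
    return 0
}

# Usage:
#   warn_if_full /var/log/pveproxy 80
```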

Are there any good reasons I am missing for not having pveproxy log to the standard logging service? The same goes for any other services that still appear in /var/log.

Gabgobie · Apr 2, 2024

I'm back.

DISCLAIMERS:
This is my first time looking at Perl code, and everything I think I know about Perl is from web searches I did to understand the behavior of the code I was inspecting.
I haven't had the time to test what happens if the log storage runs out of space. It could potentially break parts of the system in unforeseen ways until the next reboot, which would clear the dedicated log storage.

For pveproxy, taking a look at the source, the log file is hard-coded in line 106 to be written to /var/log/pveproxy/access.log and this file is automatically rotated. It is used for both pveproxy and spiceproxy.

It seems pveproxy is already using the syslog function from the PVE::SafeSyslog package for some of its log messages. I wasn't able to find the lines that actually write to the access.log file in a reasonable amount of time.

Thinking about the reason for the access logging to bypass the syslog, the only good one I can come up with would be that there is no log facility intended for this use.

Moving on to the part where I try to solve my issue:

1. move the access log into a tmpfs and adjust the log rotation to avoid excessive memory usage

add tmpfs /var/log/pveproxy/ tmpfs defaults,uid=33,gid=33,size=1024m 0 0 to your /etc/fstab.
- The suggested line would allow for 1G of memory to be allocated to your access.log. Adjust the size as needed.
- uid=33 and gid=33 are www-data. Without them pveproxy won't be able to access the log file.
adjust /etc/logrotate.d/pve to your needs. In my case, that means no longer keeping old versions of the access log, to conserve memory
- rotate
  - rotate 7 the default setting
  - rotate 0 don't keep rotated versions (my choice, although I'd say it would be reasonable to use 1 instead)
  - rotate -1 keep all rotated versions
- size <size> (mutually exclusive with time values -> replace daily or whichever other value is configured)
  - size 100: 100 bytes
  - size 100k: 100 kilobytes
  - size 100M: 100 Megabytes
  - size 100G: 100 Gigabytes
- Alternatively to size you could also go for a combination of frequency and maxsize <size> (my preference)
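
The rotation settings listed above can be changed in place, so that anything else in the packaged file (for example postrotate hooks) stays as it is. A minimal sketch, assuming GNU sed and a stanza that contains a "rotate N" line and a "daily" line; the function name is hypothetical and the values are only examples:

```shell
# Sketch: tune an existing logrotate file in place instead of rewriting it.
# tune_logrotate FILE KEEP MAXSIZE
tune_logrotate() {
    file="$1"; keep="$2"; max="$3"
    # 1) drop any earlier maxsize line so the function can be re-run safely
    # 2) set the number of rotated copies to keep
    # 3) add a maxsize line right after each "daily" line
    sed -E -i \
        -e '/^[[:space:]]*maxsize[[:space:]]/d' \
        -e "s/^([[:space:]]*)rotate[[:space:]]+-?[0-9]+/\1rotate $keep/" \
        -e "s/^([[:space:]]*)daily[[:space:]]*\$/\1daily\n\1maxsize $max/" \
        "$file"
}

# Usage (as root), after copying the original somewhere outside /etc/logrotate.d:
#   tune_logrotate /etc/logrotate.d/pve 1 50M
```

Keep in mind that maxsize is only evaluated when logrotate actually runs, so it has little effect unless logrotate is triggered more often than once a day.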

2. move the rest of the log files to memory

add tmpfs /var/log/ tmpfs defaults,uid=0,gid=0,size=1024m 0 0 to your /etc/fstab.
- The suggested line would allow for 1G of memory to be allocated for all of your other logfiles.
look through /etc/logrotate.d/ and adjust the files to your needs
- If I am not mistaken, /etc/logrotate.conf will set the default values
- Take the possible settings from the manpage

At this point I would expect that no more logfiles touch your disk. So far so good. The next goal in line would be to reduce the memory impact.

In the previous steps we already went through the log rotation settings. You can increase the frequency at which logrotate checks whether the size you specified has been reached, either by moving the cron file (mv /etc/cron.daily/logrotate /etc/cron.hourly/logrotate) or by removing it (rm /etc/cron.daily/logrotate) and adding your own crontab line with a custom interval. Be aware, though, that the file you are giving up includes error handling and informs you of any abnormal behavior. It is up to you to determine whether you need that.
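
One thing worth checking before touching cron: on systemd-based Debian installs, logrotate may be triggered by logrotate.timer rather than by the cron script (systemctl list-timers shows whether that timer exists on a given node). Assuming it does, a timer drop-in is one way to raise the frequency. A minimal sketch; the function name and the drop-in file name are hypothetical:

```shell
# Sketch: write a drop-in that changes how often logrotate.timer fires.
# Assumes logrotate.timer exists on the node; verify that first.
write_logrotate_timer_override() {
    dir="$1"                  # e.g. /etc/systemd/system/logrotate.timer.d
    schedule="${2:-hourly}"   # any systemd calendar expression
    mkdir -p "$dir" || return 1
    # The empty OnCalendar= clears the packaged schedule before the new one is set.
    printf '[Timer]\nOnCalendar=\nOnCalendar=%s\n' "$schedule" > "$dir/override.conf"
}

# Usage (as root):
#   write_logrotate_timer_override /etc/systemd/system/logrotate.timer.d hourly
#   systemctl daemon-reload && systemctl restart logrotate.timer
```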

The logging can be further optimized but at this point we are going into detail so I will try to keep this short.
The following will provide us with a list of applications to take a look at in regards to logging.

Code:

root@pve:~# ls /var/log
alternatives.log  btmp  chrony  ifupdown2  lastlog  private  pve  pve-firewall.log  pveproxy  README  wtmp

alternatives.log: Not worth the effort to me
btmp: From my understanding logging failed logins
chrony/: For me this is currently an empty directory. Not worth going after
ifupdown2/: I'm pretty sure this is needed for Proxmox to work properly. Alternative versions of your interface settings are saved here. I'd recommend not touching this.
lastlog: I can only assume the contents from the name
private/: In my case an empty directory
pve/: There's only the tasks folder in here. It's needed for the UI to display properly.
pve-firewall.log: Logging behavior can be influenced with the appropriate values in
- /etc/pve/firewall/cluster.fw
  - log_ratelimit
- /etc/pve/nodes/<nodename>/host.fw
  - log_level_in
  - log_level_out
  - log_nf_conntrack
- /etc/pve/firewall/<VMID>.fw
  - log_level_in
  - log_level_out
wtmp

I have yet to configure CEPH, but from what I read, it logs to /var/log/ceph by default. It can, however, be configured to log to syslog instead.

This was quite a journey and I'd be happy to read any additions to this post.

I think at this point this would be ripe to add the Tutorial Prefix. Below I'll link other sources I found for reducing the write load on your main PVE disks.

ZFS Write Amplification
(Edit in additional sources if suggested or found)

Best,
Gab

VictorSTS · Apr 2, 2024

Why not just using proper enterprise disks in a mirror instead of reinventing the logging wheel?
Why not just monitor the wearout of the drives and replace them when needed?

Most of the changes you are doing are hard to maintain. Any update may overwrite your /etc/logrotate.d changes, there might be changes in the logging system used by any PVE package, there may be some new package that again logs to /var/log...

IMHO, the problem isn't that PVE logs too much but that you need proper hardware for PVE.

Gabgobie · Apr 2, 2024

TL;DR:

"Why not just using proper enterprise disks in a mirror instead of reinventing the logging wheel?"

Expensive and not feasible for home use.

"Why not just monitor the wearout of the drives and replace them when needed?"

This should be done anyways but why cause more wearout than necessary?

"Most of the changes you are doing are hard to maintain."

Agreed. This is an issue I didn't think about.

"the problem isn't that PVE logs too much"

Agreed. I seem to have been unclear about the issue I take with the current approach.

"you need proper hardware for PVE"

Disagreed. PVE could run on just about anything. Log generation shouldn't be the reason you need to spend more on hardware.

Long version:
I have to disagree on some of that.

I will always think of unnecessary wear as an issue. Mirroring the drives (which I do) doubles the wear.

Enterprise hardware is expensive. Why pay the premium when it is very much not necessary? In this case the only thing necessitating the use of expensive hardware is the durability against logging.

I agree that the changes can be hard to maintain. This is an aspect I didn't think about. Although the only time this would become an issue is if one of these new applications were to generate so much log data that the logs no longer fit within the boundary I set for the tmpfs which has replaced /var/log.

"the problem isn't that PVE logs too much"

I may have been unclear if you think that my issue is the amount of logging from PVE and I can see how it came to that. The more I went over this the more I realised that I won't be able to have everything log to the syslog instead of files, which is what I am actually taking issue with. Over the course of writing this post my attention shifted towards protecting my disks over trying to force everything into the syslog some way or another. I appreciate extensive logging. I also appreciate a unified approach. The optimal case for me would be to have everything log to syslog which in turn pushes the logs to Grafana Loki or any other log aggregation server.

I appreciate you sharing your views on the matter! I believe nothing can be accomplished/improved/learned without discussion.

Best,
Gab
