Any way to make proxmox check if ACME cert renewal needed on startup?

lriley06

Aug 17, 2024
I am using my self-hosted smallstep server to issue certificates for everything in my homelab. By design, the certificates are short-lived (only 24 hours). I have managed to request the certificate just fine via Proxmox, and the auto-renewal process seems to work fine.

However, when the proxmox node is first started up after a few days of being powered off, the certificate is (of course) invalid. It seems that proxmox does not check to see if the certificate is invalid when it boots up, meaning that it has to wait for the daily cronjob to run again and then renew the certificate (could take multiple hours!). My current workaround is just disabling the browser security warnings and manually renewing the certificates, although this obviously isn't ideal!
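For reference, the kind of boot-time check I have in mind could look roughly like the sketch below. It is only an illustration, not something Proxmox ships: the certificate path, the one-hour margin and the helper names are my own assumptions, and it relies on openssl, GNU date and `pvenode acme cert renew` being available on the node.

```shell
#!/bin/sh
# Assumed location of the node's custom/ACME certificate; adjust if yours differs.
CERT="${CERT:-/etc/pve/local/pveproxy-ssl.pem}"

# needs_renewal NOT_AFTER_EPOCH NOW_EPOCH [MARGIN_SECONDS]
# Succeeds (exit 0) when the cert has expired or expires within the margin.
needs_renewal() {
    not_after=$1
    now=$2
    margin=${3:-0}
    [ $((not_after - now)) -le "$margin" ]
}

# cert_not_after_epoch FILE
# Prints the certificate's notAfter date as a Unix timestamp.
# Uses `openssl x509 -enddate` and GNU `date -d`.
cert_not_after_epoch() {
    end=$(openssl x509 -enddate -noout -in "$1") || return 1
    date -d "${end#notAfter=}" +%s
}

# renew_if_needed: hypothetical glue, meant to be called once after boot.
renew_if_needed() {
    expiry=$(cert_not_after_epoch "$CERT") || return 1
    if needs_renewal "$expiry" "$(date +%s)" 3600; then
        pvenode acme cert renew
    fi
}
```

Nothing runs until `renew_if_needed` is called, so the file can be sourced and tried by hand before wiring it into anything.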

Is there anything that I could do to change this? I'm quite new to proxmox but if anyone thinks it would be worth submitting a feature request for this then I could maybe do that?

Thanks in advance!
 
You mention a cron job, so could you add an "@reboot" cron job that runs just after boot?

Systemd timers offer a similar way to run a task after boot.
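A minimal sketch of the "@reboot" idea, assuming renewal is handled by the pve-daily-update.service mentioned in the Proxmox docs. The file name /etc/cron.d/pve-acme-boot and the 120-second sleep are made up for illustration (the sleep is only a guess at how long networking and the CA need to come up), so treat it as a starting point rather than a tested recipe.

```shell
#!/bin/sh
# write_reboot_cron [FILE]
# Writes a cron.d entry that starts the daily PVE update service once after boot.
# FILE defaults to a hypothetical /etc/cron.d/pve-acme-boot.
write_reboot_cron() {
    target=${1:-/etc/cron.d/pve-acme-boot}
    cat > "$target" <<'EOF'
# Start the daily PVE update service shortly after boot.
# The 120 s sleep is a guess to let networking and the CA come up first.
@reboot root sleep 120 && /usr/bin/systemctl start pve-daily-update.service
EOF
}
```

Calling `write_reboot_cron` as root creates the entry; deleting the file undoes it.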
 
I was thinking along these lines too! Only downside is I would prefer to avoid modifying things under the hood and creating a non-standard environment at this stage! I guess this is all the developers would change anyway though, so I might make a PR with these changes once I’ve worked it out!
 
So how does this coincide with:

Who set up that cron job?

I'm not familiar with smallstep, so maybe I'm missing something here.
My mistake, I somehow remembered reading a docs page saying that the built-in ACME certificate auto-renewal was handled by a cron job! However, it clearly says "the certificate will be automatically renewed by the pve-daily-update.service"!
 
I have found the issue though. This "pve-daily-update.service" is triggered by "pve-daily-update.timer".

Contents of /etc/systemd/system/timers.target.wants/pve-daily-update.timer:
Code:
[Unit]
Description=Daily PVE download activities

[Timer]
OnCalendar=*-*-* 1:00
RandomizedDelaySec=5h
Persistent=true

[Install]
WantedBy=timers.target

The Persistent=true line would make this service work perfectly for my use case (i.e. if the timer would have fired at least once while the system was powered off, pve-daily-update.service is triggered as soon as the timer is activated on boot).
However, because RandomizedDelaySec is set to 5h, that catch-up run can still be delayed by a random amount of up to 5 hours, even though systemd can see that the timer is overdue. This means you could wait up to 5 hours after boot for the ACME certificates to be renewed. This behaviour is not obvious to someone writing the unit file, and a systemd issue has been opened about it: https://github.com/systemd/systemd/issues/21166

In any case, a RandomizedDelaySec this large doesn't work for tasks that need to run soon after boot, like ACME renewal. I will try to work out how to submit a bug report!

Thanks so much everyone for your help :)
 
Things that happen daily, like updates, usually aren't that time-sensitive (and are even better run at random times to avoid "bursts" of network traffic), so that random delay is usually fine.
For one of my own use cases I also didn't want the delay to be that large. I ran the command below, removed the "h" from RandomizedDelaySec so that it is 5 seconds instead of 5 hours, and changed the start time. Changes made this way stay intact across system updates (as confirmed by Proxmox staff [1]).
Code:
systemctl edit --full pve-daily-update.timer

[1] https://forum.proxmox.com/threads/proxmox-5-pve-daily-update-service-edit-cronjob.45993/#post-218608
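To show why dropping the "h" matters: systemd reads a time span without a unit as seconds. The sketch below is a hypothetical helper of my own that handles only a small subset of the span syntax (plain numbers plus s, m, min and h suffixes), not systemd's actual parser.

```shell
#!/bin/sh
# span_to_seconds VALUE
# Converts a small subset of systemd time spans to seconds.
# A bare number is treated as seconds, as systemd does.
span_to_seconds() {
    case $1 in
        *h)   echo $(( ${1%h} * 3600 )) ;;
        *min) echo $(( ${1%min} * 60 )) ;;
        *m)   echo $(( ${1%m} * 60 )) ;;
        *s)   echo $(( ${1%s} )) ;;
        *)    echo $(( $1 )) ;;
    esac
}

span_to_seconds 5h   # prints 18000
span_to_seconds 5    # prints 5
```

So RandomizedDelaySec=5h allows a delay of up to 18000 seconds, while RandomizedDelaySec=5 caps it at 5 seconds.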
 
I have done a similar temporary workaround for this, but I would imagine it's still an issue for anyone using ACME. Even if you are using Let's Encrypt certs (which are valid for 90 days rather than 24 hours as in my case), the exact same issue would still happen if the server was left off for longer than the certificate's lifetime. Sure, it's much less likely, but it's still a bug affecting all users of the ACME feature.
 
To counter that argument:
If I leave ANY system off for months on end, I always expect some issues, as well as slow-downs while it downloads and installs all the updates and the like.
Since the update task runs between 1:00 and 6:00 local time, when people usually aren't working, that's fine for most things (including the 1-6 hours during which the certificate still encrypts the connection but no longer verifies it).
Also, without a valid certificate everything will still work 99% of the time. Clients might just get a warning about the certificate being expired, unless they are specifically configured to verify it.

That said though, you're always free to enter it as a bug (even though my personal opinion is that it is not a bug), but I thought to at least share a working and supported solution in case someone else finds this post.


EDIT: Just to clarify: I meant a warning in the logs of AUTOMATIC systems that do run 24/7, not for manual user processes.
I would say that if you use these short-lived certificates, you will probably need some other changes from the defaults as well. A lot of people will probably keep using certificates valid for 90 days to a year for the foreseeable future, and 6 hours (max) of verification warnings during less-productive hours out of roughly 2,160 hours (90 days) of running time (or even less if the certificate is renewed 1 day before it expires at the latest) will be good enough for most people.
 
Thanks so much for your help! I'm aware it's a small thing, but I definitely think it would be good to have this fixed, especially as short-lived PKI is becoming more popular for security reasons. Also, training users to "just bypass the security warning page, this is a known issue" probably isn't good practice, and I'm sure there are all kinds of examples of this happening in other scenarios and later having bad outcomes! Certainly in this example, if the user clicks "ignore" on the invalid certificate warning 999999 times out of 1000000, they definitely will not notice the difference the 1 time out of 1000000 when the certificate is stolen by an attacker and used for whatever nefarious purpose :)
 
I ran into the same exact problem with expired ACME certificates. Like you, I often turn my home-lab off for several days, and I have a SmallStep CA running in an LXC on my Proxmox server.

I was looking for a specific link this evening while cleaning up my SmallStep CA install notes, and came across this thread in a search.

I figured I might as well add my full solution to the thread.

First, I made my SmallStep LXC start on boot with a "Start/Shut Down Order" of 1.
Next, I made the pve-daily-update.service run 1 minute after the node boots up if it missed the 1am timer.

As previously mentioned, an override for the systemd unit needs to be created so the config changes persist through updates. I had been using an override.conf created with "systemctl edit pve-daily-update.timer", but I just switched to using the --full option referenced above. Thanks sw-omit!

I changed "RandomizedDelaySec" to "60" and set "FixedRandomDelay" to "true".

systemctl edit --full pve-daily-update.timer
/etc/systemd/system/pve-daily-update.timer
Code:
[Unit]
Description=Daily PVE download activities

[Timer]
OnCalendar=*-*-* 1:00
RandomizedDelaySec=60
FixedRandomDelay=true
Persistent=true

[Install]
WantedBy=timers.target
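For anyone who prefers the drop-in route over --full, here is a rough sketch of what the override.conf approach amounts to. The function name is made up, and the default directory is where I would expect "systemctl edit pve-daily-update.timer" to put its override; double-check with "systemctl cat pve-daily-update.timer" afterwards.

```shell
#!/bin/sh
# write_timer_dropin [DIR]
# Writes an override.conf that only changes the random delay of the timer.
# DIR defaults to the drop-in directory for pve-daily-update.timer.
write_timer_dropin() {
    dir=${1:-/etc/systemd/system/pve-daily-update.timer.d}
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Timer]
RandomizedDelaySec=60
FixedRandomDelay=true
EOF
}

# After writing the file by hand, systemd has to re-read its units:
#   systemctl daemon-reload && systemctl restart pve-daily-update.timer
```

A drop-in only overrides the listed settings, so OnCalendar and Persistent keep whatever the packaged unit file says.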

I also needed to update the SmallStep ACME provisioner to allow certificate renewal after expiration. SmallStep explicitly advises against this, but I don't really think it's much of a risk for an internal homelab server.

$STEPPATH/config/ca.json
JSON:
            {
                "type": "ACME",
                "name": "acme",
                "forceCN": true,
                "claims": {
                    "enableSSHCA": true,
                    "disableRenewal": false,
                    "allowRenewalAfterExpiry": true,
                    "disableSmallstepExtensions": false,
                    "dnsnames": ["myinternaldomain.com", "*.myinternaldomain.com"]
                },
                "options": {
                    "x509": {},
                    "ssh": {}
                }
            }

Honestly though, I've since made the ACME certs last 30 days, as 24 hours was a bit aggressive. I also use a SmallStep ACME cert for my bare-metal OPNsense, and I found I still needed to access the router web UI after the Proxmox server (and the CA running on it) had been off for a few days.

That can be achieved by adding the following to the claims section of the ca.json
JSON:
                    "minTLSCertDuration": "24h",
                    "maxTLSCertDuration": "720h",
                    "defaultTLSCertDuration": "720h",


Sources:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_certs_acme_automatic_renewal
https://www.freedesktop.org/software/systemd/man/latest/systemd.timer.html
https://smallstep.com/docs/step-ca/...r-expiry-for-intermittently-connected-devices
https://smallstep.com/docs/step-ca/configuration/#configuration-options
 
