Critical failure after update — GRUB broken despite active enterprise subscription

Dear Proxmox team,


I'm writing to express my deep frustration after performing a simple update on one of my Proxmox hosts. The system broke and dropped to GRUB, failing to boot normally. I am using the enterprise repository with an active subscription, precisely to avoid this kind of instability.


It is absolutely unacceptable that a paid, production-grade virtualization platform can fail like this — especially due to a GRUB/EFI issue, which is fundamental to system operation. This kind of failure should never happen on a platform that promotes itself as reliable and enterprise-ready.


This incident has put me at risk of losing critical clients, as they depend on services running on this host. And now I find myself wasting precious time manually recovering a system that should never have failed in the first place.


I would like to know:


  • What exactly caused this issue?
  • What is the official guidance from the Proxmox team to prevent this from happening again?
  • Will this be addressed in upcoming updates?

Please treat this with the seriousness and urgency it deserves.


Sincerely,
Márcio
 
Make an account there, click submit ticket and set the priority to "critical" so you get a response quickly. Make sure to include all of this information or reference this forum post, and include the license key.

You can get to this page yourself from the main Proxmox website: "Get Help" in the top right, then Customer Portal.
 
You might find it quicker to use the ticket system, no? As you paid for tickets as part of being on enterprise - https://my.proxmox.com/en
Thank you for your reply — I appreciate it.


However, I must strongly emphasize that this kind of failure should never happen in a production environment, especially due to such a basic issue. It's precisely to avoid situations like this that we choose to pay for the enterprise subscription.


Having a system break at the boot level after an official update is not something that should be passed off lightly. Even with access to support tickets, no customer should be put in a position where they're manually recovering from a broken GRUB in the middle of a workday.


I sincerely hope the Proxmox team takes this seriously and works to prevent such regressions in the future.
 
Hi,

How did you proceed with the update?

Best regards,
After performing a dist-upgrade on a production Proxmox host (non-ZFS, UEFI boot), the server failed to boot and got stuck at a black GRUB screen. This happened despite being a paying Proxmox Enterprise subscriber, and I must express my frustration: I received absolutely no support from Proxmox, not even basic guidance. For a paid product, this is simply unacceptable.


I had to investigate and fix the problem entirely on my own, risking business downtime and client impact. I'm sharing the solution here in case anyone else faces the same issue, because apparently you're on your own even as a paying customer.

Emergency boot with Super GRUB2 Disk


Since the system was completely unbootable, I used Super GRUB2 Disk to temporarily boot into the existing Debian/Proxmox installation.


Without it, I would not have been able to access the OS at all.

The issue:

  • System was using UEFI boot
  • EFI partition existed, but had no GRUB directory
  • proxmox-boot-tool was never initialized, and not enforced or verified during install or updates
  • Update broke boot completely — got stuck in GRUB rescue with no kernels visible


    # 1. Confirm the system is running in UEFI mode
    ls /sys/firmware/efi && echo "UEFI OK"

    # 2. Identify the EFI System Partition (usually a vfat FS)
    blkid | grep vfat
    # Example output: /dev/nvme0n1p2: UUID="FE4C-C4D8" TYPE="vfat"

    # 3. Register the UUID in the proxmox-boot-tool system
    echo FE4C-C4D8 > /etc/kernel/proxmox-boot-uuids

    # 4. Mount the EFI partition (if not already mounted)
    mkdir -p /boot/efi
    mount /dev/nvme0n1p2 /boot/efi

    # 5. Run initial refresh to generate GRUB structure
    proxmox-boot-tool refresh

    # 6. IMPORTANT: Unmount EFI so the hook system can remount and update properly
    umount /boot/efi

    # 7. Refresh again to complete the bootloader setup
    proxmox-boot-tool refresh

    # 8. Confirm status and EFI boot entry
    proxmox-boot-tool status
    efibootmgr -v
After this, I rebooted — and thankfully, the system booted normally using the newly generated GRUB setup. But again, this should never happen silently, especially in paid enterprise-grade software. There should be a check during installation or updates that validates the EFI setup is compliant with proxmox-boot-tool.


I'm documenting it here to help others — since I got no help when I needed it most.
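
The kind of check asked for above can be approximated by hand before rebooting after an upgrade. The following is only a sketch under stated assumptions: `check_boot_setup` and its optional root-prefix argument are hypothetical names used for illustration and are not part of Proxmox VE. It only relies on the paths already used in the steps above (`/sys/firmware/efi` and `/etc/kernel/proxmox-boot-uuids`) and on the standard `/dev/disk/by-uuid` links.

```shell
#!/bin/sh
# Sketch of a pre-reboot check. check_boot_setup and its optional root
# prefix are hypothetical helpers for illustration, not Proxmox tooling.
check_boot_setup() {
    root="${1:-}"   # optional path prefix; empty means the live system
    rc=0

    # UEFI firmware exposes this directory; legacy BIOS boot does not.
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "boot mode: UEFI"
    else
        echo "boot mode: legacy BIOS (or efi directory not visible)"
    fi

    # ESPs registered with proxmox-boot-tool are listed one UUID per line.
    uuids="$root/etc/kernel/proxmox-boot-uuids"
    if [ -s "$uuids" ]; then
        while IFS= read -r uuid; do
            [ -n "$uuid" ] || continue
            if [ -e "$root/dev/disk/by-uuid/$uuid" ]; then
                echo "ESP $uuid: present"
            else
                echo "ESP $uuid: MISSING"
                rc=1
            fi
        done < "$uuids"
    else
        # Not an error by itself: the host may not use proxmox-boot-tool.
        echo "no ESP registered with proxmox-boot-tool"
    fi
    return "$rc"
}

# Usage on the host, followed by the tools already shown above:
#   check_boot_setup && proxmox-boot-tool status && efibootmgr -v
```

An empty UUID file is reported but not treated as a failure, since a host can boot through a GRUB installation that proxmox-boot-tool does not manage. A registered UUID with no matching device is the case worth stopping for.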
 
I don't know why your system ran into problems with this particular and relatively low-risk update. Do feel free to share your experience and maybe people here can figure out what went wrong for you today. I do feel sorry that this happened to you, but please also improve your process to prevent such stressful situations in the future.

It feels to me like you have some expectations that don't match reality (as I see it):
  • (I'm guessing here that you have 'community support', since you appear not to have used any support ticket and only posted on this forum, which is not a good place for urgent help. If I'm wrong and you got no support on your ticket, then you should definitely find a different, more local support company that can actually support you during your business hours!)
    Community support gives you entitlement to support from Proxmox. No, it's the other way around: you pay to support Proxmox. Get a higher-tier support subscription with actual support tickets for support from Proxmox based on an SLA.
  • Paying Proxmox or using the enterprise repository ensures that nothing ever goes wrong. No, if there is a bug that you run into, then it takes longer for you to get the fix when you use the enterprise repository. Note that nobody else reported problems with this security update today. Things sometimes happen. What if it wasn't the update but broken hardware instead?
  • This forum will give you support from Proxmox. No, this forum will give you, often but not always, help from mostly random volunteers on the internet and sometimes from Proxmox staff. You did not add your subscription to your forum account, and Proxmox staff might have missed your post. Either way, it's mostly people helping each other in their spare time when they feel like it.
Here are some things, that are under your control, that could be improved:
  • Updating a production server while it was in use. You could have planned for maintenance down time and done the update without exposing your customers.
  • Updating a production server. You could have first tested the update on a test server. There is also Proxmox Offline Mirror, which can help manage when updates are put into production.
  • Fall-back scenario. Implement a fall-back server in case anything goes wrong and might be out of your control, like a botched Proxmox update or a hardware failure.
  • Don't post on a busy forum, where your cry for help scrolls off the first page very quickly, and expect volunteers to immediately fix it for you.
  • Buy support tickets to make sure you can get support (from Proxmox or another company) from experts within a known amount of time.
I did not write this to claim that it was (partly) your fault. It's honest feedback on some things that might help you and others in the future, as problems will always happen regardless.
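
One low-effort way to act on the "test first" advice is to simulate the upgrade and see whether it touches anything boot-related before scheduling it. This is a sketch, not an official procedure: `boot_critical_upgrades` is a hypothetical helper, and the package-name prefixes in its pattern are an assumption that may need adjusting for a given host. It relies only on `apt-get -s`, which simulates the run without installing anything.

```shell
#!/bin/sh
# Sketch: list boot-related packages that a simulated upgrade would touch.
# boot_critical_upgrades is a hypothetical helper; the name prefixes in
# the pattern are an assumption, adjust them to the host in question.
boot_critical_upgrades() {
    # Reads `apt-get -s dist-upgrade` output on stdin. Lines starting
    # with "Inst" name the packages that would be installed or upgraded.
    awk '$1 == "Inst" && $2 ~ /^(grub|shim|proxmox-kernel|pve-kernel|systemd-boot)/ { print $2 }'
}

# Usage (simulation only, nothing is installed):
#   apt-get update
#   apt-get -s dist-upgrade | boot_critical_upgrades
```

If the list is non-empty, that is a reasonable cue to run the update in a maintenance window with console access at hand.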
 
Thank you for your response.


While I understand the community forum is not an official support channel, I must express my disappointment. I do have a paid enterprise subscription, and the fact that such a critical and disruptive issue occurred from a security update — supposedly tested and vetted — is deeply concerning.


I understand that problems can happen, but boot failure due to a simple dist-upgrade should never occur in a paid, production-grade virtualization platform. That's exactly why we pay for stability — not just support tickets.


I did not open a ticket because the issue was urgent and needed immediate resolution, and I expected a certain level of safety from the enterprise repository itself — which turned out not to be the case this time. I had to recover everything myself, and I’ve lost valuable time and risked client trust in the process.


Yes, I will review my update and fallback strategies, as suggested. But this does not change the fact that Proxmox should not allow a boot-breaking update to be shipped via enterprise repos, period.


I’m not blaming the community — I appreciate volunteers. But I’m genuinely upset that Proxmox does not seem to acknowledge the gravity of this failure for paying customers relying on a production system.
 
@leesteken

Thanks again for your colorful reply.


You seem more concerned with defending Proxmox at all costs than acknowledging a very real and serious failure that affected a paying customer. It’s becoming obvious you're just another brand loyalist who can't stand when someone criticizes something you personally like — even when the criticism is legitimate.


Let me remind you — this forum exists for users to report problems and express dissatisfaction, which is exactly what I did. I don’t need your approval to be frustrated. I’m well within my rights, as a paying customer, to complain when a simple security update from the enterprise repository breaks GRUB and takes a production host offline.


Did I say anything about how many hosts I run?
Did I ask for your judgment on my backup or redundancy policies?


No.
So maybe don’t make assumptions about my infrastructure just to try and shift blame away from Proxmox.


You keep asking what kind of subscription I have, as if the validity of my complaint depends on that. It doesn't.
The bottom line is: no paid virtualization platform should ever ship an update through its stable channel that breaks the boot process. Period.


If you think that's acceptable, that says more about your standards than mine.


I came here to share a serious issue — something that could affect others — and your passive-aggressive "advice" comes off more like a lecture designed to shut down criticism. Maybe next time, focus on helping or just move along instead of tone-policing others.


And no — I’m not “ranting.” I’m holding a vendor accountable for a mistake. That’s what responsible professionals do.

And just to be clear: I love Proxmox. It's my preferred virtualization platform and I use it every day, recommend it to clients, and advocate for it.
But when something this serious happens, I will speak up — and I expect it to be taken seriously, not brushed aside.
 
FYI: We briefly discussed this internally, as another dev asked whether the changes to GRUB could in any way be a cause here. We ruled that out as being as close to impossible as it gets: the recent GRUB update really just dropped the NTFS module from being preloaded, so that a security check in lockdown mode (Secure Boot) cannot be circumvented. As nothing changed in how GRUB is installed, which is the stage at which this hangs, it's close to impossible that the update itself is problematic. And that's just to never say never; if this were a generic problem, we would have dozens of threads here, if not more.

Anyhow, this seems to me rather like broken storage where the GRUB image cannot be fully written out.

The first thing to check would be whether the grub process, or some other process, hangs (D state), e.g. using top, as that would confirm that the update hangs in the I/O path, which likely means broken storage (hardware or software). Also check the system log (journal) for any errors around the time this got stuck. This needs more info to be solved.

And yes, while we take reports very seriously, if we can be quite sure that this is not a general issue (which we were here, especially as we looked at that one-line GRUB change very closely), we do not put a high priority on community forum threads. Enterprise support is really the only channel where we actually have guaranteed response times governed by the subscription agreement, and we communicate that very clearly and transparently.
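
As a rough illustration of the two checks suggested above, the following sketch filters a process listing for tasks in uninterruptible sleep (state D). `list_d_state` is a hypothetical helper name, and the one-hour journal window in the usage comment is only a placeholder that should be adjusted to when the update ran.

```shell
#!/bin/sh
# Sketch: show processes in uninterruptible sleep (state D), which
# usually means they are blocked on I/O. list_d_state is a hypothetical
# helper for illustration.
list_d_state() {
    # Expects "PID STAT COMMAND" lines on stdin, for example from:
    #   ps -eo pid=,stat=,comm=
    awk '$2 ~ /^D/'
}

# Usage on the affected host:
#   ps -eo pid=,stat=,comm= | list_d_state
#   journalctl -p err --since "1 hour ago"   # placeholder time window
```

A grub or dpkg process that stays in this list across several runs would point toward the I/O path rather than the package contents.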
 
Thank you for the clarification — that’s already more helpful than the tone some others have taken here.


I understand that from your internal analysis, the GRUB update appears to be safe and minimal, and I agree with the general principle: “never say never,” even if it’s unlikely.


However, I can assure you that this host was running perfectly fine before the update, and only after the apt dist-upgrade (which included GRUB updates), the system refused to boot. No hardware alerts, no previous signs of I/O failure, and the system recovered completely once I manually booted using Super GRUB and reinstalled GRUB. If the disk were truly damaged, I doubt the recovery would’ve been so clean.


So even if this wasn’t caused directly by the GRUB package content, something during the upgrade process definitely triggered the issue, and that in itself is worth investigating, even if it's an edge case.


I also understand that official tickets are the preferred support channel — but again, this happened suddenly on a production system, and the forum was the only place I could go for immediate insight. If enterprise support truly wants better adoption, maybe consider adding some kind of monitoring or triage even in the community space when critical topics arise.


Thanks again for at least engaging seriously on the technical side — it’s appreciated.
 
This host was running in UEFI mode with Secure Boot enabled.

That’s precisely the kind of environment affected by the recent GRUB changes — especially since the update removed the NTFS module to prevent Secure Boot circumvention.

What concerns me the most is the behavior of proxmox-boot-tool in this context.
There was no clear indication during the update that something went wrong. If proxmox-boot-tool refresh failed silently (or didn’t run properly under Secure Boot), that’s a huge blind spot. A boot failure after a package update from the enterprise repo — without any warnings — is something that should never happen.

So even if the GRUB package itself was minimal, the interaction between it, Secure Boot, and Proxmox’s boot tooling might still present critical edge cases.

This deserves deeper investigation, not just dismissal. Thanks again for responding — this kind of technical engagement is appreciated.
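
For anyone comparing notes on a similar host, the Secure Boot state itself can be confirmed from the running system before drawing conclusions. This is a minimal sketch: `secure_boot_state` is a hypothetical helper, and its optional file argument exists only so it can be pointed at a test file. It reads the standard `SecureBoot` EFI variable as exposed by efivarfs.

```shell
#!/bin/sh
# Sketch: report the Secure Boot state from the EFI variable store.
# secure_boot_state is a hypothetical helper; the optional argument is
# only there so the function can be pointed at a test file.
secure_boot_state() {
    var="${1:-/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c}"
    if [ ! -r "$var" ]; then
        echo "unknown"
        return 0
    fi
    # efivarfs prefixes the value with 4 attribute bytes; the fifth byte
    # is the SecureBoot value (1 = enabled, 0 = disabled).
    val=$(od -An -t u1 -j 4 -N 1 "$var" | tr -d ' \n')
    if [ "$val" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Usage:
#   secure_boot_state
```

Where the `mokutil` package is installed, `mokutil --sb-state` reports the same information.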
 
and only after the apt dist-upgrade (which included GRUB updates), the system refused to boot.
You provided almost no useful information, but from what there is, it seems the upgrade hung, was interrupted by a reboot, GRUB was not correctly updated because of that, and the system failed to boot. Use the rescue boot option of a Proxmox VE ISO, or a live system, to repair this.

That’s precisely the kind of environment affected by the recent GRUB changes — especially since the update removed the NTFS module to prevent Secure Boot circumvention.
Do you boot from NTFS? If not, your environment is not "precisely" affected by the recent GRUB changes. Please stop trying to blame the GRUB update and start providing some relevant details of what happened.

This deserves deeper investigation, not just dismissal. Thanks again for responding — this kind of technical engagement is appreciated.
Start by providing more details if you want help here... Providing the /var/log/apt/history.log* and /var/log/apt/term.log* files and the system log from around the time the update was done would be a simple start to shed some light on what really happened. Then post hardware details and which root filesystem/storage is used. If you have an affected system where the update is still hung, check for hanging processes.

And if your nodes have a valid subscription that makes those hosts eligible for enterprise support, our enterprise support will gladly take a look; they can help to get the relevant logs and data faster.
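
Gathering the files named above could look roughly like this. It is a sketch only: `collect_apt_logs` and its two arguments are hypothetical, the archive name is arbitrary, and the timestamps in the usage comment are placeholders to be replaced with the actual update window.

```shell
#!/bin/sh
# Sketch: bundle the APT logs into one archive for attaching to a post
# or ticket. collect_apt_logs and its arguments are hypothetical.
collect_apt_logs() {
    logdir="${1:-/var/log/apt}"
    out="${2:-apt-logs.tar.gz}"
    # The history.log* and term.log* globs also match rotated copies.
    ( cd "$logdir" && tar -czf - history.log* term.log* ) > "$out"
}

# Usage:
#   collect_apt_logs
#   journalctl --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM" > journal.txt
```

It is worth reading through the result before posting, since logs can contain hostnames and other details that may not belong on a public forum.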
 
I appreciate your response, but I think you're misunderstanding my intention here.


I'm not "trying to blame" the GRUB update blindly — I'm highlighting that the combination of a GRUB update, Secure Boot, UEFI, and the proxmox-boot-tool mechanism deserves attention, especially when it results in a broken boot without any warning during a supposedly stable, enterprise-grade update process.


No, I don’t boot from NTFS — that’s exactly the point. If the update removed the NTFS module to address Secure Boot bypass, and my system had Secure Boot enabled, then the interaction between the GRUB update and my environment is technically relevant, whether or not NTFS is directly involved.


Regarding the update being "interrupted":
There were no signs that the update failed or hung during the process. The apt dist-upgrade completed cleanly with no errors. If proxmox-boot-tool refresh failed silently or didn’t trigger correctly, that’s a serious blind spot — not a user error.


I’ve already recovered the system using a rescue boot (Super GRUB in this case), and yes, I can gather the logs you requested — but honestly, the tone of this exchange is starting to feel more hostile than constructive.


If this were a common mistake on my part, I wouldn’t be the only one reporting a boot failure right after a GRUB update on an enterprise host. But I am — which is precisely why I raised the issue here: not to rant, but to help raise awareness about a potential edge case that could hit others too.


If you're genuinely interested in investigating, I’ll be happy to provide detailed logs and hardware specs. If the goal is just to deflect and assign blame, then I’ll escalate through enterprise support instead, where hopefully the tone is more professional.


Let me know how you'd prefer to proceed.
 
yes, please provide the /var/log/apt/term.log part covering that particular upgrade, and the journal output covering the subsequent reboot.
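
To cut term.log down to the upgrade in question, something like the following may help. It is a sketch that assumes APT's usual "Log started: YYYY-MM-DD ..." and "Log ended: ..." session markers in term.log; `term_log_for_date` is a hypothetical helper, and the date in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch: print the term.log sessions that started on a given date.
# term_log_for_date is a hypothetical helper; it assumes APT's usual
# "Log started: YYYY-MM-DD ..." / "Log ended: ..." session markers.
term_log_for_date() {
    date="$1"
    awk -v d="$date" '
        /^Log started: / { keep = (index($0, "Log started: " d) == 1) }
        keep             { print }
        /^Log ended: /   { keep = 0 }
    '
}

# Usage (replace the placeholder with the day of the upgrade):
#   term_log_for_date YYYY-MM-DD < /var/log/apt/term.log
#   journalctl --list-boots      # find the boot that followed the upgrade
#   journalctl -b -1             # journal of the previous boot
```

The `journalctl -b` offsets only reach back to earlier boots if the journal is stored persistently on the host.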