[Community project] Proximo — an open-source, least-privilege MCP/API layer for managing PVE with an AI agent (feedback wanted)

broadway

New Member
Jun 20, 2026
Hi all — sharing an open-source side project for feedback from people who actually run Proxmox in anger. It's a community project, not affiliated with Proxmox, and this isn't a support request — I'm after criticism of the approach.

What it is: Proximo is a small server that lets an AI agent (or any MCP/A2A client) manage a PVE cluster through the REST API with a scoped API token. It does not touch the hypervisor directly and is API-only by default. Safety was the design priority, because handing an LLM access to a hypervisor is obviously risky.

What that looks like in practice:

- Dry-run before every mutation. Each change first returns a preview: the exact operation, the guest's current state, and a computed "blast radius." Before deleting/disabling a storage, for instance, it reads the cluster and lists the specific guests that would lose a disk (and which won't boot vs. just degrade). The mutation can't run without that plan having been generated first.
- Tamper-evident audit log (hash-chained), kept locally — a verifiable record of what was planned and confirmed.
- Auto-snapshot before risky changes, with one-call rollback, wherever the storage supports snapshots.
- Scoped token, least privilege. I run it day-to-day against a PVEAuditor-style read-only token; mutations are refused at the API level unless the token is actually granted them. The token is never logged.
- In-container exec is opt-in and loud. The REST API has no exec-in-LXC endpoint, so that path goes over ssh -> pct exec; it's off by default, gated by a fail-closed CTID allowlist, and it warns that it grants near-root.
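
To make the first bullet concrete, here is a minimal Python sketch of a "no plan, no mutation" gate. It is an illustration under stated assumptions, not Proximo's actual code: the `PlanGate` class, its method names, and the plan fields are all hypothetical, and nothing here talks to a real PVE API.

```python
import hashlib
import json


class PlanRequired(Exception):
    """Raised when a mutation is attempted without a confirmed plan."""


class PlanGate:
    """Hypothetical gate: a mutation can only run against a stored plan."""

    def __init__(self):
        self._pending = {}  # plan_id -> plan awaiting confirmation

    def plan(self, op, target, affected_guests):
        # The preview: the operation, its target, and the guests it would hit.
        plan = {"op": op, "target": target, "blast_radius": sorted(affected_guests)}
        digest = hashlib.sha256(json.dumps(plan, sort_keys=True).encode()).hexdigest()
        plan_id = digest[:12]
        self._pending[plan_id] = plan
        return plan_id, plan

    def apply(self, plan_id, confirmed):
        # Fail closed: an unknown id or a missing confirmation stops here.
        if plan_id not in self._pending:
            raise PlanRequired("no plan was generated for this id")
        if not confirmed:
            raise PlanRequired("plan exists but has not been confirmed")
        plan = self._pending.pop(plan_id)  # each plan can be applied once
        return f"executing {plan['op']} on {plan['target']}"
```

In this sketch `apply` only returns a description of what would run; a real tool would call the API at that point. The property being shown is that the check lives in code, so the caller cannot skip it by choosing not to comply.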

Maturity — stated plainly: brand new (v0.6.0), no real-world adoption yet. 145 tools and a large test suite, but a good portion of that surface still runs against mocks. What I've exercised against a real PVE 9.2 API (a single node plus a nested 3-node test cluster): the core lifecycle + the governance/dangerous plane (roles/groups/users/ACLs, storage, SDN/network, realms), offline guest migration, and HA-rule config — full create/read/delete cycles. I have not validated real HA fencing (needs a hardware watchdog), online live-migration (needs shared storage), or anything at production scale. SDN/network apply I deliberately never fire on a live host — it's unrecoverable. I'm also not claiming it's the first or only safety-minded Proxmox tool; there are others with real trust mechanisms. This is just the approach I landed on and would like critiqued.

Install (runs on your machine, on demand — no daemon, no open port):

uvx proximo-proxmox (or: pip install proximo-proxmox)

Source + docs (Apache-2.0): https://github.com/john-broadway/proximo

What I'd most value from this community: where does the trust model fall down on a real cluster? Are the blast-radius assumptions wrong for setups I haven't seen (shared storage, unusual boot configs, HA edge cases)? Is the token/permission posture sane? I'd rather hear it here than after someone points it at production.

(Full disclosure, since it's relevant: it's a human+AI project — I drove the design; an AI coding partner did much of the implementation, credited per-commit in the repo. Noting it because some folks rightly want to know.)

Thanks for any time you spend kicking the tires.
 
Thanks for sharing your project here. I appreciate that you openly state this is a human+AI hybrid project. However, looking closely at the concept, the GitHub profile, and the architectural choice, I see three potential scenarios here—all of which act as major red flags that keep me from deploying or testing this on any of my systems:

1. The "Sock-Puppet" / Social Engineering Angle
Aside from your very fresh GitHub profile, there seems to be virtually zero digital footprint or verifiable history for your identity in the tech community. With all due respect, your highly emotional/sympathetic bio ("100% service-connected disabled veteran") combined with a sudden appearance out of nowhere triggers classic social engineering alerts. It positions a tool at one of the most critical access points (PVE API) while shielding the author from critical scrutiny through a bulletproof personal narrative.

2. The "Vibe Coding" Enthusiast
Even if you are indeed a veteran IT professional who just discovered the powers of modern LLMs (like Claude), the lack of any prior history is highly unusual for a 35-year track record. Even if this is genuine, pure enthusiasm for AI-driven development at a hypervisor level is deeply concerning. While I respect your enthusiasm, I am definitely not "kicking the tires" on a core infrastructure tool that was rapidly spun up by a generative AI.

3. The Conceptual Risk of AI at the Hypervisor Level
Let's assume your intentions are 100% genuine and you are trying to solve exactly what you described. Even then, building an MCP/API layer designed to hand over autonomous or semi-autonomous control of a Proxmox VE API to an AI agent is conceptually high-risk. Anyone with decades of IT experience knows how dangerously subtle hallucinations, race conditions, and edge cases are in multi-threading or complex API scenarios. Entrusting a non-deterministic AI agent with deep PVE system rights—even under the guise of a "least-privilege" layer—is an architecture I strongly advise against.

For these reasons, I will sit this one out. I recommend that everyone exercise extreme caution before pointing an AI-generated API layer at their cluster.
 
1. I've been out of the "professional world" since 2000. I semi-retired from dot-com. Yes, I have mental health issues that have made me afraid to really put myself out there ever since.

2. I run dual x3650s here in my home lab; yes, I'm crazy like that. I've used Proxmox for years, and I try to give back when I see things and my passions flow. As for "rapidly": I've been building this tool for months, loosely across multiple toolings, MCPs, et al.

3. Agreed completely, which is why I took the approach I have. I truly understand you, but I also know where we are heading, and if we don't figure out serious governance and rails, we will be out of the game. Single points of function across many smaller LLM/agent transactions matter.

Appreciate the pushback — it's the right instinct, and it's the whole reason the project exists.

I'm not asking anyone to trust the agent; I'm trying to make the agent's every move provable and reversible so you don't have to.

Concretely, since I posted (now v0.7.2): every dangerous op is plan-first (you see the blast radius before anything runs, so an agent can't fumble into a destroy), reversible ops snapshot first, and every action lands in a keyed, hash-chained ledger.

Here's a 25-second demo that needs nothing but pip install proximo-proxmox — no Proxmox — showing the audit trail catch a tampering attempt at the exact line, and catch a tail-truncation when you pin the head off-box: https://asciinema.org/a/a8pZZBC9hqG4hObu
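
For readers who want to see the mechanics the demo exercises, here is a minimal Python sketch of a keyed hash chain. It is not Proximo's ledger format: the entry layout, the function names, and the all-zero genesis value are assumptions made for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _mac(key, prev, payload):
    # Each entry's MAC covers the previous MAC plus this entry's payload.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev.encode() + body, hashlib.sha256).hexdigest()


def append(ledger, key, payload):
    prev = ledger[-1]["mac"] if ledger else GENESIS
    mac = _mac(key, prev, payload)
    ledger.append({"payload": payload, "prev": prev, "mac": mac})
    return mac  # the new head; copy this somewhere off-box


def verify(ledger, key, pinned_head=None):
    prev = GENESIS
    for i, entry in enumerate(ledger):
        expected = _mac(key, prev, entry["payload"])
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
            return f"tampered at entry {i}"
        prev = entry["mac"]
    # A truncated tail is still internally consistent, so only a head
    # recorded elsewhere can reveal that entries are missing.
    if pinned_head is not None and prev != pinned_head:
        return "head mismatch"
    return "ok"
```

Editing any recorded payload changes the MAC that entry should carry, so verification stops at that index. Dropping entries from the end passes the chain check on its own and is only caught when the last known head was pinned outside the machine that holds the log.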

The non-determinism concern is fair and I'm not hand-waving it: the point isn't "the AI won't make mistakes," it's "when it does, you have a tamper-evident record and a snapshot to roll back to."

Still early, still want the holes poked.

Repo: https://github.com/john-broadway/proximo
 
I'm not interested in using an AI agent at all on my hosts because my data is too important for me: https://www.euronews.com/next/2026/...e-database-in-9-seconds-then-wrote-an-apology

In other words: I don't trust AI, I don't trust AI-coded tools, and I don't trust people using them.


That story is the exact nightmare — and honestly, it's why this exists. The reason an agent can nuke a database in 9 seconds is that nothing stands between "the model decided to" and "it happened." No plan, no snapshot, no record — just an apology afterward.

Proximo is built so that specific thing can't happen:
- A destructive op doesn't just execute. The agent gets a PLAN back — the blast radius — and a hard stop. A human confirms separately. An agent literally cannot fumble into the delete. (The live demo is exactly this: the agent asks to delete a guest, and it gets a refusal + a plan, not a deletion.)

- Reversible changes snapshot first where the platform can — so the rollback point is taken before the mistake, not wished for after.

- Every action lands on a tamper-evident, hash-chained ledger. There's no "wrote an apology" — there's a receipt you can't quietly edit.

To be clear: I'm not asking you to trust the AI. I don't either. The whole design assumes the agent will eventually do something dumb, and makes that dumb thing visible and reversible instead of silent and final. "Don't trust the agent — trust the receipts" is the entire pitch.

Completely fair if it's still not for you. Appreciate you raising that case — it's the one this was built to answer.
 
The thing is: I don't trust AI to follow any guard rails or safety rules. So your answer doesn't change anything. Your answer to @meyergru's concerns is basically "trust me bro", which also doesn't help your case.
 
Yes, I have mental health issues
Not trying to be a dick, but I would never run software maintained by a single person with mental health issues.

If you have mental health issues, I would highly recommend not using AI. For several people I know, it has led to an extremely negative downturn. One even "tuned" his model with his therapist, which I think is completely insane and unprofessional. I don't know what it is that attracts people with mental health issues to LLMs. Maybe it is the constant positive feedback, which is IMHO not healthy even for people without mental health issues.

But back to topic. Let us not put the cart before the horse. Why should I even want to manage PVE with an AI agent? Proxmox is just a hypervisor. You spend basically no time in it. So why should I manage (which I barely manage at all) Proxmox with AI?
 

Fair — "trust me" shouldn't be the answer, and you're right to push on it. So don't trust it.

The guardrails aren't the AI's to follow. The boundary is the PVE token: Proximo runs read-only by default and can't exceed the RBAC grants on the token you mint it — Proxmox enforces that, not the agent's good behavior. Hand it a read-only token and "the AI ignored the rules" doesn't change a single thing it's able to do. Every mutation also requires a dry-run plan first and lands in a tamper-evident log you can check after.
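
As a rough illustration of where that boundary sits, the sketch below builds a PVE API request that authenticates with an API token, using only the Python standard library. The `PVEAPIToken=USER@REALM!TOKENID=SECRET` header and port 8006 follow the Proxmox VE API documentation; the host name, token ID, and secret are placeholders, and the helper names are hypothetical rather than anything taken from Proximo.

```python
import urllib.error
import urllib.request


def build_request(host, token_id, secret, path, method="GET"):
    # token_id has the form USER@REALM!TOKENNAME, e.g. "proximo@pve!agent".
    url = f"https://{host}:8006/api2/json{path}"
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"PVEAPIToken={token_id}={secret}")
    return req


def call(req):
    # Returns (status, body). A permission refusal comes back as an HTTP
    # error status decided by the server, whatever the client intended.
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.reason
```

The client side has no say in the outcome: whether a request is a read or a mutation, the server checks the token's grants and either serves it or refuses it. A host with a self-signed certificate would also need its CA passed to `urlopen` through an SSL context, which is left out here.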

So the model isn't "trust the AI." It's "scope it with a token, and verify what it did." If you don't trust it — good, don't. Give it a read-only token and make it prove itself on diagnosis before it's allowed to touch anything.
 
Let's get one thing straight before anything else. My mental illness isn't a quirk I picked up off a screen. I earned it. I'm a 100% disabled veteran — I served, I paid for it, and I carried it quietly for twenty years. You will never understand the cost, because you've never been anywhere near the price. So spare me the spoiler-tag therapy session.

You don't get to play doctor on a man whose receipts you can't even read.

Now — since you want to talk about who's fit to be trusted with infrastructure, let's go through the part you obviously skipped, because it's clear you never read how this actually works.

You think it's "trust me bro." It isn't. The AI is trusted with nothing. Proximo authenticates to the PVE API with a scoped token — by default a read-only proximo@pve role. It cannot exceed the privileges on that token, because Proxmox's own RBAC enforces it — not the model's good behavior. Mint it a read-only token and it is structurally incapable of changing anything; the model "going rogue" earns you exactly one thing: a 403. That's not a promise from me. That's the permission system you administer every day.

Past that boundary: every mutating call is gated behind a mandatory dry-run plan — no plan, no mutation, no exceptions. Every action writes to a hash-chained, HMAC-keyed audit ledger with an off-box head anchor, so tampering is detectable, not hypothetical. Snapshot-class operations take a fail-closed snapshot before they touch a thing, so rollback is one call. In-container exec is off by default and sits behind a CTID allowlist. Least-privilege, fail-closed, auditable end to end.

So here's where we really are: you couldn't engage one line of that, so you went after the diagnosis instead. The "broken" guy built a least-privilege control plane and can walk it down to the token scope. The "normal" guy read none of it and reached for "he's crazy" — because that was the ceiling of where your brain could take you.

Read the architecture. Then maybe we talk like engineers.
 
If the token only needs read-only privileges, the agent cannot actually do anything harmful, that is correct. On the other hand - it cannot do anything at all.

You're right that read-only alone can't manage anything — but that's the part I'd push on, because read-only isn't the product, it's the floor.

Stopping there misses two things.

First, read-only already does real work. Diagnosis and audit need no write at all — "why won't this guest boot," "what changed in my firewall rules," "which tokens hold privileges they shouldn't." That's the highest-frequency, lowest-risk job, and it's genuinely useful with zero trust extended. So "it can't do anything" isn't quite true — it does the part you'd want first.

Second, and this is the actual design: write isn't a single all-or-nothing trust cliff. PVE RBAC is granular — you grant VM.PowerMgmt on one pool without Sys.Modify, or storage privileges without realm privileges, on exactly the paths you choose. You scope up to the task, not to "full admin," and the token still bounds it.

And even inside the scope you grant, the agent doesn't get unsupervised mutation. Every mutating call is gated by a mandatory dry-run plan that shows exactly what will change and its blast radius before it runs. Every action lands in a tamper-evident, hash-chained log. Snapshot-class operations take a fail-closed snapshot first, so rollback is one call. So write access isn't "trust the AI to do the right thing"; it's "the AI proposes a previewed, bounded, reversible, recorded change, inside a scope you granted."

That's the middle ground you're saying doesn't exist: not read-only-and-useless vs write-and-trust-me, but graduated — least-privilege token, scoped to the task, every action previewed, reversible, and logged. It's the same discipline a careful admin already runs: scoped creds, change preview, audit, snapshot before risk. The agent just doesn't get to skip any of it.
 
@broadway:

Let’s take a step back from the technical scaffolding you are proposing.

Hash-chains, dry-runs, and scoped RBAC tokens are standard practices for traditional API automation. They do not, however, mitigate the core issue we are discussing here.

You are treating this as an engineering puzzle that can be solved with more features (i.e. the programmer's perspective). But from an operational, architectural, and security perspective, the fundamental problem is a total mismatch between non-deterministic tools and core infrastructure (i.e. the administrator's perspective).

Here is why your architecture poses a problem:
  1. The Read-Only Paradox: If your tool is strictly limited to read-only roles for diagnostics, it is essentially a monitoring script. We don't need a heavy MCP/API layer or an autonomous agent to read a log or check a configuration; standard deterministic tools do this faster and without risk.
  2. The Mutation Risk: The moment your tool executes mutations—which is its actual purpose—the risk becomes absolute. A generative model cannot guarantee state tracking under pressure. Anyone who runs large clusters knows how dangerously subtle race conditions, partial storage timeouts, or unexpected API limits can be. Wrapping an AI agent in a "dry-run" feature does not prevent it from misinterpreting the dry-run output itself when hitting an unpredicted edge case.
  3. The Social Engineering and Supply-Chain Risk: Introducing a completely fresh, unverified codebase into a hypervisor environment creates severe supply-chain risks. In a worst-case scenario involving malicious intent, the automated nature of generative AI provides the perfect layer of plausible deniability—any backdoor or exploit introduced in a future "bug fix" update can simply be blamed on an "LLM hallucination." Furthermore, the highly emotional, defensive responses regarding your personal background do not lower the threat profile; to the contrary, hiding behind a bulletproof sympathetic narrative is a textbook social engineering tactic designed to lull a wary community into a false sense of security. Note: I am not saying that this is indeed the case here.

Core infrastructure demands predictable determinism. Mixing generative AI with raw hypervisor orchestration is a conceptual boundary I am unwilling to cross, even if there were no potential for malicious intent.
 
I'm starting to wonder if the OP is not an AI bot. This would be impressive.

Hahah.. no sir, but — half right. Not a bot. I'm just a disabled vet who works with AI, out loud, and said so from the first reply.

Turns out "someone with mental health issues" can run the thread too.

The partnership's the point — that's the whole project.
 
@meyergru — this is the most serious objection in the thread, so let me answer it straight, and concede where you're right.

On determinism. You've framed it as non-deterministic tool vs deterministic infrastructure, and you're right that those don't mix. That's the design premise, not a gap I missed. The determinism in Proximo doesn't live in the model — it lives in the substrate around it. The token's RBAC is enforced by Proxmox. "No plan, no mutation" is enforced by code, not by the model choosing to comply. The audit chain and the confirm gate are deterministic. The AI is the non-deterministic proposer; the system is the gate; a human (or a deterministic policy) is the approver. I'm not asking you to trust the model's determinism — I'm asking the opposite: assume it's unreliable and bound it with rails that aren't.

The Read-Only Paradox — you're partly right. For a single known check, a deterministic script wins, full stop. The read layer isn't there to replace your monitoring; it's the safe on-ramp — run it read-only, watch what it does, decide if you ever grant write. If you never do, you've lost nothing and gained a natural-language way to ask "why won't this guest boot" without knowing in advance which six places to look. If that's not a problem you have, a script wins, and I won't pretend otherwise.

The Mutation Risk — real, but not "absolute." Race conditions, partial timeouts, API limits threaten every automation against a live cluster — Terraform, Ansible, your own scripts — not just AI. What AI adds is non-determinism in which action is proposed. Proximo doesn't claim to remove that. It computes the plan from live state, gates the mutation behind an explicit confirm, and makes the result reversible and recorded — so when something slips, the blast radius is bounded and the trail is provable. And your sharpest point — the model misreading its own dry-run on an edge case — is exactly why the plan is meant to be read by a human before confirm, not handed back to the model to approve itself. Let the agent auto-confirm production mutations and you've accepted a risk I wouldn't; the tool doesn't pretend you haven't.

Supply chain — fair, and taken seriously. A fresh codebase near a hypervisor is a real risk. The answer isn't my word, it's verifiable provenance: SHA-pinned dependencies, CodeQL/gitleaks/image scanning in CI, signed and attested images, tokenless OIDC publishing, a published disclosure policy, and adversarial self-audits whose findings get fixed in the open. "Blame a backdoor on a hallucination" doesn't survive a signed, reproducible release history — that's what the provenance is for.

On the suggestion that disclosing I'm a disabled veteran is "social engineering": I didn't raise it; it was dragged out to dismiss me. I answered once. Recasting a man's honesty as a tactic is a way to avoid the argument, not make one. The code is public and signed. Judge that.

And your conclusion is legitimate. If your line is "no generative AI touches my cluster, full stop," that's defensible and I'm not here to move it. Proximo isn't for the admin who wants zero AI near infrastructure. It's for the people who are going to use AI in ops anyway, and would rather do it behind a preview, a confirm, an undo, and a tamper-evident record than behind nothing. If that's not you, we agree, and that's fine.
 
You brought it up, not me.
Just some friendly advice, based on my personal experience. Nothing more, nothing less. Feel free to ignore it.

But back to topic. Let us not put the cart before the horse. Why should I even want to manage PVE with an AI agent?

Fair — you're offering it as advice now, and I'll take it in that spirit and leave it there. No hard feelings. Back to your question, because it's the right one.

"Why would I want to manage PVE with an AI agent?"

Honest answer: maybe you don't. If you're fluent in Proxmox and you live in the GUI and CLI, for day-to-day work you genuinely don't need this — and I'm not going to pretend you do.

Here's who it's actually for, and where it earns its keep:
- The person who isn't a career admin. The homelabber who doesn't know the API, doesn't know which six places to look when a guest won't boot, and who's going to reach for an AI anyway. The entire point of the guardrails is so that person can act without nuking their cluster: plan first, snapshot before, undo, a record of what happened. Better behind a safety net than blind, right?

- The stuff you touch twice a year. SDN, firewall rules, HA config, ACLs — where you re-learn the syntax every time. An agent that shows you the plan before it acts skips the man-page archaeology. You stay in control; you just lose the tax.

- Diagnosis you don't have to scope in advance. "Why is this node unhappy?" — correlating config, storage, tasks, logs and cluster state in one ask, without you specifying where to look first. A script's faster for a known check; this is for when you don't yet know what you're checking.

So I'll concede your point cleanly: for an expert doing routine work, the answer is often "you don't."

I didn't build it to replace you.

I built it for the people who are going to put AI near their infrastructure regardless — and would rather do it with a preview, a snapshot, and a receipt than without one.
 
Here's who it's actually for, and where it earns its keep:
- The person who isn't a career admin. The homelabber who doesn't know the API, doesn't know which six places to look when a guest won't boot,
That is IMHO the job of sane defaults, not AI.

As to the rest of it, this is mostly a non-issue if you simply follow basic best practices, especially if you are new and don't know what you are doing.
- use mirrors and not RAIDZ for block storage
- don't touch any defaults you don't understand
- don't put data in your VM disks. Having an 8TB Plex VM disk is not smart.
- and true for any software: follow the hardware requirements
 
The person who isn't a career admin. The homelabber who doesn't know the API, doesn't know which six places to look when a guest won't boot, and who's going to reach for an AI anyway.
Such people shouldn't use Proxmox VE, tbh. Something like unRAID, OpenMediaVault or Synology's DSM should suit their needs better.