Subject: Proxmox VE 8.x Single Node - pmxcfs Not Mounting /etc/pve Despite Running Services (chown/lsattr fail)
Hi all,
I'm facing a critical issue on my single-node Proxmox VE server: `pmxcfs` is not mounting the `/etc/pve` directory, which makes the Web UI inaccessible (SSL errors) and causes VM management commands to fail. I need help diagnosing the root cause and finding a fix that can be applied over SSH alone (physical access and booting from an ISO are not possible).
**System Environment:**
* Proxmox VE Version: 8.4.0 base (pve-manager: 8.4.1, pve-cluster: 8.1.0)
* Kernel: 6.8.12-9-pve
* Node Type: Single Node (Hostname: Cloud9)
* Root Filesystem: ZFS (rpool/ROOT/pve-1 on /)
* Network: Configured correctly with two static IP blocks via vmbr0 (verified persistent after reboot).
* Access: SSH access as sudo user (`lgv`) is working fine.
**Core Problem:**
* The `pmxcfs` FUSE filesystem does not mount on `/etc/pve`. The `mount` command does not show `/etc/pve` mounted via `fuse.pmxcfs`.
* The underlying directory `/etc/pve` (on the ZFS root filesystem) contains the expected configuration files and subdirectories, but seems to have corrupted metadata (incorrect group ownership `www-data`, strange timestamps).
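The mount check described above can be reproduced with a small script. This is only a minimal sketch: it scans a mount table in the `/proc/self/mounts` format for a FUSE entry at `/etc/pve`. It deliberately accepts any filesystem type starting with `fuse` rather than only `fuse.pmxcfs`, on the assumption that the exact type string may differ between setups. The function name `find_fuse_mount` is illustrative, not part of any Proxmox tooling.

```python
def find_fuse_mount(mounts_text, mount_point="/etc/pve"):
    """Return (source, fstype) of the FUSE mount at mount_point, or None.

    mounts_text is the content of a mount table such as /proc/self/mounts,
    where each line is: source, mount point, fstype, options, dump, pass.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, target, fstype = fields[:3]
        # Assumption: any type beginning with "fuse" counts as a FUSE mount.
        if target == mount_point and fstype.startswith("fuse"):
            found = (source, fstype)  # later entries shadow earlier ones
    return found
```

On the host it could be called as `find_fuse_mount(open("/proc/self/mounts").read())`; a result of `None` would mean no FUSE filesystem of any type is mounted on `/etc/pve`.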
**Symptoms:**
* **Web UI:** Inaccessible, with SSL certificate errors (`certificate verify failed` or browser security warnings). Manually creating certificate symlinks failed due to permissions. The UI was briefly accessible, with a self-signed certificate warning, after a reboot.
* **VM Management:** Commands like `qm status` and `qm shutdown` often fail, reporting missing configuration files (though `qm list` worked intermittently after the last reboot).
* **Filesystem Errors on `/etc/pve`:**
    * `lsattr -d /etc/pve` fails with "Operation not supported".
    * `sudo chown root:root /etc/pve` fails with "Operation not permitted".
    * `sudo chown root:root /etc/pve/nodes/Cloud9` also fails ("Operation not permitted").
**Current Status (What IS Working):**
* Host boots successfully.
* SSH access is fully functional.
* Network configuration (`/etc/network/interfaces`) is correct and both IP blocks are active on `vmbr0`.
* Core services `pve-cluster` and `corosync` report as `active (running)` via `systemctl status`, and `corosync` reports quorum is achieved.
* Other services like `pveproxy`, `pvedaemon`, `pvestatd` also report as `active (running)`.
* ZFS pool (`rpool`) reports as ONLINE.
**Troubleshooting Steps Performed (Via SSH):**
1. **Initial State:** Problem likely started after package updates or system interruption. Initially observed `pmxcfs.service` not found by `systemctl`.
2. **Package Reinstall:** Reinstalled `pve-manager`, on the mistaken assumption that it ships `pmxcfs.service`.
3. **Package Reinstall (Corrected):** Reinstalled `pve-cluster` (v8.1.0). Confirmed via `dpkg -L` that `pmxcfs.service` is *not* part of this package, but `pve-cluster.service` is.
4. **Service Restarts:** Multiple attempts to restart `pve-cluster`, `corosync`, and all related PVE services in various orders. Services report successful restarts, but `/etc/pve` remains unmounted.
5. **Lock File Check:** Checked for a stale `/var/lib/pve-cluster/config.db-lock` to remove; the file was not present.
6. **Reboot:** Rebooted the host. Network config persisted correctly, `pve-cluster`/`corosync` started correctly, `qm list` worked briefly, but `/etc/pve` remained unmounted and UI was still inaccessible/insecure.
7. **Directory Rename/Recreate:** Stopped services, renamed `/etc/pve` to `/etc/pve.bak`, and created a new empty `/etc/pve` owned by `root:root` with mode 755. After restarting services, `/etc/pve` *still* failed to mount. Restored the original `/etc/pve` by renaming `/etc/pve.bak` back.
8. **Permissions/Attributes:** Confirmed `/etc/pve` had `root:www-data` ownership (incorrect) and strange timestamps after restore. Attempted `sudo chown root:root /etc/pve` which failed ("Operation not permitted"). Attempted `lsattr -d /etc/pve` which failed ("Operation not supported").
9. **Certificate Links:** Confirmed valid Let's Encrypt certificate exists for the hostname. Attempted to manually create symlinks (`/etc/pve/nodes/Cloud9/pveproxy-ssl.pem` -> LE cert) but failed due to inability to modify `/etc/pve/nodes/Cloud9` (`chown` failed).
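To make the state described in these steps easier to share, here is a minimal sketch that gathers read-only diagnostics in one pass. It only runs standard tools (`systemctl`, `journalctl`, `findmnt`, `stat`) and changes nothing on the system. The function names `diagnostic_commands` and `collect` are illustrative, and `journalctl` may need root to show the full log.

```python
import shutil
import subprocess


def diagnostic_commands(unit="pve-cluster", mount_point="/etc/pve"):
    """Build read-only commands that describe the service and mount state."""
    return [
        ["systemctl", "status", unit, "--no-pager"],
        ["journalctl", "-u", unit, "-b", "--no-pager"],  # this boot's log for the unit
        ["findmnt", "--mountpoint", mount_point],        # prints nothing if not mounted
        ["stat", mount_point],                           # owner, mode, timestamps
    ]


def collect(commands):
    """Run each command and return {command line: combined stdout and stderr}."""
    report = {}
    for argv in commands:
        key = " ".join(argv)
        if shutil.which(argv[0]) is None:
            report[key] = "(command not found)"
            continue
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
        report[key] = done.stdout + done.stderr
    return report
```

Running `collect(diagnostic_commands())` as root and pasting the resulting output into the thread should show whether `pve-cluster` logged any errors while starting and how the kernel currently sees `/etc/pve`.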
**Request:**
`pmxcfs` persistently fails to mount `/etc/pve` even though the services are running, and I cannot modify the underlying `/etc/pve` directory even as root, which suggests a filesystem-level lock or corruption specific to that path. What further steps can I take over SSH to diagnose and fix this? Is there a way to force `pmxcfs` to mount, debug its failure in more depth, or repair the state of the `/etc/pve` directory on ZFS?
Any help or pointers would be greatly appreciated!