Cannot write inside /etc/pve

fcorbelli

I should mention that this is a freshly installed Proxmox system that I did not set up myself (it was provisioned from a template by the provider), so I do not know exactly how it was configured.

For some reason unknown to me, it is not possible to write inside /etc/pve or its subdirectories; I get errors like:

unable to create VM unable to open file /etc/pve/nodes/nsxxxxx/qemu-server/102.conf.tmp
Input/output error (500)


Quick-and-dirty check: /etc/pve seems to be a "special" folder.

The system should have two ZFS-mirrored drives plus one spare SSD, also on ZFS.

Code:
root@ns337400:/etc# sudo su
root@ns337400:/etc# echo prova >/etc/pve/test1.txt
bash: /etc/pve/test1.txt: Input/output error
root@ns337400:/etc#

Code:
root@ns337400:/etc# df -h /etc/pve
Filesystem      Size  Used Avail Use% Mounted on
/dev/fuse       128M   16K  128M   1% /etc/pve
root@ns337400:/etc# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  1.4M  3.2G   1% /run
zp0/zd1          20G  3.6G   17G  18% /
tmpfs            16G   46M   16G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
zp0/zd0         1.0G   92M  933M   9% /boot
tank            431G  128K  431G   1% /tank
zp0/zd2         7.2T  123G  7.1T   2% /var/lib/vz
/dev/fuse       128M   16K  128M   1% /etc/pve
tmpfs           3.2G     0  3.2G   0% /run/user/0



Code:
root@ns337400:/etc# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=16226532k,nr_inodes=4056633,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3252000k,mode=755,inode64)
zp0/zd1 on / type zfs (rw,xattr,posixacl)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=19184)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
/dev/sdc1 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
tank on /tank type zfs (rw,xattr,noacl)
zp0/zd0 on /boot type zfs (rw,xattr,posixacl)
zp0/zd2 on /var/lib/vz type zfs (rw,xattr,posixacl)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3251996k,nr_inodes=812999,mode=700,inode64)

Code:
root@ns337400:/etc# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
tank       620K   430G   104K  /tank
zp0        126G  7.02T    96K  none
zp0/zd0   91.7M   932M  91.7M  /boot
zp0/zd1   3.51G  16.5G  3.51G  /
zp0/zd2    122G  7.02T   122G  /var/lib/vz


Code:
root@ns337400:/etc# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue May 9 13:28:25 2023
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          sda       ONLINE       0     0     0

errors: No known data errors

  pool: zp0
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:23 with 0 errors on Tue May 9 13:28:22 2023
config:

        NAME          STATE     READ WRITE CKSUM
        zp0           ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            sdb2      ONLINE       0     0     0
            sdc2      ONLINE       0     0     0

errors: No known data errors
Any ideas? Thanks!
 
Hi,
/etc/pve is a FUSE-based mount backed by an SQLite database. It is provided by pmxcfs, as described in more detail here: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pmxcfs
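As a side note, you can see the mount and the backing database file directly (just an illustrative check; the database path is the one documented in the admin guide linked above):
Bash:
# Illustrative only: show the FUSE mount and the SQLite file behind pmxcfs
findmnt /etc/pve
ls -l /var/lib/pve-cluster/config.db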

In order to identify the cause of your write issues, please post the output of:
Bash:
journalctl -b -u pve-cluster.service
systemctl status pve-cluster.service
pvecm status
 
Something seems really wrong

Code:
journalctl -b -u pve-cluster.service
-- Journal begins at Tue 2023-05-09 12:37:58 UTC, ends at Tue 2023-05-09 16:40:46 UTC. --
May 09 13:39:26 ns337400 systemd[1]: Starting The Proxmox VE cluster filesystem...
May 09 13:39:27 ns337400 systemd[1]: Started The Proxmox VE cluster filesystem.
May 09 14:36:18 ns337400 pmxcfs[2412]: [database] crit: commit transaction failed: disk I/O>
May 09 14:36:18 ns337400 pmxcfs[2412]: [database] crit: rollback transaction failed: cannot>

Code:
root@ns337400:~# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabl>
     Active: active (running) since Tue 2023-05-09 13:39:27 UTC; 3h 2min ago
    Process: 2382 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 2412 (pmxcfs)
      Tasks: 6 (limit: 38030)
     Memory: 57.0M
        CPU: 7.134s
     CGroup: /system.slice/pve-cluster.service
             └─2412 /usr/bin/pmxcfs

May 09 13:39:26 ns337400 systemd[1]: Starting The Proxmox VE cluster filesystem...
May 09 13:39:27 ns337400 systemd[1]: Started The Proxmox VE cluster filesystem.
May 09 14:36:18 ns337400 pmxcfs[2412]: [database] crit: commit transaction failed: disk I/O>
May 09 14:36:18 ns337400 pmxcfs[2412]: [database] crit: rollback transaction failed: cannot

Code:
root@ns337400:~# pvecm status
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

It should (...should!) not be a hardware error; I can copy ~300 GB without problems.
 
If you have major problems with your Proxmox VE host, for example hardware issues, it could be helpful to copy the pmxcfs database file /var/lib/pve-cluster/config.db, and move it to a new Proxmox VE host. On the new host (with nothing running), you need to stop the pve-cluster service and replace the config.db file (required permissions 0600). Following this, adapt /etc/hostname and /etc/hosts according to the lost Proxmox VE host, then reboot and check (and don’t forget your VM/CT data).

Seems a bit... radical...
 
OK, so far I have done:
Code:
systemctl stop pve-cluster
cd /var/lib/pve-cluster
mv config.db config.kaputt
sftp (another config.db from another proxmox server)...
shutdown -r now

But now, of course, I no longer see my local storage; instead I get the other server's storage definitions and its "dangling" VMs.
 
This is intended for disaster recovery, in your case there might still be a chance to recover the database.
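One way to see whether the database file itself is still consistent would be an SQLite integrity check (a sketch only; the sqlite3 command-line tool is not installed by default and would need to be installed first):
Code:
# Sketch: check the pmxcfs database for corruption while the service is stopped
# (requires the sqlite3 CLI, e.g. from the sqlite3 package)
systemctl stop pve-cluster
sqlite3 /var/lib/pve-cluster/config.db 'PRAGMA integrity_check;'
systemctl start pve-cluster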
Hmm, you were too quick; I would have suggested simply trying to restart the service with the existing DB... Can you swap the databases back and see if the transaction errors persist?
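Swapping back would look roughly like this (a sketch that mirrors your own steps above; config.kaputt is the name you used):
Code:
# Sketch: put the original database back in place and restart pmxcfs
systemctl stop pve-cluster
cd /var/lib/pve-cluster
mv config.db config.db.other-server   # set aside the database copied from the other host
mv config.kaputt config.db            # restore the original one
systemctl start pve-cluster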
 
root@ns337400:~# pvecm status
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
BTW, this is fine if this is a standalone node.
 
Copied the "kaputt" database back and restarted; it seems to work.
Weird

Thank you very much
Just one more question: is it therefore advisable to back up config.db (via a ZFS snapshot, for example) as a disaster-recovery measure?
 
Well, the transaction was not persisted because of the I/O error and the rollback failed as well, so restarting the service presumably got rid of the stuck transaction. The question remains why the I/O error arose to begin with; it can have multiple causes: https://www.sqlite.org/rescode.html#ioerr
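A few places one could start looking for the underlying cause (a sketch; the pool and device names are the ones from your zpool status output, and smartctl requires the smartmontools package):
Code:
# Illustrative starting points for tracking down the disk I/O error
dmesg -T | grep -i error    # kernel-level disk/controller errors
zpool status -v zp0         # ZFS read/write/checksum error counters
smartctl -a /dev/sdb        # drive health (needs the smartmontools package)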

A backup of the config.db is definitely recommended for quick disaster recovery.
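As a rough sketch of what that could look like, assuming the layout shown earlier where /var/lib/pve-cluster lives on the root dataset zp0/zd1 (the snapshot and file names are just examples):
Code:
# Option 1: ZFS snapshot of the dataset that holds /var/lib/pve-cluster
zfs snapshot zp0/zd1@pmxcfs-backup-$(date +%F)

# Option 2: copy the database file itself while pmxcfs is not running
systemctl stop pve-cluster
cp -a /var/lib/pve-cluster/config.db /root/config.db.backup
systemctl start pve-cluster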
 
