GUI Login failure

luison

On a clean new PVE install we are having recurring issues with failing logins for system users.

Journal shows:
Code:
Aug 12 18:00:00 m24 pvedaemon[279146]: authentication failure; rhost=::ffff:79.XXX.XX.95 user=luison@pam msg=cfs-lock 'authkey' error: got lock request timeout

Code:
systemctl status pve-cluster.service
Aug 08 13:52:03 m24 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Aug 09 06:02:40 m24 pmxcfs[1722]: [database] crit: commit transaction failed: database or disk is full#010
Aug 09 06:02:40 m24 pmxcfs[1722]: [database] crit: rollback transaction failed: cannot rollback - no transaction is active#010
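In case it is useful to anyone hitting the same thing, a quick way to confirm whether the root filesystem (where pmxcfs keeps its database under /var/lib/pve-cluster) has actually run out of space is something like the following; the dataset name is taken from my layout, so adjust it to yours:

Code:
# filesystem backing the pmxcfs database (config.db lives under /var/lib/pve-cluster)
df -h /var/lib/pve-cluster
# space accounting for the root dataset on a ZFS install
zfs list -o space rpool/ROOT/pve-1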

systemctl restart pve-cluster (or a full reboot) seems to solve the problem.
I have seen some similar issues relating to the cluster and HA services; as we won't be using them in this case, we will be disabling them.
It's a bit old, but I'm basically following this thread to disable them:

Code:
# systemctl stop corosync pve-ha-crm pve-ha-lrm
# systemctl disable corosync pve-ha-crm pve-ha-lrm
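Before disabling them it is probably worth double-checking that HA is not actually managing anything on the node, and that the units really stay off after the next boot; a minimal check, assuming a standalone node, would be:

Code:
# should show no HA resources configured on this node
ha-manager status
# confirm the services will not come back at boot
systemctl is-enabled corosync pve-ha-crm pve-ha-lrm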


Code:
# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.12-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-1
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.8-3-pve-signed: 6.8.8-3
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
dnsmasq: 2.89-1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.2
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.13-1
pve-ha-manager: 4.0.5
 
Thanks. I was guessing that was not the case, but it could well have been.

Still getting used to ZFS pool space sharing: by mistake we kept the default backups pointing to "local" (which is /var/lib/vz), and that got too crowded!
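For reference, this is roughly how I would check what is eating the space on the "local" directory storage (standard paths, adjust if your layout differs):

Code:
# storage usage as PVE sees it
pvesm status
# biggest consumers inside /var/lib/vz (dump = vzdump backups)
du -xsh /var/lib/vz/* 2>/dev/null | sort -h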

So, not directly related to the thread subject (I should perhaps open a new one), but this leads me to the question of how to avoid situations like these in a ZFS environment.

RESERVATIONS
We have two data pools and I had already created some reservations and quotas trying to avoid situations like this, which obviously failed in this case. Good to find out now, while we are still testing the server!

Code:
# zfs list -o name,quota,reservation,used,avail -r $1
NAME                                               QUOTA  RESERV   USED  AVAIL
noraidpool                                          none    none  62.5G   329G
noraidpool/backups                                  375G    none  62.3G   313G
noraidpool/cache                                     25G    none   207M  24.8G
noraidpool/tmp                                      none    none    58K   329G
rpool                                               none    none   396G   267G
rpool/ROOT                                          none     24G   214G   267G
rpool/ROOT/pve-1                                    none    none   214G   267G
rpool/data-core                                     none    none   182G   267G
rpool/data-core/data                                none    none    96K   267G
rpool/data-core/home                                none    none  37.9G   267G
rpool/data-core/vmthin                              none    none   144G   267G
The 24G reserved for root likely worked, but it did not prevent PVE from becoming unstable in this case. I will likely be adding /var/lib/vz as its own datastore now so I can force a maximum quota on it, but I am also wondering whether I should reserve some space for "rpool/ROOT/pve-1", which I am guessing is the dataset that filled up and left pve-cluster unable to write to disk.
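Something along these lines is what I have in mind; the dataset name and the size figures below are only examples, not tested values:

Code:
# dedicated dataset for the "local" directory storage, capped with a quota
# (existing content under /var/lib/vz would need to be moved across first)
zfs create -o mountpoint=/var/lib/vz -o quota=200G rpool/var-lib-vz
# keep some space guaranteed for the base system dataset
zfs set refreservation=16G rpool/ROOT/pve-1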

Does that make sense? How much?
Are there any guides on recommended reservations/quotas for the PVE base system that would guarantee the core system can never fill up and become unstable?

UPDATE
After this comment, I noticed this:

Code:
# zfs list -t filesystem -o name,used,available,quota,reservation,refquota,refreservation
NAME              USED  AVAIL  QUOTA  RESERV  REFQUOTA  REFRESERV
rpool/ROOT        214G   267G   none     24G      none       none
rpool/ROOT/pve-1  214G   267G   none    none      none        20G

Which means, as I understand it, that ROOT/pve-1 already has a 20G "refreservation" (perhaps set by the PVE installer?).
Nevertheless, I also realised that reservations would not have helped in this case, as /var/lib/vz is/was part of ROOT!
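For completeness, this is how one can confirm where /var/lib/vz actually lives (and therefore whether a quota or reservation would apply to it at all):

Code:
# show which dataset/filesystem currently backs /var/lib/vz
findmnt -T /var/lib/vz
df -h /var/lib/vz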
 
