Unable to log in via Web GUI; SSH works

I do not think I can help you better than @fabian, who knows what's in the postinst script, but ...



I do not see how running upgrade would be a problem in and of itself, though, if you subsequently run full-upgrade.



All along I thought the "Unable to load access control list: No buffer space available" error was a red herring, but now I wonder if it's not related.
I never ran a full-upgrade, though.

You're right about the buffer space. It's basically preventing me from running any commands. I tried to back up the VMs using vzdump and it gave me that same buffer error...
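Even a bare run dies immediately, e.g. (the VMID here is just an example):
Code:
root@STP-HV:~# vzdump 100
Unable to load access control list: No buffer space available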
 
I never ran a full-upgrade, though.

Uh? Are you saying you never actually ran apt full-upgrade? I scrolled up; you were advised to run ... dist-upgrade, but that's just the older (still supported) notation. You ran neither?
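That is, in current apt these are the same operation:
Code:
apt full-upgrade    # current name
apt dist-upgrade    # older alias, still supported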

You're right about the buffer space. It's basically preventing me from running any commands. I tried to back up the VMs using vzdump and it gave me that same buffer error...

I thought the telling one is "Unable to load access control list"; sometimes the errors are not all that telling, however, and it might be a resulting failure somewhere down the line, not the initial cause.
 
If yes, and that didn't help, could you try to fix broken packages by running the following command:
Bash:
apt -f install
and then apt dist-upgrade
If the above didn't even help please, check the syslog looking for any error message during the login.

Here it was. But I got a bit lost too in the thread initially.
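For the syslog part, something along these lines should surface any login-time errors (a sketch; pveproxy/pvedaemon are the services behind the web login):
Code:
# watch the journal live while attempting a GUI login
journalctl -f
# or query the relevant daemons after the fact
journalctl -u pveproxy -u pvedaemon --since "10 minutes ago"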
 
Uh? Are you saying you never actually ran apt full-upgrade? I scrolled up; you were advised to run ... dist-upgrade, but that's just the older (still supported) notation. You ran neither?



I thought the telling one is "Unable to load access control list"; sometimes the errors are not all that telling, however, and it might be a resulting failure somewhere down the line, not the initial cause.
Here it was. But I got a bit lost too in the thread initially.
Yes, I ran it after the problems started, but I meant that when I originally built the servers, I didn't.
 
Yes, I ran it after the problems started, but I meant that when I originally built the servers, I didn't.

Yeah, so I just do not think that's a problem if you ran it now (eventually). Can you afford to revert the AD changes on one of the nodes, to test if that's related?
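If the realm was added through PVE and the CLI part used realmd/sssd, the revert would be roughly along these lines (the realm name is a placeholder; adjust to yours):
Code:
# drop the AD realm from PVE's authentication config (name is an example)
pveum realm delete EXAMPLE.COM
# if the CLI-side domain join was done via realmd, leave the domain
realm leave example.com
systemctl disable --now sssd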
 
Yeah, so I just do not think that's a problem if you ran it now (eventually). Can you afford to revert the AD changes on one of the nodes, to test if that's related?
I'm not sure how to revert all those changes, as configuration was required in both the CLI and the GUI. Also, there is the issue that running any command that tries to change something results in:
Code:
Unable to load access control list: No buffer space available
 
I'm not sure how to revert all those changes, as configuration was required in both the CLI and the GUI. Also, there is the issue that running any command that tries to change something results in:
Code:
Unable to load access control list: No buffer space available

My bad.

Does date -u appear to work correctly on your system?
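It should just print the current UTC time, something like this (timestamp illustrative):
Code:
root@STP-HV:~# date -u
Tue Feb  6 18:08:31 UTC 2024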
 
Apologies as I am flying blind, but could you try:

Code:
systemctl stop sssd
systemctl restart pvestatd
 
Assuming this is a new installation that hasn't been put in production, the fastest way to recover is to just reinstall PVE. Otherwise, fixing may require you to uninstall proxmox-ve, bring your dpkg state back to health (e.g., apt dist-upgrade completes without issue), and reinstall proxmox-ve.
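In outline, that recovery path would be something like this- a sketch, not a tested recipe, so only attempt it in a maintenance window:
Code:
apt remove proxmox-ve     # removes the meta-package and the PVE control plane
apt -f install            # let apt repair any broken dependencies
apt dist-upgrade          # should now complete without errors
apt install proxmox-ve    # pull PVE back in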
 
Apologies as I am flying blind, but could you try:

Code:
systemctl stop sssd
systemctl restart pvestatd
Code:
root@STP-HV:~# systemctl stop sssd
root@STP-HV:~# systemctl restart pvestatd
Job for pvestatd.service failed because the control process exited with error code.
See "systemctl status pvestatd.service" and "journalctl -xeu pvestatd.service" for details.
root@STP-HV:~# systemctl status pvestatd
× pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-02-06 13:08:31 EST; 23s ago
   Duration: 1month 3w 2d 9h 25min 1.587s
    Process: 3614594 ExecStart=/usr/bin/pvestatd start (code=exited, status=105)
        CPU: 309ms


Feb 06 13:08:30 STP-HV systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Feb 06 13:08:31 STP-HV pvestatd[3614594]: Unable to load access control list: No buffer space available
Feb 06 13:08:31 STP-HV systemd[1]: pvestatd.service: Control process exited, code=exited, status=105/n/a
Feb 06 13:08:31 STP-HV systemd[1]: pvestatd.service: Failed with result 'exit-code'.
Feb 06 13:08:31 STP-HV systemd[1]: Failed to start pvestatd.service - PVE Status Daemon.

Assuming this is a new installation that hasn't been put in production, the fastest way to recover is to just reinstall PVE. Otherwise, fixing may require you to uninstall proxmox-ve, bring your dpkg state back to health (e.g., apt dist-upgrade completes without issue), and reinstall proxmox-ve.
It is in production, which is the issue. What does reinstalling proxmox-ve entail? Will I lose configuration data for the VMs? What are the consequences of doing this while the VMs remain running?
 
Assuming this is a new installation that hasn't been put in production, the fastest way to recover is to just reinstall PVE. Otherwise, fixing may require you to uninstall proxmox-ve, bring your dpkg state back to health (e.g., apt dist-upgrade completes without issue), and reinstall proxmox-ve.

I would still suggest he keeps one of these standalone nodes "aside" to find out what it was - it is not normal to run into this just because he is using AD, so it would be important that it does not happen in the future.
 
If you still have patience, can you show ls -1 /etc/pam*?
Code:
root@STP-HV:~# ls -1 /etc/pam*
/etc/pam.conf


/etc/pam.d:
chfn
chpasswd
chsh
common-account
common-auth
common-password
common-session
common-session-noninteractive
cron
login
newusers
other
passwd
runuser
runuser-l
samba
sshd
sssd-shadowutils
su
su-l
 
It is in production, which is the issue. What does reinstalling proxmox-ve entail? Will I lose configuration data for the VMs? What are the consequences of doing this while the VMs remain running?
You won't lose the configuration data, and the VMs will continue to function. You will lose the PVE command and control plane, as some PVE services will be uninstalled in the process.

the problem is that with the system running, there is the possibility that even uninstalling proxmox-ve would not restore dpkg to health, as many dependent packages will remain installed. Here is what I'd recommend- since the guests are all running at this time, schedule an outage, during which you will:
1. back up your VMs
2. back up /etc/network/interfaces and /etc/pve/storage.cfg; if your storage is networked, or not shared with your boot disk, also back up /etc/pve/qemu-server and /etc/pve/lxc (see the sketch after this list)
3. reinstall PVE
4. if your disk setup keeps the vdisk storage separate, congratulations: restore what you backed up in step two, and you're off to the races. If it wasn't, restore all your VMs too.
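Step 2 as commands, roughly- a sketch that assumes /etc/pve is still mounted (i.e. pve-cluster is still running), and that you copy the result off the box afterwards:
Code:
mkdir -p /root/pve-config-backup
cp /etc/network/interfaces /root/pve-config-backup/
cp /etc/pve/storage.cfg /root/pve-config-backup/
cp -a /etc/pve/qemu-server /etc/pve/lxc /root/pve-config-backup/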

it is not normal to run into this just because he is using AD,
fify. "not normal" is what HA is made for. @cjbruck you have 9 nodes- cluster them.
 
fify. "not normal" is what HA is made for. @cjbruck you have 9 nodes- cluster them.

It is completely his call how to run his nodes. In this situation it actually helps troubleshooting: it is simpler as there's no cluster, and whatever it is that's causing the issue, the OP just managed to reproduce it 9 times. I have a hunch this will be reproduced a 10th time (maybe later on) when he has a cluster; even more fun to troubleshoot/recover.
 
It is completely his call how to run his nodes,
no doubt.

in this situation it actually helps troubleshooting
I don't want to sound snobby or dismissive, but have you actually ever run a cluster of anything? The WHOLE POINT is that you can do things one node at a time and not risk your operation. You can troubleshoot to your heart's content.

me, I prefer not to have to.
 
You won't lose the configuration data, and the VMs will continue to function. You will lose the PVE command and control plane, as some PVE services will be uninstalled in the process.

the problem is that with the system running, there is the possibility that even uninstalling proxmox-ve would not restore dpkg to health, as many dependent packages will remain installed. Here is what I'd recommend- since the guests are all running at this time, schedule an outage, during which you will:
1. back up your VMs
2. back up /etc/network/interfaces and /etc/pve/storage.cfg; if your storage is networked, or not shared with your boot disk, also back up /etc/pve/qemu-server and /etc/pve/lxc
3. reinstall PVE
4. if your disk setup keeps the vdisk storage separate, congratulations: restore what you backed up in step two, and you're off to the races. If it wasn't, restore all your VMs too.
Thank you for the recommendation. I can't back up the VMs themselves since the vzdump command doesn't work right now, but I have already backed up the most vital information from them, so they can be recreated. I'm debating whether I should just start from scratch during a scheduled outage.

fify. "not normal" is what HA is made for. @cjbruck you have 9 nodes- cluster them.
It is completely his call how to run his nodes. In this situation it actually helps troubleshooting: it is simpler as there's no cluster, and whatever it is that's causing the issue, the OP just managed to reproduce it 9 times. I have a hunch this will be reproduced a 10th time (maybe later on) when he has a cluster; even more fun to troubleshoot/recover.
no doubt.


I don't want to sound snobby or dismissive, but have you actually ever run a cluster of anything? The WHOLE POINT is that you can do things one node at a time and not risk your operation. You can troubleshoot to your heart's content.

me, I prefer not to have to.
Let me explain a bit about the environment. Each server is at a different physical "site" and runs various services for it. While we do have low-latency connections between them over a MAN, the Proxmox network requirements recommend latency under 5 ms, and some of our sites are far enough from each other that they exceed that. My understanding of clustering and HA in general is that they are really meant for when you have two or more servers at a single site and you want redundancy for that site. This is not the case in my situation; each site has only 1 server dedicated to it.
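For reference, I'm going by simple round-trip measurements between the nodes, along these lines (the address is just an example):
Code:
# round-trip time between two sites; the cluster network wants this consistently low
ping -c 20 10.0.0.2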
 
I don't want to sound snobby or dismissive, but have you actually ever run a cluster of anything? The WHOLE POINT is that you can do things one node at a time and not risk your operation. You can troubleshoot to your heart's content.

When you look at the OP's post, he's got 9 nodes and basically just set up AD on them. If he had set it up as a cluster, each node equally, how exactly would that be of any help now? With pvestatd not running, what would HA do there?

I take this as a post where someone has a single-node problem of unknown origin, and it is not something "random", for sure.

If I were PVE staff, I would want to know what went wrong too; the error message is cryptic at best. It's perfect material to find out. It's not great to encounter in production, but for that there are backups, nothing to do with clustering.
 
