Unable to log in via Web GUI; SSH works

I do not think I can help you better than @fabian, who knows what's in the postinst script, but ...



I do not see how running upgrade would be a problem in and of itself, though, if you subsequently run full-upgrade.



All along I thought the "Unable to load access control list: No buffer space available" error was a red herring, but now I wonder if it's not related.
I never ran a full-upgrade, though.

You're right about the buffer space. It's basically preventing me from running any commands. I tried to back up the VMs using vzdump and it gave me that same buffer error...
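Even a bare run dies immediately, e.g. (the VMID here is just an example):
Code:
root@STP-HV:~# vzdump 100
Unable to load access control list: No buffer space available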
 
I never ran a full-upgrade, though.

Uh? Are you saying you never actually ran apt full-upgrade? I scrolled up; you were advised to run ... dist-upgrade, but that's just the older (still supported) notation. You ran neither?
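That is, in current apt these are the same operation:
Code:
apt full-upgrade    # current name
apt dist-upgrade    # older alias, still supported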

You're right about the buffer space. It's basically preventing me from running any commands. I tried to back up the VMs using vzdump and it gave me that same buffer error...

I thought the telling one is "Unable to load access control list"; sometimes the errors are not all that telling, however, and it might be a resulting failure somewhere down the line, not the initial cause.
 
If yes, and that didn't help, could you try to fix broken packages by running the following command:
Bash:
apt -f install
and then apt dist-upgrade
If the above didn't even help please, check the syslog looking for any error message during the login.

Here it was. But I got a bit lost too in the thread initially.
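For the syslog part, something along these lines should surface any login-time errors (a sketch; pveproxy/pvedaemon are the services behind the web login):
Code:
# watch the journal live while attempting a GUI login
journalctl -f
# or query the relevant daemons after the fact
journalctl -u pveproxy -u pvedaemon --since "10 minutes ago"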
 
Uh? Are you saying you never actually ran apt full-upgrade? I scrolled up; you were advised to run ... dist-upgrade, but that's just the older (still supported) notation. You ran neither?



I thought the telling one is "Unable to load access control list"; sometimes the errors are not all that telling, however, and it might be a resulting failure somewhere down the line, not the initial cause.
Here it was. But I got a bit lost too in the thread initially.
Yes, I ran it after the problems started, but I meant that when I originally built the servers, I didn't.
 
Yes, I ran it after the problems started, but I meant that when I originally built the servers, I didn't.

Yeah, so I just do not think that's a problem if you ran it now (eventually). Can you afford to revert the AD changes on one of the nodes, to test if that's related?
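If the realm was added through PVE and the CLI part used realmd/sssd, the revert would be roughly along these lines (the realm name is a placeholder; adjust to yours):
Code:
# drop the AD realm from PVE's authentication config (name is an example)
pveum realm delete EXAMPLE.COM
# if the CLI-side domain join was done via realmd, leave the domain
realm leave example.com
systemctl disable --now sssd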
 
Yeah, so I just do not think that's a problem if you ran it now (eventually). Can you afford to revert the AD changes on one of the nodes, to test if that's related?
I'm not sure how to revert all those changes, as configuration was required in both the CLI and the GUI. Also, there is the issue that running any command that tries to change something results in:
Code:
Unable to load access control list: No buffer space available
 
I'm not sure how to revert all those changes, as configuration was required in both the CLI and the GUI. Also, there is the issue that running any command that tries to change something results in:
Code:
Unable to load access control list: No buffer space available

My bad.

Does date -u appear to work correctly on your system?
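It should just print the current UTC time, something like this (timestamp illustrative):
Code:
root@STP-HV:~# date -u
Tue Feb  6 18:08:31 UTC 2024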
 
Apologies as I am flying blind, but could you try:

Code:
systemctl stop sssd
systemctl restart pvestatd
 
Assuming this is a new installation that hasn't been put in production, the fastest way to recover is to just reinstall PVE. Otherwise, fixing may require you to uninstall proxmox-ve, bring your dpkg state back to health (e.g., apt dist-upgrade completes without issue), and reinstall proxmox-ve.
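In outline, that recovery path would be something like this- a sketch, not a tested recipe, so only attempt it in a maintenance window:
Code:
apt remove proxmox-ve     # removes the meta-package and the PVE control plane
apt -f install            # let apt repair any broken dependencies
apt dist-upgrade          # should now complete without errors
apt install proxmox-ve    # pull PVE back in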
 
Apologies as I am flying blind, but could you try:

Code:
systemctl stop sssd
systemctl restart pvestatd
Code:
root@STP-HV:~# systemctl stop sssd
root@STP-HV:~# systemctl restart pvestatd
Job for pvestatd.service failed because the control process exited with error code.
See "systemctl status pvestatd.service" and "journalctl -xeu pvestatd.service" for details.
root@STP-HV:~# systemctl status pvestatd
× pvestatd.service - PVE Status Daemon
     Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-02-06 13:08:31 EST; 23s ago
   Duration: 1month 3w 2d 9h 25min 1.587s
    Process: 3614594 ExecStart=/usr/bin/pvestatd start (code=exited, status=105)
        CPU: 309ms


Feb 06 13:08:30 STP-HV systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Feb 06 13:08:31 STP-HV pvestatd[3614594]: Unable to load access control list: No buffer space available
Feb 06 13:08:31 STP-HV systemd[1]: pvestatd.service: Control process exited, code=exited, status=105/n/a
Feb 06 13:08:31 STP-HV systemd[1]: pvestatd.service: Failed with result 'exit-code'.
Feb 06 13:08:31 STP-HV systemd[1]: Failed to start pvestatd.service - PVE Status Daemon.

Assuming this is a new installation that hasn't been put in production, the fastest way to recover is to just reinstall PVE. Otherwise, fixing may require you to uninstall proxmox-ve, bring your dpkg state back to health (e.g., apt dist-upgrade completes without issue), and reinstall proxmox-ve.
It is in production, which is the issue. What does reinstalling proxmox-ve entail? Will I lose configuration data for the VMs? What are the consequences of doing this while the VMs remain running?
 
Assuming this is a new installation that hasn't been put in production, the fastest way to recover is to just reinstall PVE. Otherwise, fixing may require you to uninstall proxmox-ve, bring your dpkg state back to health (e.g., apt dist-upgrade completes without issue), and reinstall proxmox-ve.

I would still suggest he keeps one of these standalone nodes "aside" to find out what it was - it is not normal to run into this just because he is using AD, so it would be important that it does not happen in the future.
 
If you still have patience, can you show ls -1 /etc/pam*?
Code:
root@STP-HV:~# ls -1 /etc/pam*
/etc/pam.conf


/etc/pam.d:
chfn
chpasswd
chsh
common-account
common-auth
common-password
common-session
common-session-noninteractive
cron
login
newusers
other
passwd
runuser
runuser-l
samba
sshd
sssd-shadowutils
su
su-l
 
It is in production, which is the issue. What does reinstalling proxmox-ve entail? Will I lose configuration data for the VMs? What are the consequences of doing this while the VMs remain running?
You won't lose the configuration data, and the VMs will continue to function. You will lose the PVE command and control plane, as some PVE services will be uninstalled in the process.

the problem is that with the system running, there is the possibility that even uninstalling proxmox-ve would not restore dpkg to health, as many dependent packages will remain installed. Here is what I'd recommend- since the guests are all running at this time, schedule an outage, during which you will:
1. back up your VMs
2. back up /etc/network/interfaces and /etc/pve/storage.cfg; if your storage is networked, or not shared with your boot disk, also back up /etc/pve/qemu-server and /etc/pve/lxc (see the sketch after this list)
3. reinstall PVE
4. if your disk setup keeps the vdisk storage separate, congratulations: restore what you backed up in step two, and you're off to the races. If it wasn't, restore all your VMs too.
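Step 2 as commands, roughly- a sketch that assumes /etc/pve is still mounted (i.e. pve-cluster is still running), and that you copy the result off the box afterwards:
Code:
mkdir -p /root/pve-config-backup
cp /etc/network/interfaces /root/pve-config-backup/
cp /etc/pve/storage.cfg /root/pve-config-backup/
cp -a /etc/pve/qemu-server /etc/pve/lxc /root/pve-config-backup/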

it is not normal to run into this just because he is using AD,
fify. "not normal" is what HA is made for. @cjbruck you have 9 nodes- cluster them.
 
fify. "not normal" is what HA is made for. @cjbruck you have 9 nodes- cluster them.

It is completely his call how to run his nodes. In this situation it actually helps troubleshooting: it is simpler as there's no cluster, and whatever it is that's causing the issue, the OP just managed to reproduce it 9 times. I have a hunch this will be reproduced a 10th time (maybe later on) when he has a cluster; even more fun to troubleshoot/recover.
 
It is completely his call how to run his nodes,
no doubt.

in this situation it actually helps troubleshooting
I don't want to sound snobby or dismissive, but have you actually ever run a cluster of anything? The WHOLE POINT is that you can do things one node at a time and not risk your operation. You can troubleshoot to your heart's content.

me, I prefer not to have to.
 
You won't lose the configuration data, and the VMs will continue to function. You will lose the PVE command and control plane, as some PVE services will be uninstalled in the process.

the problem is that with the system running, there is the possibility that even uninstalling proxmox-ve would not restore dpkg to health, as many dependent packages will remain installed. Here is what I'd recommend- since the guests are all running at this time, schedule an outage, during which you will:
1. back up your VMs
2. back up /etc/network/interfaces and /etc/pve/storage.cfg; if your storage is networked, or not shared with your boot disk, also back up /etc/pve/qemu-server and /etc/pve/lxc
3. reinstall PVE
4. if your disk setup keeps the vdisk storage separate, congratulations: restore what you backed up in step two, and you're off to the races. If it wasn't, restore all your VMs too.
Thank you for the recommendation. I can't back up the VMs themselves since the vzdump command doesn't work right now, but I have already backed up the most vital information from them, so they can be recreated. I'm debating whether I should just start from scratch during a scheduled outage.

fify. "not normal" is what HA is made for. @cjbruck you have 9 nodes- cluster them.
It is completely his call how to run his nodes. In this situation it actually helps troubleshooting: it is simpler as there's no cluster, and whatever it is that's causing the issue, the OP just managed to reproduce it 9 times. I have a hunch this will be reproduced a 10th time (maybe later on) when he has a cluster; even more fun to troubleshoot/recover.
no doubt.


I don't want to sound snobby or dismissive, but have you actually ever run a cluster of anything? The WHOLE POINT is that you can do things one node at a time and not risk your operation. You can troubleshoot to your heart's content.

me, I prefer not to have to.
Let me explain a bit about the environment. Each server is at a different physical "site" and runs various services for it. While we do have low-latency connections between them over a MAN, the Proxmox network requirements recommend latency under 5 ms, and some of our sites are far enough from each other that they exceed that. My understanding of clustering and HA in general is that they are really meant for when you have two or more servers at a single site and you want redundancy for that site. This is not the case in my situation; each site has only 1 server dedicated to it.
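For reference, I'm going by simple round-trip measurements between the nodes, along these lines (the address is just an example):
Code:
# round-trip time between two sites; the cluster network wants this consistently low
ping -c 20 10.0.0.2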
 
I don't want to sound snobby or dismissive, but have you actually ever run a cluster of anything? The WHOLE POINT is that you can do things one node at a time and not risk your operation. You can troubleshoot to your heart's content.

When you look at the OP's post, he's got 9 nodes and basically just set up AD on them. If he had set it up as a cluster, each node equally, how exactly would that be of any help now? With pvestatd not running, what would HA do there?

I take this as a post where someone has a single-node problem of unknown origin, and it is not something "random", for sure.

If I were PVE staff, I would want to know what went wrong too; the error message is cryptic at best. It's perfect material to find out. It's not great to encounter in production, but for that there are backups, nothing to do with clustering.
 
