Connection error 500: RPCEnvironment init request failed: Unable to load access control list: Connection refused

Maksimus

On June 19, between 14:20 and 14:30, the server rebooted for no apparent reason.
On June 22, around 9:00 in the morning, the server hung.
On June 27 at 14:16 the server froze completely (we could not control it via the console, see screenshot 214), after which the PVE GUI stopped loading:

Connection error 500: RPCEnvironment init request failed: Unable to load access control list: Connection refused
Trying to enter the /etc/pve directory also throws an error (see screenshots 215 and 216). If the pve-cluster service is started manually, the directory becomes accessible and the PVE GUI starts working again, but after about 30 seconds of operation it crashes.

pveversion
pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-1-pve)
 

Attachments

  • Screenshot_215.png
  • Screenshot_216.png
  • Screenshot_214.png
After upgrading to kernel 6.8.8-2, the logs are full of errors, and the GUI works only intermittently.

Code:
2024-06-27T17:52:20.599777+03:00 Host034 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
2024-06-27T17:52:20.612371+03:00 Host034 pmxcfs[43173]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:20.612485+03:00 Host034 pmxcfs[43173]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:20.854465+03:00 Host034 systemd[1]: etc-pve.mount: Deactivated successfully.
2024-06-27T17:52:20.873829+03:00 Host034 pmxcfs[43180]: [status] notice: update cluster info (cluster name  Storage02, version = 2)
2024-06-27T17:52:20.881152+03:00 Host034 pmxcfs[43180]: [status] notice: node has quorum
2024-06-27T17:52:20.881288+03:00 Host034 pmxcfs[43180]: [dcdb] notice: members: 1/43180, 2/4582
2024-06-27T17:52:20.881326+03:00 Host034 pmxcfs[43180]: [dcdb] notice: starting data syncronisation
2024-06-27T17:52:20.881487+03:00 Host034 pmxcfs[43180]: [status] notice: members: 1/43180, 2/4582
2024-06-27T17:52:20.881523+03:00 Host034 pmxcfs[43180]: [status] notice: starting data syncronisation
2024-06-27T17:52:20.881566+03:00 Host034 pmxcfs[43180]: [dcdb] notice: received sync request (epoch 1/43180/00000001)
2024-06-27T17:52:20.882091+03:00 Host034 pmxcfs[43180]: [status] notice: received sync request (epoch 1/43180/00000001)
2024-06-27T17:52:20.889914+03:00 Host034 pmxcfs[43180]: [dcdb] notice: received all states
2024-06-27T17:52:20.890002+03:00 Host034 pmxcfs[43180]: [dcdb] notice: leader is 1/43180
2024-06-27T17:52:20.890040+03:00 Host034 pmxcfs[43180]: [dcdb] notice: synced members: 1/43180, 2/4582
2024-06-27T17:52:20.890115+03:00 Host034 pmxcfs[43180]: [dcdb] notice: start sending inode updates
2024-06-27T17:52:20.890163+03:00 Host034 pmxcfs[43180]: [dcdb] notice: sent all (0) updates
2024-06-27T17:52:20.890215+03:00 Host034 pmxcfs[43180]: [dcdb] notice: all data is up to date
2024-06-27T17:52:20.891820+03:00 Host034 pmxcfs[43180]: [status] notice: received all states
2024-06-27T17:52:20.892339+03:00 Host034 pmxcfs[43180]: [status] notice: all data is up to date
2024-06-27T17:52:21.728831+03:00 Host034 pve-ha-crm[4921]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:21.731286+03:00 Host034 pve-ha-lrm[4945]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:21.731469+03:00 Host034 pve-ha-lrm[4945]: updating service status from manager failed: Connection refused
2024-06-27T17:52:21.858073+03:00 Host034 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
2024-06-27T17:52:22.449057+03:00 Host034 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=7/BUS
2024-06-27T17:52:22.449309+03:00 Host034 systemd[1]: pve-cluster.service: Failed with result 'signal'.
2024-06-27T17:52:22.589512+03:00 Host034 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 262.
2024-06-27T17:52:22.589715+03:00 Host034 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
2024-06-27T17:52:22.598881+03:00 Host034 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
2024-06-27T17:52:22.611933+03:00 Host034 pmxcfs[43189]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:22.612052+03:00 Host034 pmxcfs[43189]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:23.586438+03:00 Host034 pveproxy[4938]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:23.586647+03:00 Host034 pveproxy[4938]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:23.586687+03:00 Host034 pveproxy[4938]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:23.589309+03:00 Host034 pveproxy[4937]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:23.589389+03:00 Host034 pveproxy[4937]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:23.589443+03:00 Host034 pveproxy[4937]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:23.590118+03:00 Host034 pveproxy[4939]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:23.590189+03:00 Host034 pveproxy[4939]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:23.590249+03:00 Host034 pveproxy[4939]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:23.970925+03:00 Host034 pve-firewall[4868]: status update error: Connection refused
2024-06-27T17:52:24.446165+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Transport endpoint is not connected
2024-06-27T17:52:24.446409+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446471+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446573+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446770+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446947+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447124+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447315+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447562+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447788+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447977+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.448145+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.490328+03:00 Host034 pvestatd[4877]: sdn status update error: Connection refused
2024-06-27T17:52:25.180918+03:00 Host034 kernel: [ 1699.600492] DMAR: DRHD: handling fault status reg 402
2024-06-27T17:52:25.180947+03:00 Host034 kernel: [ 1699.601077] DMAR: [DMA Write NO_PASID] Request device [04:00.0] fault addr 0x791f4000 [fault reason 0x05] PTE Write access is not set
2024-06-27T17:52:25.992215+03:00 Host034 pmxcfs[43198]: [status] notice: update cluster info (cluster name  Storage02, version = 2)
2024-06-27T17:52:25.999941+03:00 Host034 pmxcfs[43198]: [status] notice: node has quorum
2024-06-27T17:52:26.000047+03:00 Host034 pmxcfs[43198]: [dcdb] notice: members: 1/43198, 2/4582
2024-06-27T17:52:26.000088+03:00 Host034 pmxcfs[43198]: [dcdb] notice: starting data syncronisation
2024-06-27T17:52:26.000179+03:00 Host034 pmxcfs[43198]: [dcdb] notice: received sync request (epoch 1/43198/00000001)
2024-06-27T17:52:26.000822+03:00 Host034 pmxcfs[43198]: [status] notice: members: 1/43198, 2/4582
2024-06-27T17:52:26.000872+03:00 Host034 pmxcfs[43198]: [status] notice: starting data syncronisation
2024-06-27T17:52:26.001951+03:00 Host034 pmxcfs[43198]: [dcdb] notice: received all states
2024-06-27T17:52:26.002027+03:00 Host034 pmxcfs[43198]: [dcdb] notice: leader is 1/43198
2024-06-27T17:52:26.002084+03:00 Host034 pmxcfs[43198]: [dcdb] notice: synced members: 1/43198, 2/4582
2024-06-27T17:52:26.002145+03:00 Host034 pmxcfs[43198]: [dcdb] notice: start sending inode updates
2024-06-27T17:52:26.002181+03:00 Host034 pmxcfs[43198]: [dcdb] notice: sent all (0) updates
2024-06-27T17:52:26.002217+03:00 Host034 pmxcfs[43198]: [dcdb] notice: all data is up to date
2024-06-27T17:52:26.002250+03:00 Host034 pmxcfs[43198]: [status] notice: received sync request (epoch 1/43198/00000001)
2024-06-27T17:52:26.010716+03:00 Host034 pmxcfs[43198]: [status] notice: received all states
2024-06-27T17:52:26.011277+03:00 Host034 pmxcfs[43198]: [status] notice: all data is up to date
2024-06-27T17:52:26.591501+03:00 Host034 pveproxy[4938]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:26.591729+03:00 Host034 pveproxy[4938]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:26.591779+03:00 Host034 pveproxy[4938]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:26.857795+03:00 Host034 pve-ha-crm[4921]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:26.860232+03:00 Host034 pve-ha-lrm[4945]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:26.860476+03:00 Host034 pve-ha-lrm[4945]: updating service status from manager failed: Connection refused
2024-06-27T17:52:26.975574+03:00 Host034 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
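
To get a quick overview of an excerpt like the one above, the failures can be tallied per daemon. This is only a rough sketch: the function name `tally_failures` and the regular expression are assumptions based on the line layout visible in this log, not part of any Proxmox tooling.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   <ISO timestamp> <host> <process>[<pid>]: <message>
# Lines without a "[pid]" part (e.g. "kernel:") are simply skipped.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+([\w.-]+)\[(\d+)\]:\s+(.*)$")


def tally_failures(lines, needle="Connection refused"):
    """Count, per process name, how many log lines contain `needle`."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and needle in m.group(5):
            counts[m.group(3)] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "2024-06-27T17:52:21.728831+03:00 Host034 pve-ha-crm[4921]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused",
    "2024-06-27T17:52:23.586438+03:00 Host034 pveproxy[4938]: ipcc_send_rec[1] failed: Connection refused",
    "2024-06-27T17:52:23.589309+03:00 Host034 pveproxy[4937]: ipcc_send_rec[1] failed: Connection refused",
    "2024-06-27T17:52:20.881152+03:00 Host034 pmxcfs[43180]: [status] notice: node has quorum",
]

for proc, n in tally_failures(sample).most_common():
    print(proc, n)
```

Run against the full journal, a tally like this shows which services lose their connection to pmxcfs each time pve-cluster restarts.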
 
Code:
2024-06-27T17:52:22.449057+03:00 Host034 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=7/BUS
2024-06-27T17:52:22.449309+03:00 Host034 systemd[1]: pve-cluster.service: Failed with result 'signal'.
2024-06-27T17:52:22.589512+03:00 Host034 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 262.

This is definitely not normal.

Did the problems start after a kernel update? If so, which was the old version that worked?
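
The three lines quoted above say that the pve-cluster main process was killed by signal 7 (BUS) and that systemd has already restarted the service 262 times. For anyone scanning a longer journal for the same pattern, here is a minimal sketch that pulls those two facts out of such lines. The helper name `parse_crash` and both regular expressions are hypothetical and only match the wording seen in this log.

```python
import re

# Patterns modelled on the systemd messages quoted above (assumed wording).
EXIT_RE = re.compile(r"Main process exited, code=(\w+), status=(\d+)/(\w+)")
RESTART_RE = re.compile(r"restart counter is at (\d+)")


def parse_crash(lines):
    """Extract exit code, signal and restart counter from systemd log lines."""
    info = {}
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            info["code"] = m.group(1)                # e.g. "killed"
            info["signal_number"] = int(m.group(2))  # e.g. 7
            info["signal_name"] = "SIG" + m.group(3)  # e.g. "SIGBUS"
        m = RESTART_RE.search(line)
        if m:
            info["restart_counter"] = int(m.group(1))
    return info


# Lines copied from the excerpt above.
sample = [
    "2024-06-27T17:52:22.449057+03:00 Host034 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=7/BUS",
    "2024-06-27T17:52:22.449309+03:00 Host034 systemd[1]: pve-cluster.service: Failed with result 'signal'.",
    "2024-06-27T17:52:22.589512+03:00 Host034 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 262.",
]

print(parse_crash(sample))
```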
 
Yes, the problem appeared after three crashes on kernel 6.8.8-1; updating to 6.8.8-2 did not help.

The last working kernel version was 6.8.4-3

There is another server running in parallel with the same hardware configuration, and it is fine even on 6.8.8-1. Is it worth upgrading it to 6.8.8-2?
 
Could you maybe post a full boot log?