Connection error 500: RPCEnvironment init request failed: Unable to load access control list: Connection refused

Maksimus

On June 19, between 14:20 and 14:30, the server rebooted for no apparent reason.
On June 22, around 9:00 in the morning, the server hung.
On June 27 at 14:16 the server froze completely (we could not control it via the console, see Screenshot 214), after which the PVE GUI stopped loading:

Connection error 500: RPCEnvironment init request failed: Unable to load access control list: Connection refused
An error is also thrown when trying to enter the /etc/pve directory (see Screenshots 215 and 216). If the pve-cluster service is started manually, the directory becomes accessible and the PVE GUI starts working as well, but after about 30 seconds of operation it crashes again.

pveversion
pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-1-pve)
 

Attachments

  • Screenshot_215.png
  • Screenshot_216.png
  • Screenshot_214.png
After upgrading to kernel 6.8.8-2, the logs are full of errors, and the GUI works only intermittently.

Code:
2024-06-27T17:52:20.599777+03:00 Host034 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
2024-06-27T17:52:20.612371+03:00 Host034 pmxcfs[43173]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:20.612485+03:00 Host034 pmxcfs[43173]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:20.854465+03:00 Host034 systemd[1]: etc-pve.mount: Deactivated successfully.
2024-06-27T17:52:20.873829+03:00 Host034 pmxcfs[43180]: [status] notice: update cluster info (cluster name  Storage02, version = 2)
2024-06-27T17:52:20.881152+03:00 Host034 pmxcfs[43180]: [status] notice: node has quorum
2024-06-27T17:52:20.881288+03:00 Host034 pmxcfs[43180]: [dcdb] notice: members: 1/43180, 2/4582
2024-06-27T17:52:20.881326+03:00 Host034 pmxcfs[43180]: [dcdb] notice: starting data syncronisation
2024-06-27T17:52:20.881487+03:00 Host034 pmxcfs[43180]: [status] notice: members: 1/43180, 2/4582
2024-06-27T17:52:20.881523+03:00 Host034 pmxcfs[43180]: [status] notice: starting data syncronisation
2024-06-27T17:52:20.881566+03:00 Host034 pmxcfs[43180]: [dcdb] notice: received sync request (epoch 1/43180/00000001)
2024-06-27T17:52:20.882091+03:00 Host034 pmxcfs[43180]: [status] notice: received sync request (epoch 1/43180/00000001)
2024-06-27T17:52:20.889914+03:00 Host034 pmxcfs[43180]: [dcdb] notice: received all states
2024-06-27T17:52:20.890002+03:00 Host034 pmxcfs[43180]: [dcdb] notice: leader is 1/43180
2024-06-27T17:52:20.890040+03:00 Host034 pmxcfs[43180]: [dcdb] notice: synced members: 1/43180, 2/4582
2024-06-27T17:52:20.890115+03:00 Host034 pmxcfs[43180]: [dcdb] notice: start sending inode updates
2024-06-27T17:52:20.890163+03:00 Host034 pmxcfs[43180]: [dcdb] notice: sent all (0) updates
2024-06-27T17:52:20.890215+03:00 Host034 pmxcfs[43180]: [dcdb] notice: all data is up to date
2024-06-27T17:52:20.891820+03:00 Host034 pmxcfs[43180]: [status] notice: received all states
2024-06-27T17:52:20.892339+03:00 Host034 pmxcfs[43180]: [status] notice: all data is up to date
2024-06-27T17:52:21.728831+03:00 Host034 pve-ha-crm[4921]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:21.731286+03:00 Host034 pve-ha-lrm[4945]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:21.731469+03:00 Host034 pve-ha-lrm[4945]: updating service status from manager failed: Connection refused
2024-06-27T17:52:21.858073+03:00 Host034 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
2024-06-27T17:52:22.449057+03:00 Host034 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=7/BUS
2024-06-27T17:52:22.449309+03:00 Host034 systemd[1]: pve-cluster.service: Failed with result 'signal'.
2024-06-27T17:52:22.589512+03:00 Host034 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 262.
2024-06-27T17:52:22.589715+03:00 Host034 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
2024-06-27T17:52:22.598881+03:00 Host034 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
2024-06-27T17:52:22.611933+03:00 Host034 pmxcfs[43189]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:22.612052+03:00 Host034 pmxcfs[43189]: [main] notice: resolved node name 'Host034' to '95.101.219.125' for default node IP address
2024-06-27T17:52:23.586438+03:00 Host034 pveproxy[4938]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:23.586647+03:00 Host034 pveproxy[4938]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:23.586687+03:00 Host034 pveproxy[4938]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:23.589309+03:00 Host034 pveproxy[4937]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:23.589389+03:00 Host034 pveproxy[4937]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:23.589443+03:00 Host034 pveproxy[4937]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:23.590118+03:00 Host034 pveproxy[4939]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:23.590189+03:00 Host034 pveproxy[4939]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:23.590249+03:00 Host034 pveproxy[4939]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:23.970925+03:00 Host034 pve-firewall[4868]: status update error: Connection refused
2024-06-27T17:52:24.446165+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Transport endpoint is not connected
2024-06-27T17:52:24.446409+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446471+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446573+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446770+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.446947+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447124+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447315+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447562+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447788+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.447977+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.448145+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:24.490328+03:00 Host034 pvestatd[4877]: sdn status update error: Connection refused
2024-06-27T17:52:25.180918+03:00 Host034 kernel: [ 1699.600492] DMAR: DRHD: handling fault status reg 402
2024-06-27T17:52:25.180947+03:00 Host034 kernel: [ 1699.601077] DMAR: [DMA Write NO_PASID] Request device [04:00.0] fault addr 0x791f4000 [fault reason 0x05] PTE Write access is not set
2024-06-27T17:52:25.992215+03:00 Host034 pmxcfs[43198]: [status] notice: update cluster info (cluster name  Storage02, version = 2)
2024-06-27T17:52:25.999941+03:00 Host034 pmxcfs[43198]: [status] notice: node has quorum
2024-06-27T17:52:26.000047+03:00 Host034 pmxcfs[43198]: [dcdb] notice: members: 1/43198, 2/4582
2024-06-27T17:52:26.000088+03:00 Host034 pmxcfs[43198]: [dcdb] notice: starting data syncronisation
2024-06-27T17:52:26.000179+03:00 Host034 pmxcfs[43198]: [dcdb] notice: received sync request (epoch 1/43198/00000001)
2024-06-27T17:52:26.000822+03:00 Host034 pmxcfs[43198]: [status] notice: members: 1/43198, 2/4582
2024-06-27T17:52:26.000872+03:00 Host034 pmxcfs[43198]: [status] notice: starting data syncronisation
2024-06-27T17:52:26.001951+03:00 Host034 pmxcfs[43198]: [dcdb] notice: received all states
2024-06-27T17:52:26.002027+03:00 Host034 pmxcfs[43198]: [dcdb] notice: leader is 1/43198
2024-06-27T17:52:26.002084+03:00 Host034 pmxcfs[43198]: [dcdb] notice: synced members: 1/43198, 2/4582
2024-06-27T17:52:26.002145+03:00 Host034 pmxcfs[43198]: [dcdb] notice: start sending inode updates
2024-06-27T17:52:26.002181+03:00 Host034 pmxcfs[43198]: [dcdb] notice: sent all (0) updates
2024-06-27T17:52:26.002217+03:00 Host034 pmxcfs[43198]: [dcdb] notice: all data is up to date
2024-06-27T17:52:26.002250+03:00 Host034 pmxcfs[43198]: [status] notice: received sync request (epoch 1/43198/00000001)
2024-06-27T17:52:26.010716+03:00 Host034 pmxcfs[43198]: [status] notice: received all states
2024-06-27T17:52:26.011277+03:00 Host034 pmxcfs[43198]: [status] notice: all data is up to date
2024-06-27T17:52:26.591501+03:00 Host034 pveproxy[4938]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:26.591729+03:00 Host034 pveproxy[4938]: ipcc_send_rec[2] failed: Connection refused
2024-06-27T17:52:26.591779+03:00 Host034 pveproxy[4938]: ipcc_send_rec[3] failed: Connection refused
2024-06-27T17:52:26.857795+03:00 Host034 pve-ha-crm[4921]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:26.860232+03:00 Host034 pve-ha-lrm[4945]: cluster file system update failed - ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:26.860476+03:00 Host034 pve-ha-lrm[4945]: updating service status from manager failed: Connection refused
2024-06-27T17:52:26.975574+03:00 Host034 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
 
Code:
2024-06-27T17:52:22.449057+03:00 Host034 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=7/BUS
2024-06-27T17:52:22.449309+03:00 Host034 systemd[1]: pve-cluster.service: Failed with result 'signal'.
2024-06-27T17:52:22.589512+03:00 Host034 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 262.

This is definitely not normal.

Did the problems start after a kernel update? If so, which was the last version that worked?
 
Yes. The problem first showed up as three crashes on kernel 6.8.8-1; updating to 6.8.8-2 did not help.

The last working kernel version was 6.8.4-3.

Another server with the same hardware configuration is running in parallel, and it is fine even on 6.8.8-1. (Is it worth upgrading it to 6.8.8-2?)
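
For anyone triaging a similar journal: the lines that matter in the excerpt above are the systemd ones reporting `code=killed, status=7/BUS` (the main process was terminated by signal 7, SIGBUS) and the restart counter, with the `Connection refused` messages from the other PVE daemons most likely being a consequence of pmxcfs going away. Below is a minimal sketch, not an official Proxmox tool, that pulls those facts out of a saved log. The function name `summarize` and the result layout are made up for illustration, and the regular expressions assume the exact message wording seen in this thread.

```python
import re
from collections import Counter

# Sample lines copied from the journal excerpt in this thread.
SAMPLE = """\
2024-06-27T17:52:22.449057+03:00 Host034 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=7/BUS
2024-06-27T17:52:22.589512+03:00 Host034 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 262.
2024-06-27T17:52:23.586438+03:00 Host034 pveproxy[4938]: ipcc_send_rec[1] failed: Connection refused
2024-06-27T17:52:24.446409+03:00 Host034 pvestatd[4877]: ipcc_send_rec[4] failed: Connection refused
2024-06-27T17:52:25.180918+03:00 Host034 kernel: [ 1699.600492] DMAR: DRHD: handling fault status reg 402
"""

# Patterns assume the message wording shown above (an assumption, not a stable API).
KILLED = re.compile(r"(\S+\.service): Main process exited, code=killed, status=(\d+)/(\w+)")
RESTART = re.compile(r"(\S+\.service): Scheduled restart job, restart counter is at (\d+)")
REFUSED = re.compile(r" (\S+?)\[\d+\]: .*Connection refused")


def summarize(text):
    """Collect signal kills, restart counters, refused connections and DMAR faults."""
    summary = {
        "killed": [],            # (unit, signal number, signal name)
        "restart_counter": {},   # unit -> highest restart counter seen
        "refused": Counter(),    # daemon name -> number of "Connection refused" lines
        "dmar_faults": 0,        # kernel lines mentioning a DMAR fault
    }
    for line in text.splitlines():
        m = KILLED.search(line)
        if m:
            summary["killed"].append((m.group(1), int(m.group(2)), m.group(3)))
            continue
        m = RESTART.search(line)
        if m:
            unit, count = m.group(1), int(m.group(2))
            previous = summary["restart_counter"].get(unit, 0)
            summary["restart_counter"][unit] = max(previous, count)
            continue
        m = REFUSED.search(line)
        if m:
            summary["refused"][m.group(1)] += 1
            continue
        if "kernel:" in line and "DMAR:" in line and "fault" in line:
            summary["dmar_faults"] += 1
    return summary


if __name__ == "__main__":
    result = summarize(SAMPLE)
    for key, value in result.items():
        print(key, value)
```

To run it against a real log, read the saved journal text from a file and pass it to `summarize` instead of `SAMPLE`. A restart counter in the hundreds together with repeated signal kills points at the service itself crashing rather than at the GUI.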
 
