Pvedaemon - writing cluster log failed ipcc_send_rec[7] failed - invalid argument

Nov 3, 2021
5
0
1
44
Over the last 3-4 weeks, our main login server has started seeing this error appearing when users log into the system.
When we restart services returns to normal but the more people connect the faster we start to see these errors.
The slower the server responds. While our other login servers seem fine with almost the same user load.
(In the past we have had more than double the amount of users we have now with little to no slowdown)

Those connecting through this server report very slow responses getting to their vm's. Seeing messages with broken pipe error or just just connection timed out.
Then have to refresh and hope it loads. Once they do get to their VM's those run fine with no errors.

We currently run a 10 node cluster, with vm running across multiple nodes. I have made sure we are up to date with our patches across all nodes.
I have reinstalled certain packages to be sure and running apt install --fix-broken.

Details:
As you can see not supper busy, yet.
1635951177465.png

proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-14
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

Cluster information
-------------------
Name: SoftwareLabs
Config Version: 10
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Nov 3 10:18:15 2021
Quorum provider: corosync_votequorum
Nodes: 10
Node ID: 0x00000002
Ring ID: 1.172
Quorate: Yes

Votequorum information
----------------------
Expected votes: 10
Highest expected: 10
Total votes: 10
Quorum: 6
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.0.5.201
0x00000002 1 192.0.5.202 (local)
0x00000003 1 192.0.5.203
0x00000004 1 192.0.5.205
0x00000005 1 192.0.5.210
0x00000006 1 192.0.5.206
0x00000007 1 192.0.5.204
0x00000008 1 192.0.5.207
0x00000009 1 192.0.5.208
0x0000000a 1 192.0.5.209

● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-11-03 08:39:17 EDT; 1h 41min ago
Process: 3141223 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 3141225 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 3141226 (pveproxy)
Tasks: 37 (limit: 309337)
Memory: 1.8G
CPU: 5h 5min 17.911s
CGroup: /system.slice/pveproxy.service
├─3141226 pveproxy
├─3174935 pveproxy worker (shutdown)
├─3179472 pveproxy worker (shutdown)
├─3180549 pveproxy worker (shutdown)
├─3184945 pveproxy worker (shutdown)
├─3185368 pveproxy worker (shutdown)
├─3187263 pveproxy worker (shutdown)
├─3189947 pveproxy worker (shutdown)
├─3198803 pveproxy worker (shutdown)
├─3200285 pveproxy worker (shutdown)
├─3205330 pveproxy worker (shutdown)
├─3206918 pveproxy worker (shutdown)
├─3209765 pveproxy worker (shutdown)
├─3214301 pveproxy worker (shutdown)
├─3222634 pveproxy worker (shutdown)
├─3223680 pveproxy worker (shutdown)
├─3227867 pveproxy worker (shutdown)
├─3228813 pveproxy worker (shutdown)
├─3246347 pveproxy worker (shutdown)
├─3256405 pveproxy worker (shutdown)
├─3259920 pveproxy worker (shutdown)
├─3260467 pveproxy worker (shutdown)
├─3264765 pveproxy worker (shutdown)
├─3266740 pveproxy worker (shutdown)
├─3267771 pveproxy worker (shutdown)
├─3271148 pveproxy worker (shutdown)
├─3274110 pveproxy worker (shutdown)
├─3275912 pveproxy worker (shutdown)
├─3277710 pveproxy worker (shutdown)
├─3280890 pveproxy worker (shutdown)
├─3283104 pveproxy worker (shutdown)
├─3294307 pveproxy worker (shutdown)
├─3294610 pveproxy worker (shutdown)
├─3295548 pveproxy worker (shutdown)
├─3298678 pveproxy worker
├─3300291 pveproxy worker
└─3300414 pveproxy worker

Nov 03 10:18:44 pvl02 pveproxy[3141226]: worker 3300291 started
Nov 03 10:18:48 pvl02 pveproxy[3295549]: 2021-11-03 10:18:48.203453 -0400 error AnyEvent::Util: Runtime error in AnyEvent::guard callback: Can't call method "_put_session" on an undef>
Nov 03 10:18:48 pvl02 pveproxy[3141226]: worker 3295549 finished
Nov 03 10:18:48 pvl02 pveproxy[3141226]: starting 1 worker(s)
Nov 03 10:18:48 pvl02 pveproxy[3141226]: worker 3300414 started
Nov 03 10:18:49 pvl02 pveproxy[3300414]: Clearing outdated entries from certificate cache
Nov 03 10:18:49 pvl02 pveproxy[3300291]: Clearing outdated entries from certificate cache
Nov 03 10:18:52 pvl02 pveproxy[3300413]: got inotify poll request in wrong process - disabling inotify
Nov 03 10:18:53 pvl02 pveproxy[3300413]: worker exit
Nov 03 10:20:10 pvl02 pveproxy[3197536]: worker exit

As seen in the last spoiler for service pveproxy status, we are also seeing this AnyEvent error related to login's.
Nov 3 08:04:18 pvl02 pveproxy[3070985]: worker exit
Nov 3 08:04:18 pvl02 pveproxy[3070985]: 2021-11-03 08:04:18.317764 -0400 error AnyEvent::Util: Runtime error in AnyEvent::guard callback: Can't call method "_put_session" on an undefined value at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 2259 during global destruction.
Nov 3 08:04:18 pvl02 pveproxy[3070985]: 2021-11-03 08:04:18.319482 -0400 error AnyEvent::Util: Runtime error in AnyEvent::guard callback: Can't call method "_put_session" on an undefined value at /usr/lib/x86_64-linux-gnu/perl5/5.32/AnyEvent/Handle.pm line 2259 during global destruction.
Nov 3 08:04:18 pvl02 pveproxy[1913139]: worker 3070985 finished
Nov 3 08:04:18 pvl02 pveproxy[1913139]: starting 1 worker(s)
Nov 3 08:04:18 pvl02 pvedaemon[2167596]: <user1@example.com> end task UPID:pvl02:002EE505:05F6CF0D:61827A93:vncproxy:333:user1@example.com: successful
Nov 3 08:04:18 pvl02 pvedaemon[2167596]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Nov 3 08:04:18 pvl02 pveproxy[1913139]: worker 3074900 started
Nov 3 08:04:18 pvl02 pvedaemon[2167598]: <user2@example.com> end task UPID:pvl02:002EDC96:05F6B1AA:61827A48:vncproxy:371:user2@example.com: successful
Nov 3 08:04:18 pvl02 pvedaemon[2167598]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Nov 3 08:04:18 pvl02 pvedaemon[2167597]: <user3@example.com> end task UPID:pvl02:002EE0E8:05F6C14E:61827A70:vncproxy:384:user3@example.com: successful
Nov 3 08:04:18 pvl02 pvedaemon[2167597]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Nov 3 08:04:18 pvl02 pvedaemon[2167596]: <user4@example.com> end task UPID:pvl02:002EDDF0:05F6B65D:61827A54:vncproxy:365:user4@example.com: successful
Nov 3 08:04:18 pvl02 pvedaemon[2167598]: <user5@example.com> end task UPID:pvl02:002EE72E:05F6D54A:61827AA3:vncproxy:354:user5@example.com: successful
Nov 3 08:04:18 pvl02 pvedaemon[2167598]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument
Nov 3 08:04:18 pvl02 pvedaemon[2167596]: <user6@example.com> end task UPID:pvl02:002EDDDB:05F6B651:61827A54:vncproxy:364:user6@example.com: successful
Nov 3 08:04:18 pvl02 pvedaemon[2167596]: writing cluster log failed: ipcc_send_rec[7] failed: Invalid argument

At this time we are doing our best to redirect traffic to other servers. We have reviewed our network switches etc. Traffic appears minimal and no packet errors reported.
The servers where restarted between the time these errors started appearing and now. We would really not like to take this machine down to rebuild it, if we can track down why this is happening on just this machine.

Any input towards this issue would be most welcome.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get your own in 60 seconds.

Buy now!