Memory leak(?) on 6.8.4-2/3 PVE 8.2

Hello,
I'm on kernel 6.8.8-2-pve and am seeing the same dmesg messages.

Bash:
# dmesg -T | grep RPC
[Sun Jun 30 01:33:57 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  1 00:05:22 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  1 00:19:27 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  1 00:31:04 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  1 03:44:59 2024] RPC: Could not send backchannel reply error: -110
[Tue Jul  2 00:05:13 2024] RPC: Could not send backchannel reply error: -110
[Tue Jul  2 00:17:45 2024] RPC: Could not send backchannel reply error: -110
[Tue Jul  2 00:29:00 2024] RPC: Could not send backchannel reply error: -110
[Wed Jul  3 00:05:22 2024] RPC: Could not send backchannel reply error: -110
[Wed Jul  3 03:03:18 2024] RPC: Could not send backchannel reply error: -110
[Wed Jul  3 04:05:35 2024] RPC: Could not send backchannel reply error: -110
[Thu Jul  4 00:05:26 2024] RPC: Could not send backchannel reply error: -110
[Thu Jul  4 00:17:51 2024] RPC: Could not send backchannel reply error: -110
[Thu Jul  4 00:29:06 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul  5 00:05:27 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul  5 00:17:50 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul  5 00:28:53 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul  5 03:03:28 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul  5 04:03:05 2024] RPC: Could not send backchannel reply error: -110
[Sun Jul  7 01:32:28 2024] RPC: Could not send backchannel reply error: -110
[Sun Jul  7 01:34:20 2024] RPC: Could not send backchannel reply error: -110
[Sun Jul  7 02:02:35 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  8 00:17:40 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  8 00:28:26 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  8 03:25:51 2024] RPC: Could not send backchannel reply error: -110
[Mon Jul  8 03:33:03 2024] RPC: Could not send backchannel reply error: -110

# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.5-1
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx8
intel-microcode: 3.20240514.1~deb12u1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
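
For reference, the -110 in the RPC messages above is the kernel errno -ETIMEDOUT, i.e. the NFS backchannel (callback) reply could not be delivered before timing out. A quick way to look the code up, assuming the moreutils package is installed (it ships the errno(1) utility):

Bash:
# errno(1) from moreutils translates numeric kernel error codes
errno 110
# prints something like: ETIMEDOUT 110 Connection timed out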

The messages after 00:00 correspond to backups to a server running kernel 6.5.13-5-pve; the messages after 03:00 correspond to backups to another server running kernel 6.8.8-2-pve. In both cases the backups are made via NFS.
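
To double-check that correlation, the timestamps can be bucketed by hour; a minimal sketch (the awk field index assumes the dmesg -T format shown above):

Bash:
# Count backchannel errors per hour and match them against the backup windows
dmesg -T | grep 'backchannel reply' \
    | awk '{ split($4, t, ":"); print t[1] ":00" }' | sort | uniq -c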

The server I am receiving these messages on is part of a three-node cluster. Shared storage is built on GlusterFS over the network, with ZFS for the local arrays.
I also noticed another problem that I didn't have before switching to Proxmox 8.2 and kernel 6.8.8-2-pve.

When trying to migrate a disk on glusterfs, I see the following:
Code:
create full clone of drive scsi0 (Backup_Nasa:701/vm-701-disk-0.qcow2)
Formatting 'gluster://10.0.1.18/storage/images/701/vm-701-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=falloc compression_type=zlib size=64424509440 lazy_refcounts=off refcount_bits=16
[2024-07-08 08:32:24.699434 +0000] I [io-stats.c:3711:ios_sample_buf_size_configure] 0-storage: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-07-08 08:32:24.840683 +0000] E [MSGID: 108006] [afr-common.c:6123:__afr_handle_child_down_event] 0-storage-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2024-07-08 08:32:34.706295 +0000] I [io-stats.c:4043:fini] 0-storage: io-stats translator unloaded
[2024-07-08 08:32:35.713191 +0000] I [io-stats.c:3711:ios_sample_buf_size_configure] 0-storage: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2024-07-08 08:32:39.029405 +0000] E [MSGID: 108006] [afr-common.c:6123:__afr_handle_child_down_event] 0-storage-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2024-07-08 08:32:45.719002 +0000] I [io-stats.c:4043:fini] 0-storage: io-stats translator unloaded
drive mirror is starting for drive-scsi0
drive-scsi0: Cancelling block job
drive-scsi0: Done.
TASK ERROR: storage migration failed: mirroring error: VM 701 qmp command 'drive-mirror' failed - Could not open 'gluster://10.0.1.18/storage/images/701/vm-701-disk-0.qcow2': No such file or directory


The strange thing is that the disk image appears while the creation procedure is starting and then disappears for some reason.
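
One way to observe this from the shell is to watch the target directory on the GlusterFS FUSE mount while the migration runs; a sketch, assuming the storage is mounted under /mnt/pve/ with the storage ID as the directory name (adjust the path to your setup):

Bash:
# Refresh the directory listing every second while the clone is running
watch -n 1 'ls -l /mnt/pve/<gluster-storage-id>/images/701/'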

I also noticed that writes to the GlusterFS array are very slow; I don't have this problem with reads.
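
To put numbers on that, a simple direct-I/O test on the mounted volume compares write and read throughput; a rough sketch (the mount path is a placeholder, and if dd reports "Invalid argument" the volume may not allow O_DIRECT, in which case drop the direct flags):

Bash:
# Write 1 GiB, read it back, then clean up; direct I/O bypasses the page cache
dd if=/dev/zero of=/mnt/pve/<gluster-storage-id>/ddtest.bin bs=1M count=1024 oflag=direct status=progress
dd if=/mnt/pve/<gluster-storage-id>/ddtest.bin of=/dev/null bs=1M iflag=direct status=progress
rm /mnt/pve/<gluster-storage-id>/ddtest.bin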

I am currently installing kernel 6.5.11-4-pve, which I know gives me no problems, and I will switch to it.
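
In case it helps anyone doing the same, the older kernel can be pinned so it stays the default across reboots; recent versions of proxmox-boot-tool have a kernel pin subcommand for this:

Bash:
# Install the older kernel if it is no longer present, then pin it
apt install proxmox-kernel-6.5.11-4-pve-signed
proxmox-boot-tool kernel pin 6.5.11-4-pve
reboot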

I hope this helps someone.
 
With kernel 6.5.11-4-pve this did not happen; on the newer kernel the VM behaved as if the disk was very slow. I had used that kernel for months in a row before upgrading to Proxmox 8.2. I'm currently running 6.5.13-5-pve; I think things are patched, but I'll wait another day to be sure.
Even with this kernel, though, I can't migrate the VM to the GlusterFS array, so apparently something in Proxmox itself is not behaving correctly.
 
Hi,
Regarding the GlusterFS disk-migration failure quoted above:
This most likely has the same cause as the issue reported here, because filename parsing is broken in special cases like clone/import with pve-qemu-kvm=9.0.0-5: https://forum.proxmox.com/threads/q...-no-subscription-as-of-now.149772/post-682012
It should be fixed in pve-qemu-kvm=9.0.0-6, currently available on the pvetest repository: https://forum.proxmox.com/threads/q...-no-subscription-as-of-now.149772/post-682174
 
Hi,
Not patched for me with 6.8.8-2-pve.

Rolling back to 6.5.13-3-pve ...

Please give more details about your setup: what is your storage configuration, and what kinds of workloads are running? Please monitor the memory usage of user-space processes, e.g. using htop, and also check buffer/cache usage to determine whether it's actually the kernel eating the memory or something else.
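
A minimal sketch of what to capture, assuming a node using ZFS (the arcstats file only exists while the zfs module is loaded); this separates resident user-space memory from the usual kernel-side consumers (slab caches and the ZFS ARC):

Bash:
# Top user-space consumers by resident memory
ps -eo pid,user,rss,comm --sort=-rss | head -n 15
# Overall picture: available memory vs. buffers/cache
free -h
# Kernel-side consumers: slab caches and related counters
grep -E '^(MemAvailable|Slab|SReclaimable|SUnreclaim)' /proc/meminfo
# ... and the ZFS ARC size
awk '$1 == "size" { printf "ARC: %.1f GiB\n", $3 / 2^30 }' /proc/spl/kstat/zfs/arcstats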
 
Hi @fiona
As I went back to 6.5, I can't check that at the moment. From what I understood, it's a problem with Python since the new kernel, which keeps opening instances indefinitely. On my LXC, Frigate uses Python.
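
If the suspicion is Python processes piling up inside the container, a simple counter over time would show it; a sketch to run inside the LXC (pgrep -f matches against the full command line):

Bash:
# Print a timestamped python process count every minute
while true; do
    echo "$(date '+%F %T') python processes: $(pgrep -fc python)"
    sleep 60
done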
 
Hello @fiona

Thank you for your answer.
I will consider trying the package from pvetest.
Is there an approximate date for pve-qemu-kvm to reach a regular repo?

Regards,
It has already been moved to no-subscription earlier today. The enterprise repository still only has QEMU 8.1, so it was not affected.
 
I still seem to be getting a large amount of RPC related returns as of late.
Bash:
[Thu Jul 11 01:21:16 2024] RPC: Could not send backchannel reply error: -110
[Thu Jul 11 01:21:16 2024] RPC: Could not send backchannel reply error: -110
[Thu Jul 11 01:24:16 2024] RPC: Could not send backchannel reply error: -110
...
[Fri Jul 12 12:05:06 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul 12 12:10:16 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul 12 12:24:35 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul 12 12:26:36 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul 12 13:11:36 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul 12 13:15:45 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul 12 13:33:06 2024] RPC: Could not send backchannel reply error: -110
[Fri Jul 12 14:00:45 2024] RPC: Could not send backchannel reply error: -110

# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1

There are no further updates available. While the system seems fine, these RPC messages are just clogging the logs.
 
Regarding the recurring "RPC: Could not send backchannel reply error: -110" messages quoted above:
Sounds like it could be https://bugzilla.proxmox.com/show_bug.cgi?id=5558
A fix has been sent to the mailing list by @fabian: https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
 
