unexpected cgroup entry error

tsumaru720

Hi,

I have a 3-node PVE cluster with an OPNsense VM running on iSCSI storage. I've noticed the following error getting spammed in syslog:

Code:
2023-07-12T13:40:58.394754+01:00 pve1 qmeventd[1946]: unexpected cgroup entry 13:blkio:/qemu.slice
2023-07-12T13:40:58.395072+01:00 pve1 qmeventd[1946]: could not get vmid from pid 236351
2023-07-12T13:41:03.396247+01:00 pve1 qmeventd[1946]: unexpected cgroup entry 13:blkio:/qemu.slice
2023-07-12T13:41:03.396533+01:00 pve1 qmeventd[1946]: could not get vmid from pid 236351
2023-07-12T13:41:08.397322+01:00 pve1 qmeventd[1946]: unexpected cgroup entry 13:blkio:/qemu.slice
2023-07-12T13:41:08.397626+01:00 pve1 qmeventd[1946]: could not get vmid from pid 236351

The PID in question corresponds to the KVM process for my VM.
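
For reference, a quick way to confirm that mapping (just a sketch; it relies on the qemu-server pidfiles under /run/qemu-server and the "-id <vmid>" argument that Proxmox passes to kvm):

Bash:
# Which VM does the PID from the log belong to? The pidfiles map VMID -> PID.
grep -H . /run/qemu-server/*.pid | grep 236351

# The kvm command line also carries the VMID as "-id <vmid>"
ps -p 236351 -o args= | grep -o '\-id [0-9]*'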

Curiously, I only get this error on two of my three nodes; the third one doesn't report anything in syslog. Despite the error, the VM itself appears to be working correctly.

My cluster currently has

systemd.unified_cgroup_hierarchy=0

enabled for some legacy LXC containers that I'm still migrating. Does anyone know what this error actually means?

VM config
Code:
agent: 1,fstrim_cloned_disks=1
bootdisk: scsi0
cores: 1
cpu: host
memory: 1024
name: vpn2
net0: virtio=3E:6A:2B:E6:1A:57,bridge=vlan110
numa: 0
onboot: 1
ostype: l26
scsi0: san:vm-102-disk-0,discard=on,iothread=1,size=8G
scsihw: virtio-scsi-single
smbios1: uuid=113c06af-3815-4e07-9be1-686397541ba0
sockets: 2
tablet: 0
tags: san;ha
vmgenid: a121ca48-1522-4631-90ce-c6908448450d
 
I'm not sure whether it makes a difference, but here are the contents of /proc/<pid>/cgroup for the VM's KVM process.

On the "working" node, ie, the one that doesn't fill syslog
Code:
# cat /proc/$(ps auxf | grep kvm | grep " 102 " | awk '{print $2}')/cgroup
13:hugetlb:/
12:memory:/qemu.slice/102.scope
11:perf_event:/
10:pids:/qemu.slice/102.scope
9:freezer:/
8:devices:/qemu.slice
7:net_cls,net_prio:/
6:cpu,cpuacct:/qemu.slice/102.scope
5:misc:/
4:cpuset:/
3:blkio:/qemu.slice
2:rdma:/
1:name=systemd:/qemu.slice/102.scope
0::/qemu.slice/102.scope

versus what it looks like on a node that produces the syslog errors:
Code:
# cat /proc/$(ps auxf | grep kvm | grep " 102 " | awk '{print $2}')/cgroup
13:blkio:/qemu.slice
12:cpuset:/
11:devices:/qemu.slice
10:hugetlb:/
9:rdma:/
8:misc:/
7:memory:/qemu.slice/102.scope
6:net_cls,net_prio:/
5:perf_event:/
4:cpu,cpuacct:/qemu.slice/102.scope
3:freezer:/
2:pids:/qemu.slice/102.scope
1:name=systemd:/qemu.slice/102.scope
0::/qemu.slice/102.scope
 
Alright, I think this is a bug in qmeventd:

https://github.com/proxmox/qemu-server/blob/master/qmeventd/qmeventd.c#L104-L140

Looking at the code, it scans /proc/<pid>/cgroup for lines containing "qemu.slice" and tries to extract the VM ID from the first match. For some reason the cgroup hierarchies on my other two servers are listed in a different order.

So, on the "working" server, the first line it hits is

Code:
12:memory:/qemu.slice/102.scope

And thus it works

Whereas on the servers that are having issues, the first line is:

Code:
13:blkio:/qemu.slice

which doesn't have a VM ID, so I think it bails out.
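
Roughly, the logic as a shell sketch (not the actual C code, just to show why the ordering matters):

Bash:
# Sketch only: take the first cgroup line mentioning qemu.slice and try to
# pull a VMID out of a "<vmid>.scope" component, like qmeventd seems to do.
pid=236351
line=$(grep -m1 'qemu\.slice' /proc/$pid/cgroup)
vmid=$(sed -n 's|.*/qemu\.slice/\([0-9]\+\)\.scope.*|\1|p' <<< "$line")
if [ -z "$vmid" ]; then
    # On the noisy nodes the first match is "13:blkio:/qemu.slice",
    # which has no "<vmid>.scope" part, so no VMID can be extracted.
    echo "could not get vmid from pid $pid"
fi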
I'll submit a bug report.
 
I'm having the same issue (latest Proxmox 8); I'm happy to share details of my setup for context.

I have systemd.unified_cgroup_hierarchy=0 set too.

Code:
pveversion
pve-manager/8.0.3/bbf3993334bfa916 (running kernel: 6.2.16-4-pve)
 
Hi,
What you could do is:

1. Set systemd.unified_cgroup_hierarchy to 1, or remove the parameter entirely:
Bash:
# Edit grub menu
nano /etc/default/grub

# Remove systemd.unified_cgroup_hierarchy=0 from CMDLINE:
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"

# So it looks like:
GRUB_CMDLINE_LINUX_DEFAULT="quiet"

# Run update-grub
update-grub

# Reboot the host
sudo reboot

2. Change the order of Mount Units (advanced):
Bash:
# View the mounts
systemctl list-units -t mount

# Create a new mount unit for blkio:
nano /etc/systemd/system/sys-fs-cgroup-blkio.mount

# Add the following code:
[Unit]
Description=Cgroup blkio Mountpoint
DefaultDependencies=no
Conflicts=umount.target
Before=umount.target

[Mount]
What=cgroup
Where=/sys/fs/cgroup/blkio
Type=cgroup
Options=blkio

[Install]
WantedBy=multi-user.target

# To make this unit mount before a different mount unit, add to its [Unit] section:
Before=sys-fs-cgroup-memory.mount

# To make a different mount unit wait for this one, add to that unit's [Unit] section:
After=sys-fs-cgroup-blkio.mount

# Enable and start the custom Mount Units
systemctl daemon-reload
systemctl enable sys-fs-cgroup-blkio.mount
systemctl start sys-fs-cgroup-blkio.mount

# Reboot the server
sudo reboot

Given that this is a significant customisation and can leave your system unable to boot correctly, make sure you test the above approach in a test environment first.
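
After the reboot, a quick sanity check (just a sketch; it assumes the VM's kvm PID is in /run/qemu-server/102.pid and relies on the observation above that qmeventd looks at the first qemu.slice line):

Bash:
# First cgroup entry for VM 102's kvm process; after reordering it should
# contain "<vmid>.scope" rather than a bare "/qemu.slice".
head -n1 /proc/$(cat /run/qemu-server/102.pid)/cgroup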
 

@lotharea both 1. and 2., or either one?

In my default install (pve-manager/8.0.4/d258a813cfa6b390, running kernel 6.2.16-6-pve)

I already have

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet"

but I still see this noise.
 
@Fra
Either one; I'd suggest trying the first approach first, as it's much easier to revert.

When you run:
cat /proc/cmdline

Does the returned value contain systemd.unified_cgroup_hierarchy=0?

If yes, check your /etc/default/grub.

If any of the lines starting with:
GRUB_CMDLINE_LINUX_DEFAULT
GRUB_CMDLINE_LINUX

contain systemd.unified_cgroup_hierarchy=0, remove it and save the file.
Run update-grub and reboot the machine.
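
As a compact sketch of those checks (assuming the host boots via GRUB; see the following posts for a host that doesn't):

Bash:
# Is the legacy cgroup parameter still active, and does it come from GRUB?
grep -o 'systemd.unified_cgroup_hierarchy=[01]' /proc/cmdline
grep -n 'systemd.unified_cgroup_hierarchy' /etc/default/grub

# If it appears in GRUB_CMDLINE_LINUX_DEFAULT or GRUB_CMDLINE_LINUX,
# remove it there, then apply and reboot:
update-grub
reboot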
 
OK, thanks.
So I don't have to do anything, since my pristine installation of PVE 8 has:

Code:
root@proxmox:~# pveversion 
pve-manager/8.0.4/d258a813cfa6b390 (running kernel: 6.2.16-6-pve)

root@proxmox:~# grep GRUB_CMDLINE_LINUX /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

root@proxmox:~# grep systemd.unified_cgroup_hierarchy /proc/cmdline
initrd=\EFI\proxmox\6.2.16-6-pve\initrd.img-6.2.16-6-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs systemd.unified_cgroup_hierarchy=0
 
uh...
I have this one:
Code:
# grep -r cgroup /etc/
/etc/kernel/cmdline:root=ZFS=rpool/ROOT/pve-1 boot=zfs systemd.unified_cgroup_hierarchy=0

root@proxmox:~# df -h
Filesystem                           Size  Used Avail Use% Mounted on
udev                                 126G     0  126G   0% /dev
tmpfs                                 26G  1.9M   26G   1% /run
rpool/ROOT/pve-1                     6.7T  2.3T  4.5T  34% /
tmpfs                                126G   46M  126G   1% /dev/shm
tmpfs                                5.0M     0  5.0M   0% /run/lock
tmpfs                                4.0M     0  4.0M   0% /sys/fs/cgroup
rpool                                4.5T  128K  4.5T   1% /rpool
rpool/ROOT                           4.5T  128K  4.5T   1% /rpool/ROOT
rpool/data                           4.5T  128K  4.5T   1% /rpool/data
rpool/data/subvol-701-disk-1         250G  172G   79G  69% /rpool/data/subvol-701-disk-1
rpool/data/subvol-701-disk-0          80G  7.1G   73G   9% /rpool/data/subvol-701-disk-0
/dev/fuse                            128M   32K  128M   1% /etc/pve
tmpfs                                 26G     0   26G   0% /run/user/1000
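
That /etc/kernel/cmdline entry is the source: this host boots via systemd-boot (note the initrd=\EFI\proxmox\... in /proc/cmdline earlier), so the kernel command line comes from /etc/kernel/cmdline rather than /etc/default/grub. A minimal sketch of removing the parameter there, assuming the standard proxmox-boot-tool setup:

Bash:
# The kernel command line used by proxmox-boot-tool lives in this one-line file
nano /etc/kernel/cmdline
# Remove "systemd.unified_cgroup_hierarchy=0", e.g. leaving:
# root=ZFS=rpool/ROOT/pve-1 boot=zfs

# Write the updated command line into the boot entries
proxmox-boot-tool refresh

# Reboot the host
reboot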
 
