The system is an M11SDV-8C-LN4F with an X710-DA2 (SR-IOV in use). It has been working fine for over a year now.
The base OS (Debian Bullseye) is also working fine: no kernel errors are logged, and there are no problems with heavy hard disk or network I/O, e.g. when backing up VMs to a NAS.
An OPNsense VM is not affected either: it can push through gigabytes of traffic, and write I/O from Zenarmor's database causes no trouble.
Going back through kern.log on the Ubuntu machines, the problem seems to have started after I updated to this:
Code:
Start-Date: 2022-10-16 07:55:41
Commandline: apt-get dist-upgrade
Install: pve-kernel-5.15.60-2-pve:amd64 (5.15.60-2, automatic)
Upgrade: dbus-user-session:amd64 (1.12.20-2, 1.12.24-0+deb11u1), pve-firmware:amd64 (3.5-3, 3.5-4), tzdata:amd64 (2021a-1+deb11u5, 2021a-1+deb11u7), zfs-zed:amd64 (2.1.5-pve1, 2.1.6-pve1), libnvpair3linux:amd64 (2.1.5-pve1, 2.1.6-pve1), libuutil3linux:amd64 (2.1.5-pve1, 2.1.6-pve1), libpve-storage-perl:amd64 (7.2-9, 7.2-10), libzpool5linux:amd64 (2.1.5-pve1, 2.1.6-pve1), libpve-guest-common-perl:amd64 (4.1-2, 4.1-3), libdbus-1-3:amd64 (1.12.20-2, 1.12.24-0+deb11u1), isc-dhcp-common:amd64 (4.4.1-2.3, 4.4.1-2.3+deb11u1), proxmox-backup-file-restore:amd64 (2.2.6-1, 2.2.7-1), isc-dhcp-client:amd64 (4.4.1-2.3, 4.4.1-2.3+deb11u1), proxmox-backup-client:amd64 (2.2.6-1, 2.2.7-1), libpve-http-server-perl:amd64 (4.1-3, 4.1-4), libpve-common-perl:amd64 (7.2-2, 7.2-3), pve-kernel-5.15:amd64 (7.2-11, 7.2-12), libzfs4linux:amd64 (2.1.5-pve1, 2.1.6-pve1), dbus:amd64 (1.12.20-2, 1.12.24-0+deb11u1), pve-kernel-helper:amd64 (7.2-12, 7.2-13), zfsutils-linux:amd64 (2.1.5-pve1, 2.1.6-pve1)
End-Date: 2022-10-16 07:56:36
Starting the day after the update, speedtest (for example) began triggering the above-mentioned problems. The first log entry is from Oct 17th.
Code:
Oct 17 09:00:10 monitor kernel: [89066.126736] BUG: Bad page state in process speedtest pfn:5148f
Oct 18 12:00:12 monitor kernel: [186269.143191] BUG: Bad page state in process speedtest pfn:2650c
Oct 19 12:00:13 monitor kernel: [272671.359707] BUG: Bad page state in process speedtest pfn:0a3e0
Oct 22 09:00:12 monitor kernel: [ 8768.674228] BUG: Bad page state in process speedtest pfn:10a86a
Oct 22 12:00:07 monitor kernel: [19563.611729] BUG: Bad page state in process speedtest pfn:24d08
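For anyone who wants to tally these entries from their own kern.log, here is a minimal Python sketch. It assumes the syslog-style line format shown above; the function name `tally_bad_pages` and the regex are my own illustration, not part of any tool, and process names containing spaces would not be matched.

```python
import re
from collections import Counter

# Matches syslog-style lines such as:
# Oct 17 09:00:10 monitor kernel: [89066.126736] BUG: Bad page state in process speedtest pfn:5148f
BAD_PAGE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\S+\s+(?P<host>\S+)\s+kernel:.*"
    r"BUG: Bad page state in process (?P<proc>\S+)\s+pfn:(?P<pfn>[0-9a-fA-F]+)"
)

def tally_bad_pages(lines):
    """Count 'Bad page state' reports per process and per calendar day."""
    per_proc = Counter()
    per_day = Counter()
    for line in lines:
        m = BAD_PAGE.search(line)
        if not m:
            continue  # ignore unrelated kernel messages
        per_proc[m["proc"]] += 1
        per_day[f'{m["month"]} {int(m["day"]):02d}'] += 1
    return per_proc, per_day

# The five entries quoted above, used as sample input.
SAMPLE = [
    "Oct 17 09:00:10 monitor kernel: [89066.126736] BUG: Bad page state in process speedtest pfn:5148f",
    "Oct 18 12:00:12 monitor kernel: [186269.143191] BUG: Bad page state in process speedtest pfn:2650c",
    "Oct 19 12:00:13 monitor kernel: [272671.359707] BUG: Bad page state in process speedtest pfn:0a3e0",
    "Oct 22 09:00:12 monitor kernel: [ 8768.674228] BUG: Bad page state in process speedtest pfn:10a86a",
    "Oct 22 12:00:07 monitor kernel: [19563.611729] BUG: Bad page state in process speedtest pfn:24d08",
]

per_proc, per_day = tally_bad_pages(SAMPLE)
print(per_proc.most_common())
print(sorted(per_day.items()))
```

To run it against a real log, pass an open file instead of `SAMPLE`, e.g. `tally_bad_pages(open("/var/log/kern.log"))` (the path is an assumption and may differ per distribution).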
Example crash; various other processes (e.g. kswapd, kworker, swapper) show similar behaviour:
Code:
[Thu Oct 27 05:32:06 2022] BUG: Bad page state in process speedtest pfn:1091e
[Thu Oct 27 05:32:06 2022] page:00000000d7d00019 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1091e
[Thu Oct 27 05:32:06 2022] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[Thu Oct 27 05:32:06 2022] raw: 000fffffc0000000 dead000000000100 dead000000000122 0000000000000000
[Thu Oct 27 05:32:06 2022] raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
[Thu Oct 27 05:32:06 2022] page dumped because: nonzero _refcount
[Thu Oct 27 05:32:06 2022] Modules linked in: tls ipmi_devintf ipmi_msghandler intel_rapl_msr intel_rapl_common kvm_amd ccp kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd bochs drm_vram_helper drm_ttm_helper joydev input_leds ttm serio_raw drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt qemu_fw_cfg mac_hid sch_fq_codel lp parport nfsd ramoops pstore_blk reed_solomon pstore_zone auth_rpcgss nfs_acl efi_pstore lockd drm grace sunrpc ip_tables x_tables autofs4 crc32_pclmul psmouse iavf virtio_scsi i2c_piix4 pata_acpi floppy
[Thu Oct 27 05:32:06 2022] CPU: 3 PID: 7434 Comm: speedtest Not tainted 5.15.0-52-generic #58-Ubuntu
[Thu Oct 27 05:32:06 2022] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[Thu Oct 27 05:32:06 2022] Call Trace:
[Thu Oct 27 05:32:06 2022] <TASK>
[Thu Oct 27 05:32:06 2022] show_stack+0x52/0x5c
[Thu Oct 27 05:32:06 2022] dump_stack_lvl+0x4a/0x63
[Thu Oct 27 05:32:06 2022] dump_stack+0x10/0x16
[Thu Oct 27 05:32:06 2022] bad_page.cold+0x63/0x94
[Thu Oct 27 05:32:06 2022] check_free_page_bad+0x66/0x70
[Thu Oct 27 05:32:06 2022] free_pcppages_bulk+0x1bf/0x390
[Thu Oct 27 05:32:06 2022] free_unref_page_commit.constprop.0+0x122/0x160
[Thu Oct 27 05:32:06 2022] free_unref_page+0xe3/0x190
[Thu Oct 27 05:32:06 2022] __put_page+0x77/0xe0
[Thu Oct 27 05:32:06 2022] skb_release_data+0x10d/0x180
[Thu Oct 27 05:32:06 2022] __kfree_skb+0x26/0x40
[Thu Oct 27 05:32:06 2022] tcp_recvmsg_locked+0x763/0x9e0
[Thu Oct 27 05:32:06 2022] tcp_recvmsg+0x79/0x1c0
[Thu Oct 27 05:32:06 2022] inet_recvmsg+0x5c/0x120
[Thu Oct 27 05:32:06 2022] ? security_socket_recvmsg+0x3d/0x60
[Thu Oct 27 05:32:06 2022] sock_recvmsg+0x71/0x80
[Thu Oct 27 05:32:06 2022] __sys_recvfrom+0x1a2/0x1d0
[Thu Oct 27 05:32:06 2022] __x64_sys_recvfrom+0x24/0x30
[Thu Oct 27 05:32:06 2022] do_syscall_64+0x5c/0xc0
[Thu Oct 27 05:32:06 2022] ? exit_to_user_mode_prepare+0x37/0xb0
[Thu Oct 27 05:32:06 2022] ? syscall_exit_to_user_mode+0x27/0x50
[Thu Oct 27 05:32:06 2022] ? __x64_sys_recvfrom+0x24/0x30
[Thu Oct 27 05:32:06 2022] ? do_syscall_64+0x69/0xc0
[Thu Oct 27 05:32:06 2022] ? exit_to_user_mode_prepare+0x37/0xb0
[Thu Oct 27 05:32:06 2022] ? syscall_exit_to_user_mode+0x27/0x50
[Thu Oct 27 05:32:06 2022] ? __x64_sys_recvfrom+0x24/0x30
[Thu Oct 27 05:32:06 2022] ? do_syscall_64+0x69/0xc0
[Thu Oct 27 05:32:06 2022] ? do_syscall_64+0x69/0xc0
[Thu Oct 27 05:32:06 2022] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[Thu Oct 27 05:32:06 2022] RIP: 0033:0x5c1d56
[Thu Oct 27 05:32:06 2022] Code: 44 24 08 75 c4 48 89 e8 49 03 47 08 e9 a8 fe ff ff 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <c3> 83 ff 0e 75 10 48 83 3e 00 ba 90 6d 5f 00 b8 fe 6e 5f 00 eb 28
[Thu Oct 27 05:32:06 2022] RSP: 002b:00007f9ff0151538 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
[Thu Oct 27 05:32:06 2022] RAX: ffffffffffffffda RBX: 00007f9ff0151850 RCX: 00000000005c1d56
[Thu Oct 27 05:32:06 2022] RDX: 0000000000008000 RSI: 000000000184bc80 RDI: 0000000000000008
[Thu Oct 27 05:32:06 2022] RBP: 00007f9ff0151660 R08: 0000000000000000 R09: 0000000000000000
[Thu Oct 27 05:32:06 2022] R10: 0000000000000020 R11: 0000000000000246 R12: 0000000000000000
[Thu Oct 27 05:32:06 2022] R13: 00000000018479c0 R14: 0000000000008000 R15: 00007f9ff0151701
[Thu Oct 27 05:32:06 2022] </TASK>
[Thu Oct 27 05:32:06 2022] Disabling lock debugging due to kernel taint
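When comparing several of these dumps, it can help to pull the counters out of the `page:` line so they can be checked side by side (for instance, whether the refcount is -1 every time). A minimal sketch, assuming the line layout shown in the trace above; `parse_page_dump` is a hypothetical helper name, and only the three fields it names are extracted.

```python
import re

# Only the fields of interest; the leading timestamp is ignored.
FIELD = re.compile(r"\b(refcount|mapcount|pfn):(\S+)")

def parse_page_dump(line):
    """Extract refcount, mapcount and pfn from a 'page:' dump line."""
    fields = dict(FIELD.findall(line))
    return {
        "refcount": int(fields["refcount"]),
        "mapcount": int(fields["mapcount"]),
        "pfn": int(fields["pfn"], 16),  # pfn is printed in hex, e.g. 0x1091e
    }

# The 'page:' line from the trace above.
LINE = ("[Thu Oct 27 05:32:06 2022] page:00000000d7d00019 refcount:-1 "
        "mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1091e")

info = parse_page_dump(LINE)
print(info)
```

In this dump the refcount is -1 at the time the page is freed, which matches the "nonzero _refcount" reason printed by the kernel.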
Code:
root@epyc:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.64-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1