Kernel messages dump_header, page_fault, Bad RIP value

Paspao

Hello,

I have a Proxmox 6 cluster on a ZFS mirror (pve-manager/6.0-5/f8a710d7, running kernel 5.0.18-1-pve).

I run 31 LXC containers, each with 1.75 GB assigned (about 54 GB in total), on a host with 96 GB of RAM.

Code:
free -h
              total        used        free      shared  buff/cache   available
Mem:           94Gi        73Gi         9Gi       1.2Gi        11Gi        21Gi

The containers are not showing low memory in the GUI.

In syslog I find:

Code:
kernel: [1234333.445570] Code: Bad RIP value.
...
kernel: [1235198.622790]  dump_header+0x54/0x308
kernel: [1235198.622849] kmem: usage 98648kB, limit 9007199254740988kB, failcnt 0
...
then a list of guest processes
...
kernel: [1235565.764182]  __x64_sys_clone+0x27/0x30
...
kernel: [1235630.933173] RDX: 0000000000000000 RSI: 0000000001c12798 RDI: 00000000018f2010
...
kernel: [1235678.497954]  __do_fault+0x3c/0x130
...
kernel: [1235774.672525] CPU: 5 PID: 18371 Comm: mailgraph Tainted: P           O      5.0.18-1-pve #1
kernel: [1235774.672584]  ? xas_load+0xc/0x80
...
kernel: [1235802.301627]  ? filemap_map_pages+0x1ae/0x380


Any suggestions on how to troubleshoot these issues?

Thank you
P.
 
Please attach the complete log as a file here. Also the output of 'pveversion -v'.
 
Hello,

thank you

Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-1
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-6
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-6
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
 
So, all the 'Bad RIP value' call traces are from containers? What's running in there?
Did you run a memtest already?
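
One way to answer the "which processes" question from the log itself is to count the commands named in the call-trace header lines. The following is only a rough sketch: it assumes every trace header has the same shape as the `CPU: 5 PID: 18371 Comm: mailgraph Tainted: ...` line quoted above, and the function names `count_trace_commands` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Matches the header line printed at the top of a kernel call trace, e.g.
# "CPU: 5 PID: 18371 Comm: mailgraph Tainted: P           O      5.0.18-1-pve #1"
# Assumption: the command name is followed by "Tainted" or "Not tainted".
TRACE_HEADER = re.compile(
    r"CPU: (\d+) PID: (\d+) Comm: (.+?) (?:Tainted|Not tainted)"
)


def count_trace_commands(lines):
    """Count how often each command name appears in call-trace headers."""
    counts = Counter()
    for line in lines:
        match = TRACE_HEADER.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts


def summarize(path):
    """Print the commands found in the log file at `path`, most frequent first."""
    with open(path, errors="replace") as handle:
        for comm, hits in count_trace_commands(handle).most_common():
            print(f"{hits:5d}  {comm}")
```

Calling `summarize("/var/log/syslog")` on the host (the path is an assumption, adjust it to wherever the kernel messages are logged) gives a list of command names that can then be matched against what runs inside each container.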
 
