Kernel messages dump_header, page_fault, Bad RIP value

Paspao

Active Member
Aug 1, 2017
Hello,

I have a Proxmox 6 cluster on a ZFS mirror (pve-manager/6.0-5/f8a710d7, running kernel 5.0.18-1-pve).

I run 31 LXC containers, each with 1.75 GB of RAM assigned (about 54 GB in total), on a host with 96 GB.

Code:
free -h
              total        used        free      shared  buff/cache   available
Mem:           94Gi        73Gi         9Gi       1.2Gi        11Gi        21Gi

The containers are not showing low memory in the GUI.
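
For reference, the assigned total quoted above can be double-checked with a one-line calculation. This is only a sketch of the arithmetic; the function name assigned_total_gb is made up for illustration, and the two figures are the ones from this post.

```shell
# Total RAM assigned to the containers: count * per-container size (in GB).
# awk is used because plain shell arithmetic cannot handle 1.75.
assigned_total_gb() {
    awk -v n="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", n * gb }'
}

assigned_total_gb 31 1.75   # prints 54.25
```

That sum is well below the 73Gi that free reports as used, so the container assignments alone do not account for the host's memory usage.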

In syslog I find:

Code:
kernel: [1234333.445570] Code: Bad RIP value.
...
kernel: [1235198.622790]  dump_header+0x54/0x308
kernel: [1235198.622849] kmem: usage 98648kB, limit 9007199254740988kB, failcnt 0
...
then a list of guest processes
...
kernel: [1235565.764182]  __x64_sys_clone+0x27/0x30
...
kernel: [1235630.933173] RDX: 0000000000000000 RSI: 0000000001c12798 RDI: 00000000018f2010
...
kernel: [1235678.497954]  __do_fault+0x3c/0x130
...
kernel: [1235774.672525] CPU: 5 PID: 18371 Comm: mailgraph Tainted: P           O      5.0.18-1-pve #1
kernel: [1235774.672584]  ? xas_load+0xc/0x80
...
kernel: [1235802.301627]  ? filemap_map_pages+0x1ae/0x380


Any suggestions on how to troubleshoot these issues?

Thank you
P.
 
Please attach the complete log as a file here. Also the output of 'pveversion -v'.
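
A minimal sketch of how the two attachments could be prepared. It assumes the kernel messages end up in /var/log/syslog on this host; the helper name compress_log and the output file names are placeholders, not anything Proxmox provides.

```shell
# Compress a log file so the complete log can be attached as a single file.
# $1 = log file to read, $2 = compressed file to write
compress_log() {
    gzip -c "$1" > "$2"
}

# Example usage (paths are assumptions, adjust to your system):
# compress_log /var/log/syslog syslog.gz
# pveversion -v > pveversion.txt
```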
 
Hello,

thank you

Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-1
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-6
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-6
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
 
So, all the 'Bad RIP value' call traces are from containers? What's running in there?
Did you run a memtest already?
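
To see which processes the traces belong to, the "Comm:" field of the trace header lines (as in the mailgraph line quoted above) can be tallied. This is a rough sketch that assumes the syslog line format shown in this thread; trace_commands is a made-up helper name and the log path is an assumption.

```shell
# List the command names found in "CPU: ... PID: ... Comm: <name>" lines,
# with a count per name, most frequent first.
# $1 = syslog-style file containing the kernel messages
trace_commands() {
    sed -n 's/.* Comm: \([^ ]*\).*/\1/p' "$1" | sort | uniq -c | sort -rn
}

# Example usage (path is an assumption):
# trace_commands /var/log/syslog
```

Each name can then be checked against what is running inside the containers.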