Weird errors in dmesg

onepamopa

Can someone check this:

[ 0.258339] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CA.WT1A], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258347] fbcon: Taking over console
[ 0.258350] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258353] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258354] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CA.MT1A], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258357] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258358] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258359] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CA.WT2A], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258362] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258363] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258364] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CA.MT2A], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258366] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258368] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258369] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CA.WT3A], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258371] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258373] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258374] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CA.MT3A], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258376] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
. . .
[ 0.258523] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CD.MT3D], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258526] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258527] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258528] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CD.WT4D], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258531] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258533] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258534] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CD.MT4D], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258536] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258538] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258539] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CD.MT5D], AE_ALREADY_EXISTS (20190816/dswload2-324)
[ 0.258542] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-221)
[ 0.258543] ACPI: Skipping parse of AML opcode: Device (0x5B82)
[ 0.258628] ACPI: 8 ACPI AML tables successfully acquired and loaded
[ 0.263932] ACPI: EC: EC started
[ 0.263932] ACPI: EC: interrupt blocked
[ 0.263992] ACPI: \_SB_.PCI0.SBRG.EC0_: Used as first EC
[ 0.263993] ACPI: \_SB_.PCI0.SBRG.EC0_: GPE=0x2, EC_CMD/EC_SC=0x66, EC_DATA=0x62
[ 0.263994] ACPI: \_SB_.PCI0.SBRG.EC0_: Boot DSDT EC used to handle transactions
[ 0.263994] ACPI: Interpreter enabled
[ 0.264006] ACPI: (supports S0 S3 S4 S5)
[ 0.264006] ACPI: Using IOAPIC for interrupt routing
 
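Do you experience any issues with your system? If not, these "errors" usually just indicate that the kernel doesn't fully support the BIOS, or that the BIOS is doing things it shouldn't. In this log, AE_ALREADY_EXISTS means the ACPI tables try to create an object under a name that is already in use, so the kernel skips the duplicate. You can try updating your BIOS to see if that helps.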

I've seen these kinds of messages regularly on very recent AMD chips, but I suppose they could appear on Intel as well.

In general, if everything works as intended, these should be okay to ignore.
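To see at a glance which firmware objects are affected, the repeated lines can be grouped by exception code and parent device. This is only a rough sketch: the function name `summarize_acpi_errors` is made up for illustration, and the regular expression is based solely on the message format quoted in this thread, so other kernel versions may word it differently.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [ 0.258339] ACPI BIOS Error (bug): Failure creating named object [\_SB.I2CA.WT1A], AE_ALREADY_EXISTS (...)
# The pattern is an assumption derived from the log excerpt in this thread.
NAMED_OBJECT_RE = re.compile(
    r"ACPI BIOS Error \(bug\): Failure creating named object "
    r"\[(?P<path>[^\]]+)\], (?P<code>AE_\w+)"
)


def summarize_acpi_errors(dmesg_text):
    """Group failed ACPI object names by exception code and parent device."""
    summary = defaultdict(lambda: defaultdict(list))
    for line in dmesg_text.splitlines():
        match = NAMED_OBJECT_RE.search(line)
        if match is None:
            continue  # not a "Failure creating named object" line
        # Split the full path into its parent device and the final name segment.
        parent, _, leaf = match.group("path").rpartition(".")
        summary[match.group("code")][parent].append(leaf)
    # Convert the nested defaultdicts into plain dicts for easier printing.
    return {code: dict(parents) for code, parents in summary.items()}
```

Passing the saved output of `dmesg` to `summarize_acpi_errors` yields one entry per exception code, with the duplicate object names listed under each parent device (here the I2CA to I2CD devices).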
 
Well, I do get full system freezes every 2-4 days.
I can't figure out what's causing them, because there is nothing in the logs and the freezes occur randomly.
There are 2 GPUs, but both are allocated to VMs; the PVE host itself has no GPU.

Once the system freezes, there's nothing to do but a hard reset, and there's absolutely nothing in the logs.

The CPU is a Threadripper 3960X on an ASUS TRX40-pro board, running the latest available BIOS.
 
Here are some other messages from the log:
[ 0.850073] ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000007) is beyond end of object (length 0x6) (20190816/exoparg2-396)
[ 0.850080] No Local Variables are initialized for Method [_PLD]
[ 0.850081] No Arguments are initialized for method [_PLD]
[ 0.850082] ACPI Error: Aborting method \_SB.S0D2.D2A0.BYUP.BYD8.XHC1.RHUB.PRT6._PLD due to previous error (AE_AML_PACKAGE_LIMIT) (20190816/psparse-531)

[ 1.174234] wmi: module verification failed: signature and/or required key missing - tainting kernel
[ 1.175813] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[ 1.176021] acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[ 1.176274] acpi PNP0C14:04: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)

[ 11.384328] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
[ 11.384406] nct6775 nct6775.656: Invalid temperature source 28 at index 0, source register 0x100, temp register 0x73
[ 11.384418] nct6775 nct6775.656: Invalid temperature source 28 at index 1, source register 0x200, temp register 0x75
[ 11.384430] nct6775 nct6775.656: Invalid temperature source 28 at index 2, source register 0x300, temp register 0x77
[ 11.384452] nct6775 nct6775.656: Invalid temperature source 28 at index 4, source register 0x900, temp register 0x7b
[ 11.384464] nct6775 nct6775.656: Invalid temperature source 28 at index 5, source register 0xa00, temp register 0x7d

[130991.412409] hid-generic 0003:1A2C:2D23.000A: input,hidraw0: USB HID v1.10 Keyboard [USB USB Keyboard] on usb-0000:03:00.3-2/input0
[130991.417077] input: USB USB Keyboard Consumer Control as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb1/1-2/1-2:1.1/0003:1A2C:2D23.000B/input/input21
[130991.476347] input: USB USB Keyboard System Control as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb1/1-2/1-2:1.1/0003:1A2C:2D23.000B/input/input22
[130991.476395] hid-generic 0003:1A2C:2D23.000B: input,hidraw1: USB HID v1.10 Device [USB USB Keyboard] on usb-0000:03:00.3-2/input1
[162168.362573] nvme nvme0: I/O 54 QID 5 timeout, aborting
[162168.368137] nvme nvme0: Abort status: 0x0
[162170.314630] INFO: task kvm:91464 blocked for more than 30 seconds.
[162170.314636] Tainted: P OE 5.4.34-1-pve #1
[162170.314638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[162170.314640] kvm D 0 91464 1 0x00000000
[162170.314642] Call Trace:
[162170.314648] __schedule+0x2e6/0x700
[162170.314649] schedule+0x33/0xa0
[162170.314650] schedule_timeout+0x205/0x300
[162170.314652] ? blk_flush_plug_list+0xe2/0x110
[162170.314653] io_schedule_timeout+0x1e/0x50
[162170.314654] wait_for_completion_io+0xb7/0x140
[162170.314656] ? wake_up_q+0x80/0x80
[162170.314658] submit_bio_wait+0x61/0x90
[162170.314660] blkdev_issue_zeroout+0x140/0x220
[162170.314661] blkdev_ioctl+0x5cd/0x9e0
[162170.314663] block_ioctl+0x3d/0x50
[162170.314664] do_vfs_ioctl+0xa9/0x640
[162170.314666] ? _copy_from_user+0x3e/0x60
[162170.314667] ksys_ioctl+0x67/0x90
[162170.314668] __x64_sys_ioctl+0x1a/0x20
[162170.314669] do_syscall_64+0x57/0x190
[162170.314671] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[162170.314672] RIP: 0033:0x7f1e1ab1b427
[162170.314676] Code: Bad RIP value.
[162170.314677] RSP: 002b:00007f1bec71c4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[162170.314678] RAX: ffffffffffffffda RBX: 00007f1bff7ff7e0 RCX: 00007f1e1ab1b427
[162170.314678] RDX: 00007f1bec71c4b0 RSI: 000000000000127f RDI: 0000000000000016
[162170.314679] RBP: 00007f1e0d6dbbd0 R08: 0000000000000000 R09: 00000000ffffffff
[162170.314679] R10: 00007f1bec71c4b0 R11: 0000000000000246 R12: 00007f1bec71c4b0
[162170.314680] R13: 00007f1e0c013078 R14: 00007f1e0c03d000 R15: 00007f1c00202010
[162198.474317] nvme nvme0: I/O 54 QID 5 timeout, reset controller
[162198.837979] nvme nvme0: 7/0/0 default/read/poll queues
 
Hi, there is a kernel message about your NVMe M.2 SSD two seconds before the freeze happens.

The trace from the freeze also points to an I/O hang, i.e. a disk problem.

Do you have any processes running inside a VM that do heavy I/O, for example backups?

It looks like your SSD is either faulty or overloaded, and it ends up deadlocking your server.

That is especially likely if you use the NVMe for both the host and the VMs.

You might want to throttle your VM disks, or try the "IO thread" disk option.

How much space on your NVMe is used? SSDs slow down when 20% or less is free, depending on the model of course.
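The timing relationship between the NVMe timeout and the blocked task can be checked mechanically. The sketch below pairs each hung-task report with the most recent NVMe I/O timeout. The function name `find_io_stalls` and the 60-second window are assumptions chosen for illustration, and the patterns follow only the message wording quoted in this thread.

```python
import re

# "[162168.362573] nvme nvme0: I/O 54 QID 5 timeout, aborting"
NVME_TIMEOUT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+nvme (\w+): I/O \d+ QID \d+ timeout"
)
# "[162170.314630] INFO: task kvm:91464 blocked for more than 30 seconds."
HUNG_TASK_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+INFO: task (\S+) blocked for more than \d+ seconds"
)


def find_io_stalls(dmesg_text, window_s=60.0):
    """Pair hung-task reports with the most recent preceding NVMe timeout.

    Returns a list of (task, device, seconds_between) tuples for hung tasks
    reported at most `window_s` seconds after an NVMe I/O timeout.
    The window size is an arbitrary assumption, not a kernel constant.
    """
    last_timeout = None  # (timestamp, device) of the latest NVMe timeout
    stalls = []
    for line in dmesg_text.splitlines():
        line = line.strip()
        match = NVME_TIMEOUT_RE.match(line)
        if match:
            last_timeout = (float(match.group(1)), match.group(2))
            continue
        match = HUNG_TASK_RE.match(line)
        if match and last_timeout is not None:
            delta = float(match.group(1)) - last_timeout[0]
            if 0 <= delta <= window_s:
                stalls.append((match.group(2), last_timeout[1], round(delta, 6)))
    return stalls
```

On the excerpt above this reports the `kvm:91464` task roughly two seconds after the `nvme0` timeout. A match like this only shows that the two events were close in time within one boot; it does not prove the SSD is the cause.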
 
@H4R0

This dmesg log is from the boot after the freeze, not before. The SSD is used only for the VM disks; the host is on an HDD:
Content: Disk image, Container
Type: LVM
Usage: 92.47% (882.02 GiB of 953.87 GiB)

There is no heavy read/write load on any of the VMs. As for the freezes: once one occurs, nothing shows up in any of the logs. The system just hangs and stays that way. I've tried waiting 5-10 minutes before rebooting, and there are still no log entries.
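For comparison with the 20% free-space figure raised earlier in the thread, the usage numbers above can be turned into a free share. This is a small sketch of the arithmetic only; `free_space` is a hypothetical helper, and the 20% default simply mirrors that rule of thumb rather than any vendor specification.

```python
def free_space(used_gib, total_gib, min_free_pct=20.0):
    """Return (free_gib, free_pct, at_or_below_threshold) for a volume."""
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    free_gib = total_gib - used_gib
    free_pct = 100.0 * free_gib / total_gib
    return free_gib, free_pct, free_pct <= min_free_pct


# Figures from the storage summary quoted above.
free_gib, free_pct, low = free_space(882.02, 953.87)
print(f"{free_gib:.2f} GiB free ({free_pct:.2f}%), at or below 20%: {low}")
```

With these figures about 71.85 GiB, or roughly 7.5%, is unallocated. This is the LVM allocation as shown in the storage view, which may differ from what the SSD controller itself treats as free blocks.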

root@proxmox:~# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: XPG GAMMIX S5
Serial Number: 2J4520102511
Firmware Version: V9002s16
PCI Vendor/Subsystem ID: 0x10ec
IEEE OUI Identifier: 0x00e04c
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Sun May 17 19:55:47 2020 EEST
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x0014): DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 118 Celsius
Critical Comp. Temp. Threshold: 150 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    8.00W        -        -    0  0  0  0        0       0
 1 +    4.00W        -        -    1  1  1  1        0       0
 2 +    3.00W        -        -    2  2  2  2        0       0
 3 -  0.0128W        -        -    3  3  3  3     4000    8000
 4 -  0.0080W        -        -    4  4  4  4     8000   30000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 42 Celsius
Available Spare: 100%
Available Spare Threshold: 32%
Percentage Used: 1%
Data Units Read: 15,599,498 [7.98 TB]
Data Units Written: 19,579,260 [10.0 TB]
Host Read Commands: 172,489,030
Host Write Commands: 147,982,217
Controller Busy Time: 0
Power Cycles: 82
Power On Hours: 1,071
Unsafe Shutdowns: 15
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, max 8 entries)
Num  ErrCount             SQId   CmdId  Status  PELoc  LBA  NSID  VS
  0  1                       0  0x0000  0x0000  0x000    0     0   -
  6  1219368206019475265     0  0x0000  0x0000  0x000    0     0   -

root@proxmox:~#
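The health section of the output above can be reduced to a few flags worth a closer look. The following is a minimal sketch that parses the plain-text `smartctl -a` output as pasted here. The helper names `parse_nvme_health` and `health_flags` are hypothetical, the field names are taken from this output only, and the checks are illustrative rather than a statement of what counts as a failing drive.

```python
import re

# Leading integer of a value such as "0x00", "42 Celsius", "100%" or "19,579,260 [10.0 TB]".
LEADING_INT_RE = re.compile(r"\s*(0x[0-9a-fA-F]+|\d[\d,]*)")


def parse_nvme_health(smartctl_text):
    """Parse 'Key: value' lines of the SMART/Health section into a dict of ints."""
    health = {}
    in_section = False
    for line in smartctl_text.splitlines():
        line = line.strip()
        if line.startswith("SMART/Health Information"):
            in_section = True
            continue
        if not in_section:
            continue
        if not line:
            break  # a blank line ends the section in the output shown above
        key, sep, value = line.partition(":")
        match = LEADING_INT_RE.match(value) if sep else None
        if match:
            token = match.group(1).replace(",", "")
            base = 16 if token.lower().startswith("0x") else 10
            health[key.strip()] = int(token, base)
    return health


def health_flags(health):
    """Return human-readable notes for fields that deserve a closer look."""
    flags = []
    if health.get("Critical Warning", 0) != 0:
        flags.append(f"critical warning: {health['Critical Warning']:#04x}")
    if health.get("Media and Data Integrity Errors", 0) > 0:
        flags.append(f"media errors: {health['Media and Data Integrity Errors']}")
    if health.get("Available Spare", 100) <= health.get("Available Spare Threshold", 0):
        flags.append("available spare at or below threshold")
    if health.get("Unsafe Shutdowns", 0) > 0:
        flags.append(f"unsafe shutdowns: {health['Unsafe Shutdowns']}")
    return flags
```

For the output in this post, the only flag raised would be the 15 unsafe shutdowns, which is consistent with the hard resets after each freeze. The counters show no media errors or critical warnings, although a clean SMART log does not by itself rule out controller or firmware problems.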
 