[SOLVED] kern.log, syslog and messages growing too big

Fathi · Oct 24, 2018

Hi,
At first I thought that the pve-no-subscription channel had some kernel debugging enabled, because my root device filled up in less than two days of intermittent use. But even dmesg lists errors instead of the usual boot and peripheral information. My logs are bloated with repeated messages; dmesg returns many lines similar to the following:
[34256.095328] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
[34256.095333] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
[34256.095337] pcieport 0000:00:1c.0: device [8086:a293] error status/mask=00000001/00002000
[34256.095354] pcieport 0000:00:1c.0: [ 0] Receiver Error (First)
[34256.097803] pcieport 0000:00:1c.0: AER: Multiple Corrected error received: id=00e0
[34256.097917] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
[34256.097921] pcieport 0000:00:1c.0: device [8086:a293] error status/mask=00001100/00002000
[34256.097923] pcieport 0000:00:1c.0: [ 8] RELAY_NUM Rollover
[34256.097925] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
[34256.097929] pcieport 0000:00:1c.0: AER: Multiple Corrected error received: id=00e0
[34256.098090] pcieport 0000:00:1c.0: can't find device of ID00e0
[34256.098092] pcieport 0000:00:1c.0: AER: Multiple Corrected error received: id=00e0
[34256.098151] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
[34256.098155] pcieport 0000:00:1c.0: device [8086:a293] error status/mask=00003100/00002000
[34256.098159] pcieport 0000:00:1c.0: [ 8] RELAY_NUM Rollover
[34256.098162] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
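To see which port and error type dominate a log like the one above, the "PCIe Bus Error" lines can be tallied. This is only a minimal sketch: the function name `summarize_aer` and the regular expression are illustrative and assume the line format shown in this post; the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches lines such as:
# [34256.095333] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)
BUS_ERROR = re.compile(
    r"pcieport (?P<port>[0-9a-f:.]+): PCIe Bus Error: "
    r"severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+),"
)


def summarize_aer(lines):
    """Count 'PCIe Bus Error' lines per (port, severity, layer)."""
    counts = Counter()
    for line in lines:
        m = BUS_ERROR.search(line)
        if m:
            counts[(m["port"], m["severity"], m["layer"])] += 1
    return counts


# Sample lines taken from the dmesg excerpt above.
sample = [
    "[34256.095328] pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0",
    "[34256.095333] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=00e0(Receiver ID)",
    "[34256.097917] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)",
    "[34256.098151] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)",
]

for (port, severity, layer), n in summarize_aer(sample).most_common():
    print(n, port, severity, layer)
```

The same function could be fed the lines of a saved dmesg dump or of kern.log; if every count points at one port, as here, that port and whatever sits behind it are the place to look.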

I am setting up this server for an unmanaged PoC embedded on a train, which should be running before this weekend. Once the root partition filled up, no VM or container could start.

Could someone please help me debug this?

P.S.: This is on a new OptiPlex 5050 with one onboard Intel NIC and one added RTL8111 NIC.
TIA.

Fathi · Oct 24, 2018

Correction: the added NIC has a Realtek RTL8169 chip.

Stoiko Ivanov · Oct 24, 2018

Seems like a problem with a PCI device. Look at the output of `lspci -v` and see which device is behind `0000:00:1c.0`.

Fathi · Oct 24, 2018

I was nearly certain that the problem comes from the second NIC, but `lspci -v` returned:

00:1c.0 PCI bridge: Intel Corporation Device a293 (rev f0) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 122
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: f7000000-f70fffff
Prefetchable memory behind bridge: 00000000f0000000-00000000f00fffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: Dell Device 07a2
Capabilities: [a0] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Access Control Services
Capabilities: [220] #19
Kernel driver in use: pcieport
Kernel modules: shpchp
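The output above describes the bridge itself, not the device behind it. The "Bus:" line gives the bus range that the bridge forwards to, so the devices behind it sit on those buses. The sketch below pulls that range out of the text; the function name `buses_behind_bridge` is illustrative and the parsing assumes the `lspci -v` layout shown here.

```python
import re

# Matches the bridge's bus routing line as printed by `lspci -v`.
BUS_LINE = re.compile(
    r"Bus: primary=([0-9a-f]{2}), secondary=([0-9a-f]{2}), subordinate=([0-9a-f]{2})"
)


def buses_behind_bridge(lspci_text):
    """Return the bus numbers (two hex digits each) reachable through the bridge."""
    m = BUS_LINE.search(lspci_text)
    if m is None:
        return []
    secondary = int(m.group(2), 16)
    subordinate = int(m.group(3), 16)
    return [f"{bus:02x}" for bus in range(secondary, subordinate + 1)]


# Lines copied from the lspci output above.
bridge = """00:1c.0 PCI bridge: Intel Corporation Device a293 (rev f0) (prog-if 00 [Normal decode])
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0"""

print(buses_behind_bridge(bridge))  # ['01']
```

With bus 01 identified, `lspci -s 01:` should list what is plugged in behind the bridge, and `lspci -tv` shows the same relationship as a tree.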

This desktop had a second NIC and 8 GB of RAM added by the reseller. What could be the reason? I know this is probably not Proxmox's fault, so forgive me for asking here.

Stoiko Ivanov · Oct 24, 2018

Hm, assuming that the PCI bridge on the mainboard is not actually broken, I would try the following:
* check whether the error remains if you remove the second NIC
* upgrading the BIOS might help
* make sure you're running the latest kernel
* setting a kernel command-line parameter might help:
https://askubuntu.com/questions/863...rected-type-physical-layer-id-00e5receiver-id
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173
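The post does not name the parameter. As an assumption, the sketch below uses `pci=noaer`, which the kernel's parameter documentation lists as disabling PCIe Advanced Error Reporting. Note that this only silences the reporting and does not fix a faulty device. The helper `add_kernel_param` is hypothetical; it edits the text of a `GRUB_CMDLINE_LINUX_DEFAULT` line such as the one found in /etc/default/grub on a GRUB-booted, Debian-based install.

```python
import re


def add_kernel_param(grub_line, param):
    """Append param to a GRUB_CMDLINE_LINUX_DEFAULT="..." line if it is missing."""
    m = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")\s*', grub_line)
    if m is None:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    params = m.group(2).split()
    if param not in params:  # keep the edit idempotent
        params.append(param)
    return m.group(1) + " ".join(params) + m.group(3)


# pci=noaer is an assumed choice, not one named in this thread.
print(add_kernel_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet"', "pci=noaer"))
# GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=noaer"
```

On such a system the changed line would take effect only after running `update-grub` and rebooting.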

Fathi · Nov 14, 2018

Hi,
In the end we bought a new, branded NIC and replaced the one added by the reseller (an RTL8169 chip on an unbranded NIC), and all the problems disappeared.
So it was a hardware problem. I would never have suspected a certified Dell reseller of adding an unbranded NIC to an original Dell desktop.
Thank you all.

Stoiko Ivanov · Nov 14, 2018

Glad to hear your problem is resolved. Please mark the thread as solved, since this helps other users with similar problems!

Fathi · Nov 14, 2018

How can I mark this thread as solved, please?

Stoiko Ivanov · Nov 14, 2018

At the top of the thread, next to the subject, there should be a "Thread Tools" menu: "Thread Tools" -> "Edit thread" -> set the prefix to "Solved".

Fathi · Nov 14, 2018

I only have "Thread Tools" -> "Edit Title" and no other option.

tom · Nov 14, 2018

Fathi said:
I only have "Thread Tools" -> "Edit Title" and no other option.

You need to edit the first post.

Fathi · Nov 16, 2018

I edited the first post, but "Thread Tools" doesn't appear in the overlay window, so I prefixed the title with [SOLVED].
