New PVE 4.0 Install on AMD Based System Continues to Self Power Off

calvinlevy

New Member
Jan 8, 2015
I have installed Proxmox VE 4.0 (version 4.0-48/0d8559d0) on two separate systems: one Intel based (i7-6700K), the other AMD based (FX-8370). Both have subscription keys applied, and both have been updated to version 4.0-57/cc7c2b53. The Intel system has run fine for weeks. The AMD system consistently powers itself off.

I suspect issues with the Realtek RTL8169 drivers for both the motherboard's embedded NICs and the PCI-installed NICs. Removing either produces the same results. It seems to be an issue with EHCI support. I have also suspected IOMMU v2. The system seems to stay powered up the longest when the USB controller has been completely disabled. I also see AMD-Vi IO_PAGE_FAULT errors in the dmesg output. I have sent an email to Realtek asking when their drivers will support Linux kernel 4.x. Happy to provide information as needed to assist with troubleshooting, such as sysctl and lsmod output.
 
If it kernel panicked, I would have thought it would just sit there dead until some human intervened. Actually powering off could only happen if initiated by some other trigger.

Have you checked the cpu temperature? I have an 8350 myself as a "server" (minecraft/vent/teamspeak type stuff) and you definitely do NOT want to use the stock cooler. When you get a few VMs up on it, it gets damn hot. I'm certain I could make mine thermal shutdown on the stock cooler. Even the new cooler struggles (~80 degrees).
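
For anyone wanting to check temperatures without extra tooling, here is a minimal sketch that reads the kernel's hwmon sysfs files directly. The function name `print_temps` and the 70-degree warning threshold are assumptions of this sketch, not values from this thread, and which sensor corresponds to the CPU depends on the board and driver.

```shell
#!/bin/sh
# Print every hwmon temperature under a sysfs root in whole degrees C.
# The kernel exposes readings in millidegrees via temp*_input files.
# Arguments (both hypothetical knobs for this sketch):
#   $1  sysfs root to scan   (default: /sys/class/hwmon)
#   $2  warning threshold C  (default: 70)
print_temps() {
    root="${1:-/sys/class/hwmon}"
    warn_at="${2:-70}"
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue              # glob matched nothing, or unreadable
        milli=$(cat "$f" 2>/dev/null) || continue
        case "$milli" in
            ''|*[!0-9-]*) continue ;;        # skip empty or non-numeric readings
        esac
        deg=$((milli / 1000))
        status="ok"
        if [ "$deg" -ge "$warn_at" ]; then
            status="HOT"
        fi
        echo "$f ${deg}C $status"
    done
}
```

Calling `print_temps` with no arguments scans the live system. Running it in a loop under load (for example with `watch`) should show whether readings climb toward a thermal shutdown before the box powers off.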
 
Interestingly enough, following the errors reported by dmesg led me down one path, but thanks to Stewge for the basic troubleshooting reminders: it appears I was overthinking the "immediate" problem. After installing lm-sensors, I noticed some weird behavior. Yet there are still errors present on the AMD system that are not present on the Intel system. Apologies, I had more details in this post but cannot include them until I earn greater posting rights.
 
I have a brand new system I built for homelab testing. Its specs: AMD FX-8320E, 32GB HyperX RAM, 2x 320GB HDs in RAID1 and 4x 2TB HDs in RAID10. During the install process I see dozens of AMD-Vi IO_PAGE_FAULT errors.
I had IOMMU disabled in the BIOS and my NIC wouldn't work. After enabling it the NIC started working, but that somehow disabled the USB 3.0 ports. I now can't reinstall Proxmox because of some weird issue with the USB CD-ROM, which oddly enough gave me no problem when I first installed Proxmox on that system. Very frustrating experience.

But like you, I want to get to the root of the AMD-Vi IO_PAGE_FAULT errors, so if there's anything you need from me just holler.

Code:
Dec 08 07:13:56 proxmox2 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x000f address=0x00000000be25d880 flags=0x0010]
Dec 08 07:13:56 proxmox2 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x000f address=0x00000000be25d880 flags=0x0010]
Dec 08 07:13:56 proxmox2 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x000f address=0x00000000be25d880 flags=0x0010]
Dec 08 07:13:56 proxmox2 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x000f address=0x00000000be25d880 flags=0x0010]

Code:
root@proxmox2:~# lspci | grep "02:00.0"
02:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)

So it's related to the USB 3.0 controller on the motherboard. Very peculiar.
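
To repeat that lookup across a whole log rather than by eye, the following is a rough sketch that tallies IO_PAGE_FAULT events per PCI address. The helper name `fault_devices` is made up for this example, and it assumes the log lines have the same `device=BB:DD.F` layout as the excerpt above.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<pci-address> <count>" for each
# device that appears in an AMD-Vi IO_PAGE_FAULT event.
# fault_devices is a hypothetical helper name for this sketch.
fault_devices() {
    grep 'IO_PAGE_FAULT' \
        | sed -n 's/.*device=\([0-9a-fA-F:.]*\).*/\1/p' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Typical use would be `dmesg | fault_devices`, then `lspci -s 02:00.0` (substituting whichever address comes out) to see what each faulting device is.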
 
I believe I may have narrowed down the issue. The Realtek drivers for kernel 4.x are not yet included in the Proxmox distribution, or at least I presume that to be the case. I requested a copy of the NIC driver from the manufacturer.

My request to Realtek:
Dear Realtek:
Will there be a driver update soon for the RTL8169 "LINUX driver for kernel 3.x and 2.6.x and 2.4.x" to support the latest Linux kernel version 4.x?


Realtek's response:
Dear Customer,

Thank you for your E-mail!
We enclose the driver .
Best regards,
Technical Support Dept.
Realtek Semiconductor Corp.

Either you would need to recompile the kernel to include the driver, or request its inclusion from the Proxmox staff. My vote is to request it from Proxmox, as it may benefit all PVE users and admins. I have the driver. I will email Proxmox and ask for its inclusion.
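
Before going either route, it may be worth confirming which driver, if any, the running kernel has actually bound to each NIC. This is a small sketch that parses `lspci -k` output; the helper name `nic_drivers` is hypothetical, and it assumes the usual layout where each device header is followed by indented detail lines.

```shell
#!/bin/sh
# Read `lspci -k` output on stdin and print "<pci-address> <driver>" for each
# Ethernet controller, or "none" when no kernel driver is bound to it.
# nic_drivers is a hypothetical helper name for this sketch.
nic_drivers() {
    awk '
        function emit() {
            if (dev != "") print dev, (drv != "" ? drv : "none")
        }
        # Device header lines start at column 0 with a hex PCI address.
        /^[0-9a-fA-F]/ {
            emit(); dev = ""; drv = ""
            if ($0 ~ /Ethernet controller/) dev = $1
        }
        # Indented detail line naming the bound driver.
        /Kernel driver in use:/ { drv = $NF }
        END { emit() }
    '
}
```

Run it as `lspci -k | nic_drivers`. If a NIC reports `none`, `modinfo r8169` is one way to see whether the running kernel ships an r8169 module at all.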

Note 1: I waited too long to post this update, so I've left out some pertinent details and have to review my notes. But my research seems to point to a subsystem that controls both USB and NIC drivers. I'll add that info later.

Note 2: I was not able to upload the driver to this forum. Driver file name: r8169-6.021.00.tar.bz2
 