Network card stops randomly PVE 8.0.3

en4ble

Hello,

I have a new system on 8.0.3 that is having issues with the NIC (or something else). Network activity stops: the system appears to be running, but network connectivity is down. I tried connecting a monitor, but the screen is blank/black, so I can't see anything.

Rebooting the entire server fixes it temporarily, but the system can't seem to go even 24 hours without an outage.

System Info:
CPU(s) 16 x AMD Ryzen 7 5700G with Radeon Graphics (1 Socket)

Kernel Version
Linux 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z)
PVE Manager Version
pve-manager/8.0.3/bbf3993334bfa916

Motherboard (Realtek NIC)
B550M DS3H AC

The most recent outage happened around 2 AM, but I can't really see anything in the logs around that time. A journal snippet is below, and the entire log from that timeframe is on pastebin:

https://pastebin.com/NXnvmDix

Around 02:18:00 is when the physical reboot happens, but there are no messages right before that. The last pveproxy messages are at 01:45:58, followed only by an auth entry at 01:58:06.

Mar 09 01:45:58 dub-pve02 pveproxy[1912]: starting 1 worker(s)
Mar 09 01:45:58 dub-pve02 pveproxy[1912]: worker 297445 started
Mar 09 01:58:06 dub-pve02 pvedaemon[1905]: <root@pam> successful auth for user 'root@pam'
-- Boot 39fda05cc3fc4d81af41dfb35d2baf2f --
Mar 09 02:18:27 dub-pve02 kernel: Linux version 6.2.16-3-pve (tom@sbuild) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for >
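
For anyone trying to narrow down the gap before a hang like this, one rough way is to pull only the last few entries ahead of each boot marker out of the journal text. The sketch below assumes the journal prints markers in the `-- Boot <id> --` form shown above; the function name `last_before_boot` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the last N journal lines that precede each "-- Boot <id> --"
# marker, followed by the marker itself. Reads journalctl text on stdin.
# "last_before_boot" is a hypothetical helper name, not a systemd tool.
last_before_boot() {
    n=${1:-3}
    awk -v n="$n" '
        /^-- Boot [0-9a-f]+ --$/ {
            # Work out where the last n buffered lines start.
            start = count - n + 1
            if (start < 1) start = 1
            for (i = start; i <= count; i++) print buf[i]
            print            # the boot marker line
            count = 0        # start buffering the next boot
            next
        }
        { buf[++count] = $0 }
    '
}
```

It could be used as `journalctl --since "Mar 09 01:00:10" | last_before_boot 5`. If the last entries before the marker are only routine pveproxy/pvedaemon messages, as here, the system most likely froze or lost power without getting a chance to log anything.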

Code:
root@dub-pve02:~# journalctl --since "Mar 09 01:00:10"
Mar 09 01:07:54 dub-pve02 pveproxy[249870]: worker exit
Mar 09 01:07:54 dub-pve02 pveproxy[1912]: worker 249870 finished
Mar 09 01:07:54 dub-pve02 pveproxy[1912]: starting 1 worker(s)
Mar 09 01:07:54 dub-pve02 pveproxy[1912]: worker 279957 started
Mar 09 01:10:06 dub-pve02 pvedaemon[1905]: <root@pam> successful auth for user 'root@pam'
Mar 09 01:17:01 dub-pve02 CRON[284071]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 09 01:17:01 dub-pve02 CRON[284072]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Mar 09 01:17:01 dub-pve02 CRON[284071]: pam_unix(cron:session): session closed for user root
Mar 09 01:20:22 dub-pve02 pveproxy[268991]: worker exit
Mar 09 01:20:22 dub-pve02 pveproxy[1912]: worker 268991 finished
Mar 09 01:20:22 dub-pve02 pveproxy[1912]: starting 1 worker(s)
Mar 09 01:20:22 dub-pve02 pveproxy[1912]: worker 285489 started
Mar 09 01:21:21 dub-pve02 kernel: loop0: detected capacity change from 0 to 20971520
Mar 09 01:21:21 dub-pve02 kernel: EXT4-fs (loop0): mounted filesystem 9e3d8d69-2cc0-45f0-8b96-b05569fe3ab0 with ordered data mode. >
Mar 09 01:21:21 dub-pve02 kernel: EXT4-fs (loop0): unmounting filesystem 9e3d8d69-2cc0-45f0-8b96-b05569fe3ab0.
Mar 09 01:26:06 dub-pve02 pvedaemon[1905]: <root@pam> successful auth for user 'root@pam'
Mar 09 01:34:43 dub-pve02 pveproxy[262657]: worker exit
Mar 09 01:34:43 dub-pve02 pveproxy[1912]: worker 262657 finished
Mar 09 01:34:43 dub-pve02 pveproxy[1912]: starting 1 worker(s)
Mar 09 01:34:43 dub-pve02 pveproxy[1912]: worker 292300 started
Mar 09 01:42:06 dub-pve02 pvedaemon[1906]: <root@pam> successful auth for user 'root@pam'
Mar 09 01:45:58 dub-pve02 pveproxy[279957]: worker exit
Mar 09 01:45:58 dub-pve02 pveproxy[1912]: worker 279957 finished
Mar 09 01:45:58 dub-pve02 pveproxy[1912]: starting 1 worker(s)
Mar 09 01:45:58 dub-pve02 pveproxy[1912]: worker 297445 started
Mar 09 01:58:06 dub-pve02 pvedaemon[1905]: <root@pam> successful auth for user 'root@pam'
-- Boot 39fda05cc3fc4d81af41dfb35d2baf2f --
Mar 09 02:18:27 dub-pve02 kernel: Linux version 6.2.16-3-pve (tom@sbuild) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for >
Mar 09 02:18:27 dub-pve02 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-3-pve root=/dev/mapper/pve-root ro quiet max_loop=2>
Mar 09 02:18:27 dub-pve02 kernel: KERNEL supported cpus:
Mar 09 02:18:27 dub-pve02 kernel:   Intel GenuineIntel
Mar 09 02:18:27 dub-pve02 kernel:   AMD AuthenticAMD
Mar 09 02:18:27 dub-pve02 kernel:   Hygon HygonGenuine
Mar 09 02:18:27 dub-pve02 kernel:   Centaur CentaurHauls
Mar 09 02:18:27 dub-pve02 kernel:   zhaoxin   Shanghai 
Mar 09 02:18:27 dub-pve02 kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Mar 09 02:18:27 dub-pve02 kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Mar 09 02:18:27 dub-pve02 kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Mar 09 02:18:27 dub-pve02 kernel: x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
Mar 09 02:18:27 dub-pve02 kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Mar 09 02:18:27 dub-pve02 kernel: x86/fpu: xstate_offset[9]:  832, xstate_sizes[9]:    8
Mar 09 02:18:27 dub-pve02 kernel: x86/fpu: Enabled xstate features 0x207, context size is 840 bytes, using 'compacted' format.
Mar 09 02:18:27 dub-pve02 kernel: signal: max sigframe size: 3376
Mar 09 02:18:27 dub-pve02 kernel: BIOS-provided physical RAM map:


Any advice on what to look at? I have another "identical" system that has been stable for 9+ days already. I was wondering whether going to 8.1 could help, perhaps with the NIC drivers?
 
8.0 is missing security patches, so yes, it is always a good idea to patch to the latest version.
And yes, PVE 8.0 has problems with Realtek NICs like the RTL8111/8411/8168. But your display should still output something if it were just a network problem.
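
To see which driver a given NIC is actually bound to, one option is to follow the driver symlink in sysfs. This is only a sketch: the function name `nic_driver` and the interface name `enp4s0` are placeholders, and the second argument exists only so the function can be pointed at a different sysfs root. On these Realtek chips the result would usually be the in-kernel `r8169` driver, or `r8168` if the vendor module has been installed.

```shell
#!/bin/sh
# Sketch: print the kernel driver bound to a network interface by resolving
# /sys/class/net/<iface>/device/driver. "nic_driver" is a hypothetical helper.
nic_driver() {
    iface=$1
    sysfs_root=${2:-/sys/class/net}
    link="$sysfs_root/$iface/device/driver"
    if [ -e "$link" ]; then
        # The symlink target's last path component is the driver name.
        basename "$(readlink -f "$link")"
    else
        echo "no driver link for $iface" >&2
        return 1
    fi
}
```

Example: `nic_driver enp4s0` (replace `enp4s0` with the interface name shown by `ip link`).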
 
Thanks @Dunuin for the reply.

Do we know if the Realtek NIC issues were fixed in the 8.1 release?

I can't be 100% sure about the display since I had remote hands helping. I'm leaving the screen on from now on.
 
@spirit @Dunuin after the upgrade, I've been monitoring the screen for any anomalies. So far uptime has been 6 hours, but I received this message:
"Memory cgroup out of memory: Killed process ...." (screenshot attached). Just throwing this out there in case it is related in any way.

I'm not sure if this is a false positive or could be related to my issue.

The server has 128 GB of RAM installed and is running two LXC containers at 62 GB each, with 2 GB swap.

Currently RAM utilization on PVE is under 50% (45%). I have multiple systems with the same config and have never had issues or a system going to 100%.
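
A "Memory cgroup out of memory" message normally means a process was killed because its own cgroup (here presumably one of the containers) hit its memory limit, which can happen even when the host as a whole is well under 100%. One way to check which container is affected is to read the `oom_kill` counter from that container's `memory.events` file. The sketch below assumes cgroup v2; the path `/sys/fs/cgroup/lxc/<CTID>/memory.events` is an assumption about where PVE places container cgroups, and `oom_kill_count` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the oom_kill counter from a cgroup v2 memory.events file.
# "oom_kill_count" is a hypothetical helper; pass the file path explicitly.
oom_kill_count() {
    events_file=$1
    # memory.events holds "key value" pairs, one per line.
    awk '$1 == "oom_kill" { print $2; found = 1 }
         END { if (!found) exit 1 }' "$events_file"
}
```

Example (path is an assumption, `<CTID>` is the container ID): `oom_kill_count /sys/fs/cgroup/lxc/<CTID>/memory.events`. A counter that keeps increasing for one container points at that container's limit rather than at the NIC.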
 
Yes, it was fixed in kernel 6.5.
I still had problems with an r8168 refusing to work after updating from PVE 7.4 to 8.0 (which should have skipped the 6.2 kernel). At least here, updating 8.0 to the latest 8.1 fixed it.
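
To confirm after an upgrade that the host is really running a 6.5 or newer kernel, the running version can be compared against a minimum. This is a small sketch that relies on `sort -V` from GNU coreutils; `kernel_at_least` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if the given kernel release is at least the wanted version.
# Uses version sort (sort -V, GNU coreutils) to find the lower of the two.
kernel_at_least() {
    current=$1
    wanted=$2
    lowest=$(printf '%s\n%s\n' "$wanted" "$current" | sort -V | head -n 1)
    # If the wanted version sorts first (or is equal), current is new enough.
    [ "$lowest" = "$wanted" ]
}
```

Example: `kernel_at_least "$(uname -r)" 6.5 && echo "kernel is 6.5 or newer"`. With the `6.2.16-3-pve` release from the first post, the check fails.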

Do you know if similar behavior was seen, where the NIC just goes inactive/down?
Yes, that's the problem. The NIC won't work at all, or goes down after some time.
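
Since the network is unreachable when this happens, it can help to record the link state to local disk so there is something to look at after the reboot. A minimal sketch follows; it assumes the interface exposes `operstate` in sysfs, and the names `link_state`, `log_link_state`, `enp4s0`, and the log path are all placeholders. Note that a hung driver may still report `up`, so this only catches cases where the link visibly drops.

```shell
#!/bin/sh
# Sketch: read and log the operational state of a network interface.
# Function names are hypothetical; the optional second argument overrides
# the sysfs root.
link_state() {
    iface=$1
    sysfs_root=${2:-/sys/class/net}
    cat "$sysfs_root/$iface/operstate"
}

# Print one timestamped line: "<date> <time> <iface> <state>".
log_link_state() {
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$(link_state "$1" "$2")"
}
```

Example: `while sleep 60; do log_link_state enp4s0 >> /var/log/nic-state.log; done`, left running in a tmux/screen session or wrapped in a systemd unit.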
 