Proxmox crash / network disconnect on single node in cluster.

sktrz

New Member
Mar 25, 2023
I am not entirely sure whether the machine crashes/locks up or whether eth0 simply stops talking to my router. The machine POSTs and boots normally and shows up in the Proxmox web UI as a properly running node. I start a few VMs on it, all using the same vGPU profile, and run a game inside all of them; everything runs normally at first. After roughly 6-7 hours of the VMs and games running, the machine drops off my LAN completely, almost as if it had been powered off. It has not, though: the HDMI output still shows a signal and image, and if I unplug the Ethernet cable from the back of the NIC, I see the port drop on my router's LED panel.

I installed kdump to try to see what is causing the lockup, but after running the machine through another test with it installed, kdump produced no dump log.
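One way to tell a full lockup apart from a network-only dropout is to have the host write a heartbeat to local disk. The sketch below is only an illustration: the log path is hypothetical, and the interface name `eth0` is taken from the description above (on many Proxmox installs the physical NIC has a different name, so check `ip link`). It reads the kernel's link state from `/sys/class/net/<iface>/operstate` once a minute and appends it to a file.

```python
#!/usr/bin/env python3
"""Heartbeat logger: appends a timestamp and NIC link state to a local file."""
import os
import time
from datetime import datetime
from pathlib import Path

IFACE = "eth0"  # assumption: name used in the post; adjust to match `ip link`
LOG_PATH = Path("/var/log/heartbeat.log")  # hypothetical location
INTERVAL_S = 60  # seconds between heartbeats


def read_link_state(iface: str, sysfs_root: Path = Path("/sys/class/net")) -> str:
    """Return the kernel operstate for iface (e.g. 'up', 'down') or 'missing'."""
    try:
        return (sysfs_root / iface / "operstate").read_text().strip()
    except OSError:
        return "missing"


def heartbeat_line(iface: str, state: str, now: datetime) -> str:
    """Format one log line for the heartbeat file."""
    return f"{now.isoformat(timespec='seconds')} iface={iface} operstate={state}\n"


def main() -> None:
    while True:
        line = heartbeat_line(IFACE, read_link_state(IFACE), datetime.now())
        with LOG_PATH.open("a") as fh:
            fh.write(line)
            fh.flush()
            os.fsync(fh.fileno())  # push the line to disk before a possible hang
        time.sleep(INTERVAL_S)


if __name__ == "__main__":
    main()
```

After the next incident, compare the last heartbeat with the time the node vanished from the LAN. If heartbeats continue past that point, the host kept running and only networking failed; if they stop, the whole host likely hung.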

I am still quite new to the Linux/Proxmox environment, so it could be something simple I've managed to skip over. I would appreciate any help solving this. Thanks.

Server specs - Aurora r12
cpu - i7-11700F
nic - Killer E3100 Ethernet controller
gpu - gtx1050

Edit - I do not think it's an overheating issue; I was monitoring the temps the last time it crashed.
 
I believe I've solved the issue; it seems the XMP memory profile was causing the system to become unstable.
 
It seems I was wrong: the machine now runs for 12-16 hours before it crashes. I'm really confused as to why this is still happening...
 
Code:
Mar 25 21:57:44 i7-2 kernel: perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Mar 25 21:58:22 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:58:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 21:58:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 21:58:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 21:58:33 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:14 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:16 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:37 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:37 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:00:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:08 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:08 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:09 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:09 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:11 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:11 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:18 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:18 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:03:19 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:19 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:20 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:20 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:29 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:04:11 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:04:20 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:04:30 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:02 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:06 i7-2 pmxcfs[1395]: [status] notice: received log
Mar 25 22:05:08 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:09 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:10 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:11 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:12 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:34 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:06:50 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:06:58 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:49 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:49 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:50 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:07:50 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:08:01 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:08:33 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:08:42 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:09:49 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:09:50 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:09:56 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:09:58 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:52 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:52 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 nvidia-vgpu-mgr[1689]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:22 i7-2 pmxcfs[1395]: [status] notice: received log
Mar 25 22:11:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 nvidia-vgpu-mgr[2142]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000

The last entries reported in syslog show the errors above.
Edit: I did a bit of searching on the errors listed above, and it seems this could be a BIOS issue. I've updated the BIOS on the machine and am running another test to see if it crashes again. I do have another i7 machine, with a different-series CPU and GPU, that produces these same errors in syslog, yet it runs fine and I haven't had any issues with it.
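Since the same messages also appear on a machine that runs fine, it may help to compare how often they occur on each host. Here is a rough sketch that counts the two message types from the excerpt above. The regular expressions are written against those exact lines, and the default path `/var/log/syslog` is an assumption. The script groups `ERESTARTSYS` lines by vGPU UUID and "ref count tracking error" lines by `nvidia-vgpu-mgr` PID.

```python
import re
import sys
from collections import Counter

# Patterns written against the syslog lines quoted in this post.
ERESTARTSYS_RE = re.compile(
    r"\[nvidia-vgpu-vfio\] (?P<uuid>[0-9a-f-]{36}): ERESTARTSYS received"
)
REFCOUNT_RE = re.compile(
    r"nvidia-vgpu-mgr\[(?P<pid>\d+)\]: notice: vmiop_log: ref count tracking error"
)


def summarize(lines):
    """Return (ERESTARTSYS count per vGPU UUID, ref-count errors per mgr PID)."""
    restarts = Counter()
    refcount = Counter()
    for line in lines:
        m = ERESTARTSYS_RE.search(line)
        if m:
            restarts[m.group("uuid")] += 1
            continue
        m = REFCOUNT_RE.search(line)
        if m:
            refcount[m.group("pid")] += 1
    return restarts, refcount


if __name__ == "__main__":
    # Assumption: the log lives at /var/log/syslog unless a path is given.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    with open(path, errors="replace") as fh:
        restarts, refcount = summarize(fh)
    for uuid, n in restarts.most_common():
        print(f"ERESTARTSYS  {uuid}  {n}")
    for pid, n in refcount.most_common():
        print(f"ref count error  nvidia-vgpu-mgr[{pid}]  {n}")
```

Running it against the logs of both machines gives a side-by-side view of whether the crashing node emits these messages at a noticeably higher rate. On its own, that would not prove they cause the lockup.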