Proxmox crash / network disconnect on single node in cluster.

sktrz

New Member
Mar 25, 2023
I am not sure whether the machine crashes/locks up or whether eth0 simply stops talking to my router. The machine POSTs and boots normally and comes up in the Proxmox web UI as a properly running node. I start a few VMs on it, all using the same vGPU profile, and run a game inside each of them; everything runs normally at first. After roughly 6 to 7 hours of the VMs and games running, the machine drops off my LAN completely, almost as if it had been powered off. It has not: the HDMI output still shows a signal and an image, and when I unplug the Ethernet cable from the back of the NIC, I see the link drop on my router's LED panel.

I installed kdump to try to see what is causing the lockup, but it produced no dump after I ran the machine through another test with it installed.
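Since kdump captured nothing, one way to tell a full host hang apart from a link drop is to log a heartbeat to local disk together with the NIC state. Below is a minimal sketch, not something from my actual setup: the function names, the log path, and the 10-second interval are made up for illustration, and the interface name `eth0` is an assumption (on Proxmox the physical NIC often has a different name and sits under a bridge such as vmbr0). It reads the standard `operstate` and `carrier` files under /sys/class/net.

```python
import os
import time
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier) for iface; 'unknown' where a file is unreadable."""
    base = Path(sysfs_root) / iface
    values = []
    for name in ("operstate", "carrier"):
        try:
            values.append((base / name).read_text().strip())
        except OSError:
            # 'carrier' cannot be read while the interface is administratively down,
            # and both files are missing if the interface does not exist.
            values.append("unknown")
    return tuple(values)


def heartbeat_line(timestamp, iface, state):
    """Format one log line from a timestamp string and an (operstate, carrier) pair."""
    operstate, carrier = state
    return f"{timestamp} iface={iface} operstate={operstate} carrier={carrier}"


def run(iface="eth0", log_path="/var/log/link-heartbeat.log", interval=10):
    """Append a heartbeat line every `interval` seconds until interrupted.

    The path and interval are hypothetical defaults; adjust them to your system.
    """
    with open(log_path, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(heartbeat_line(stamp, iface, read_link_state(iface)) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk so it survives a hard hang
            time.sleep(interval)
```

Calling `run()` as root and leaving it running through a test should show, after the next incident, whether the heartbeat stopped at the time of the drop (the host itself hung) or kept going with `operstate=down` or `carrier=0` (only the link went away).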

I am still quite new to Linux and Proxmox, so it could be something simple that I've overlooked. I would appreciate any help solving this. Thanks.

Server specs - Aurora R12
CPU - i7-11700F
NIC - Killer E3100 Ethernet controller
GPU - GTX 1050

Edit - I do not think it is an overheating issue; I was monitoring the temperatures the last time it crashed.
 
I believe I've solved the issue. It seems the XMP memory profile was making the system unstable.
 
It seems I was wrong. The machine now runs for 12 to 16 hours before it crashes. I'm really confused as to why this is still happening...
 
Code:
Mar 25 21:57:44 i7-2 kernel: perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Mar 25 21:58:22 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:58:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 21:58:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 21:58:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 21:58:33 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:14 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:16 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:37 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 21:59:37 i7-2 nvidia-vgpu-mgr[1804]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:00:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:08 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:08 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:09 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:09 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:11 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:00:11 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:13 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:18 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:18 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:03:19 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:19 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:20 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:20 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:03:29 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:04:11 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:04:20 i7-2 nvidia-vgpu-mgr[1749]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:04:30 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000133: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:02 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:06 i7-2 pmxcfs[1395]: [status] notice: received log
Mar 25 22:05:08 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:09 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:10 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:11 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:12 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:05:34 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:06:50 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:06:58 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:49 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000132: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:49 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:50 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:07:50 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:07:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:08:01 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:08:33 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:08:42 i7-2 nvidia-vgpu-mgr[1892]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:09:49 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:09:50 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:09:56 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:09:58 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:00 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:01 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000134: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:51 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:52 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:52 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:10:53 i7-2 nvidia-vgpu-mgr[1689]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000
Mar 25 22:10:53 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:22 i7-2 pmxcfs[1395]: [status] notice: received log
Mar 25 22:11:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:23 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 kernel: [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000136: ERESTARTSYS received during reg access, waiting for 25000 milliseconds for operation to complete
Mar 25 22:11:24 i7-2 nvidia-vgpu-mgr[2142]: notice: vmiop_log: ref count tracking error for index 0x0 of page size 0x1000

The last entries logged in syslog show the errors above.
Edit: I searched for the errors listed above and it seems they could point to a BIOS issue. I've updated the BIOS on the machine and am running another test to see whether it crashes again. I also have another i7 machine with a different CPU series and GPU that logs the same errors in syslog, yet it runs fine and I haven't had any issues with it.
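Because the excerpt is long and repetitive, a small script can condense it into counts per vGPU device and per nvidia-vgpu-mgr process. This is only a sketch: the regular expressions are written against the exact message text shown above, and the function name `summarize` is made up for illustration. It says nothing about what causes the messages; it just makes it easier to compare one run, or one machine, against another.

```python
import re
from collections import Counter

# Patterns match the two message types seen in the syslog excerpt above.
VFIO_RE = re.compile(r"\[nvidia-vgpu-vfio\] ([0-9a-f-]{36}): ERESTARTSYS")
MGR_RE = re.compile(
    r"nvidia-vgpu-mgr\[(\d+)\]: notice: vmiop_log: ref count tracking error"
)


def summarize(lines):
    """Count ERESTARTSYS messages per vGPU UUID and ref-count errors per manager PID."""
    erestartsys = Counter()
    refcount = Counter()
    for line in lines:
        match = VFIO_RE.search(line)
        if match:
            erestartsys[match.group(1)] += 1
            continue
        match = MGR_RE.search(line)
        if match:
            refcount[match.group(1)] += 1
    return erestartsys, refcount
```

For example, `summarize(open("/var/log/syslog", errors="replace"))` returns the two counters for the whole file (the path is the usual Debian location and is an assumption here); calling `.most_common()` on either one lists the noisiest devices or processes first.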
 