I am about a week deep into one of the most unpleasant technical problems I've encountered in a long time, and that's why I'm in this thread.
I am definitely suffering this issue on 2 different HP EliteDesk 800 Mini G9 systems, with the following network card:
Ethernet controller: Intel Corporation Ethernet Connection (17) I219-LM (rev 11)
e1000e: Intel(R) PRO/1000 Network Driver
e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
Proxmox ver:
Linux 6.8.8-3-pve (2024-07-16T16:16Z)
I can confirm that I have the following error in my dmesg history.
vmbr0: port 1(eno1) entered blocking state
I have the following error directly on the console of the box, visible via AMT:
e1000e 0000:00:1f.6 eno1: NETDEV WATCHDOG: CPU: 6: transmit queue 0 timed out xxxxms
My symptom for this problem was that files copied by my Docker container on my dockervm (via SMB from my NAS, back to my NAS) were suffering quite infrequent file corruption.
I even have an Ubuntu 22.04 VM and a DietPi VM which will both exhibit the fault very regularly (not always).
However, an Ubuntu 24.04 VM on the same host will barely exhibit the fault at all. (Yes, I know, this makes little to no sense.)
Even when I SSH into the Proxmox host itself, manually mount SMB and copy files, it also flakes out, though far less frequently than the Ubuntu VM.
To be certain beyond question, I have copied possibly terabytes upon terabytes of files using a variety of methods (FTP, SMB, etc.) and then run a binary file compare against them. At one stage I thought I'd stumbled upon an SMB client bug in Ubuntu.
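For anyone wanting to repeat that kind of check, here is a minimal sketch of a binary compare between a source tree and its copy. The function name compare_trees and the example paths are my own placeholders, not something from the setup above; it only assumes standard find and cmp.

```shell
#!/bin/sh
# Sketch only: compare every regular file under a source tree with the file
# of the same relative path under a destination tree, byte for byte.
# compare_trees is a hypothetical helper name.
compare_trees() {
    src=$1
    dst=$2
    # List files relative to the source directory, one per line.
    (cd "$src" && find . -type f) | while IFS= read -r rel; do
        # cmp -s prints nothing and returns non-zero if the files differ
        # or if the destination file is missing.
        if ! cmp -s "$src/$rel" "$dst/$rel"; then
            echo "MISMATCH: $rel"
        fi
    done
}

# Example call (hypothetical paths):
#   compare_trees /mnt/nas/original /mnt/nas/copied
```

No output means every file matched; each corrupted or missing copy gets one MISMATCH line. File names containing newlines are not handled by this sketch.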
I enjoy a good technical one to figure out, but golly this has been painful!
Let me just state one interesting thing for my situation:
My machine has been _rock_solid_: it has never crashed and has performed wonderfully with the Intel 12500T on this HP mini system. I have not had dropouts or ping problems or any weird behaviour. I've ONLY seen corrupt files, I guess because the issue flares up so sporadically and for such a short duration?
Nonetheless, the command
ethtool -K eno1 tso off gso off
totally fixes my issues.
Thank you to all and everyone who has contributed to threads on this offering the solution and to the developers making this software, thank you.
So now that's out of the way, on to my questions.
1, Please, does anyone know the ethtool command to read my current flags for tso and gso? I would like to know that any changes I make to /etc/network/interfaces are sticking on reboot.
2, Can I get some opinions / feedback from the community here on a long-term solution to this problem? This one really got me nasty, and based on what I'm reading here, it feels to me like this flag should be hard set by the Proxmox team to simply disable this functionality and save people a whole heap of hassle. Any thoughts?
(NOTE: due to the recent work with Broadcom and VMware, I believe Proxmox is gaining more serious traction, and I feel a solution for a serious problem like this should probably be seriously considered.)
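On question 1, as far as I can tell the lowercase form, ethtool -k, prints the current offload settings, while the uppercase -K used above changes them. A small sketch for pulling out just the two relevant lines follows. The helper name show_tso_gso is made up for illustration, and it assumes the feature names appear as tcp-segmentation-offload and generic-segmentation-offload in the output.

```shell
#!/bin/sh
# Sketch only: filter the output of "ethtool -k <iface>" (read on stdin)
# down to the top-level TSO and GSO lines.
# show_tso_gso is a hypothetical helper name.
show_tso_gso() {
    grep -E '^[[:space:]]*(tcp-segmentation-offload|generic-segmentation-offload):'
}

# Usage on the host, with eno1 being the interface name from this post:
#   ethtool -k eno1 | show_tso_gso
# After the workaround has been applied, both lines should report "off".
```

To make the setting survive a reboot, one commonly suggested approach (an assumption on my part, not something I have confirmed on these boxes) is a line such as "post-up ethtool -K eno1 tso off gso off" in the eno1 stanza of /etc/network/interfaces. Running the check above after a reboot would then show whether it stuck.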
Thanks again. Thank goodness these were 'disposable' files I've been working with.