NIC loses connectivity after backup job

Jun 28, 2019
Hello,

I have this NIC in my Proxmox box and a FreeNAS box. The issue I'm having is on the Proxmox box with that NIC. Daily, at 3 AM, it loses connectivity, which breaks the iSCSI connection to the FreeNAS box. I've tried disabling ASPM and switching PCIe slots, but the loss of connectivity still occurs. I also have a backup job running at 3 AM, and I recently verified that it correlates with the NIC's loss of connectivity: the NIC does not lose connectivity if I disable the backup job. The puzzling thing is that the backup job does not touch the NIC at all, as it backs up to a local logical volume on the Proxmox box. Attached are the network setup (the interface that loses connectivity is highlighted) and the portion of the syslog from when the backup job kicks off and the NIC loses connectivity (don't mind the EXT4-fs errors, I'm aware).

I think the smoking gun is the backup job; however, I am at a loss as to how to mitigate it. Any suggestions are appreciated!

Thanks,
Craig
 

Attachments

  • SnipImage.JPG (54 KB)
  • SnipImage.JPG (218.1 KB)
OK, so what is the FreeNAS box being used for if there are no backups going to it?

I've seen network connections drop when there is too much disk I/O or CPU load.

Are you running spinning disks or SSDs locally?

Is the NAS box running spinning disks or SSDs?

How fast is the network connection to the NAS box: 1 Gb, 10 Gb, etc.?

Cheers
G
 
Because of the disconnects, I can't use the FreeNAS box for much right now. I would like to make it shared storage for VMs. Currently the VMs run locally on the Proxmox box. The Proxmox box is running SSDs for its root ZFS pool and spinning disks for the backup job's storage. The FreeNAS box is running spinning disks. The connection that uses the NICs is a dedicated 1 Gb link.

There is nothing using the disconnecting link during normal operation or during the backup job, just an iSCSI connection that throws the errors indicated in the logs. Maybe I should also add that the link doesn't come back until I reboot the Proxmox box.
 
I still have a feeling the drop is happening due to high load during the backup cycle.

Normally there is an iSCSI timeout, and if the initiator reaches it, the connection will drop and won't come back because of the built-up I/O requests, which is why a reboot is required to reconnect.
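To see which timeouts the initiator is currently using, something like the sketch below may help. It assumes the initiator is open-iscsi and that its settings live in an iscsid.conf-style file (commonly /etc/iscsi/iscsid.conf); the choice of keys to report is my own guess at the relevant ones, and the sample values are made up for illustration, not taken from your box.

```python
# Timeout settings from open-iscsi's iscsid.conf that seem relevant here.
# Which of these matter for this particular drop is an assumption.
TIMEOUT_KEYS = (
    "node.session.timeo.replacement_timeout",
    "node.conn[0].timeo.noop_out_interval",
    "node.conn[0].timeo.noop_out_timeout",
)


def parse_iscsi_timeouts(conf_text):
    """Return {key: seconds} for the timeout keys present in the config text."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in TIMEOUT_KEYS:
            try:
                found[key] = int(value.strip())
            except ValueError:
                pass  # ignore values that are not plain integers
    return found


# Illustrative input only; paste the contents of your own config file instead.
SAMPLE_CONF = """
# example fragment
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
"""

for key, seconds in parse_iscsi_timeouts(SAMPLE_CONF).items():
    print(f"{key} = {seconds}s")
```

If the values turn out to be short relative to how long the box stalls during the backup, that would at least be consistent with the timeout theory.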

Try this link and scroll down to "Debuging iSCSI issues":

https://www.thegeekdiary.com/how-to-troubleshoot-iscsi-issue-is-centos-rhel-67/

Set up what's in the guide, then run the backup to capture the network flow and what the initiator is doing.

Then paste the logs here.

The logs supplied are just the action logs of each step; they need to be more verbose to show what is happening with the initiator.

Every iSCSI problem we have had over the last 12 years has been an initiator timeout due to load. Even if the iSCSI connection isn't passing network traffic, the high load stalls processes and the initiator drops the connection.
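One way to check the load theory is to log the load average next to the link state while the backup runs. Below is a minimal sketch, assuming a Linux host where /proc/loadavg and /sys/class/net/<iface>/operstate are readable; the interface name you pass in is a placeholder for whichever interface drops, and the sampling interval and count are arbitrary.

```python
import time


def read_text(path):
    """Read a small text file such as a /proc or /sys entry."""
    with open(path) as fh:
        return fh.read()


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def link_state(iface, read=read_text):
    """Return the kernel's operstate string for the interface (e.g. 'up', 'down')."""
    return read(f"/sys/class/net/{iface}/operstate").strip()


def sample_once(iface, read=read_text):
    """Take one sample of the 1-minute load average and the link state."""
    load1, _, _ = parse_loadavg(read("/proc/loadavg"))
    return {
        "time": time.strftime("%H:%M:%S"),
        "load1": load1,
        "link": link_state(iface, read),
    }


def watch(iface, interval=5.0, count=720):
    """Print one sample every `interval` seconds, `count` times."""
    for _ in range(count):
        s = sample_once(iface)
        print(f"{s['time']} load1={s['load1']:.2f} link={s['link']}")
        time.sleep(interval)
```

Starting `watch("<your-interface>")` shortly before 3 AM and redirecting the output to a file would show whether the load climbs before the link state changes, or whether the link stays "up" while only the iSCSI session fails.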

Let me know how you go.

Cheers
G
 
