I lose the network connection randomly

pga57

Jun 29, 2023
Hello,
I'm moving my conversation to this thread hoping to find a solution.
To summarize: I lose the network connection randomly (after anywhere from 1 hour to 4 days of operation).
I don't know what the cause is.
I had this problem with Proxmox VE 7.4 and, unfortunately, also with 8.0.3 (fresh install with all updates).
How do I find what is wrong and fix it?
The syslog is not helping me at the moment.

At Spirit's request, I am posting the outputs he asked for.
 

Attachments

  • pvecm status.txt
  • systemctl status pvestatd.txt
  • systemctl status pveproxy.txt
  • systemctl status pvedaemon.txt
  • systemctl status pve-cluster.txt
  • systemctl status corosync.txt
I don't have a Realtek card but an Intel X540 10GbE. I had the same problem with the onboard Intel I211-AT network card.
So I think there is another cause... which I can't find :-(

Thanks for your suggestion.
 
Hi,
there's nothing special in the logs (assuming it's a standalone node ;)) except for
Code:
Jun 29 11:28:31 pve pvestatd[1201]: storage 'Diskstation' is not online
Was the connection lost at that point?

I don't have a Realtek card but an Intel X540 10GbE. I had the same problem with the onboard Intel I211-AT network card.
So I think there is another cause... which I can't find :-(
So the same issue occurred with two completely different network cards?

Please share the output of pveversion -v and, after the issue occurs, the file created by journalctl -b0 > /tmp/journal.log. Is the status of the interface DOWN in the output of ip a after you lose connection?

Pinging @spirit because he had already answered you in the other thread.
 
there's nothing special in the logs (assuming it's a standalone node ;)) except for
Code:
Jun 29 11:28:31 pve pvestatd[1201]: storage 'Diskstation' is not online
Was the connection lost at that point?
Diskstation (my Synology NAS) was online, but the PVE network was down.
So the same issue occurred with two completely different network cards?
Yes, exactly.
Please share the output of pveversion -v and, after the issue occurs, the file created by journalctl -b0 > /tmp/journal.log.
The PVE network has been OK for 1 day now. I'm waiting and will post the results when it happens.
Is the status of the interface DOWN in the output of ip a after you lose connection?
I cannot access PVE when it happens, and I have no monitor, but I will check next time.
Pinging @spirit because he had already answered you in the other thread.
OK.
Thanks,
Philippe
 

Attachments

  • pveversion.txt
The PVE network has been OK for 1 day now. I'm waiting and will post the results when it happens.
I cannot access PVE when it happens, and I have no monitor, but I will check next time.
If you need to reboot to access the node again, you need to use the command journalctl -b-1 > /tmp/journal.log instead to get the log from the previous boot (where the issue occurred).
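Once the journal of the affected boot has been saved to /tmp/journal.log, scanning it for link and storage messages can show what failed first. A minimal Python sketch follows; the pattern list is an assumption: "NIC Link is Down/Up" is the wording Intel drivers such as igb and ixgbe typically log, the NFS patterns follow the kernel's "nfs: server ... not responding" and "nfs: server ... OK" messages, and the storage pattern matches the pvestatd line quoted in this thread. Adjust the patterns to what the log actually contains.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumed message patterns; extend or edit them to match your own journal.
PATTERNS = {
    "link_down": re.compile(r"NIC Link is Down"),
    "link_up": re.compile(r"NIC Link is Up"),
    "nfs_not_responding": re.compile(r"nfs: server \S+ not responding"),
    "nfs_ok": re.compile(r"nfs: server \S+ OK"),
    "storage_offline": re.compile(r"storage '[^']+' is not online"),
}


def scan_journal(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (category, line) for each line matching one of PATTERNS (first match wins)."""
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                yield category, line.rstrip("\n")
                break


def first_hits(path: str) -> Dict[str, str]:
    """Return the earliest matching line per category from a journal dump file."""
    first: Dict[str, str] = {}
    with open(path, encoding="utf-8", errors="replace") as fh:
        for category, line in scan_journal(fh):
            first.setdefault(category, line)
    return first
```

Because a journalctl dump is written in chronological order, first_hits("/tmp/journal.log") returns the earliest occurrence of each kind of message, and comparing their timestamps shows whether the link dropped before or after the NAS storage went offline.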
 
Hi,
The network connection was lost last night ;-(
Here is the journal.log from the previous boot.
 

Attachments

  • journal.log
Lost connection again this morning.
Could this incident be caused by PVE not being able to connect to my NAS via NFS within a given time? (The connection sometimes takes several tens of seconds.)
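One way to test the NFS hypothesis is to log how long a plain TCP connection to the NAS takes, independently of the mount itself. This is a minimal Python sketch, assuming NFS is served on the standard TCP port 2049; it measures only name resolution plus the TCP handshake, not the NFS mount or RPC layer, and the function name is hypothetical.

```python
from __future__ import annotations

import socket
import time

NFS_PORT = 2049  # standard NFS port; NFSv3 also relies on rpcbind/mountd, which are not probed here


def tcp_connect_time(host: str, port: int = NFS_PORT, timeout: float = 5.0) -> float | None:
    """Return the seconds needed to open a TCP connection to host:port.

    Returns None if the connection is refused, the host is unreachable,
    or no connection was established within `timeout` seconds.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable and timeout errors are all OSError subclasses
        return None
```

Calling it every minute with the Synology's address and writing the result to a local file would show whether connection times of several tens of seconds (or None results) line up with the outages.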
 

Attachments

  • journal 20230702.log
Do you also lose the network with a simple ping?

Or is it only with some specific protocols (NFS over TCP or UDP, corosync, ...)?

I ask because I had a similar problem with kernel 6.2 and corosync in UDP mode. I never found the root cause, but too many fragmented UDP packets were breaking my kernel network stack, for UDP only; other protocols (ICMP, TCP) kept working fine.
This happened after 1 to 4 days, but I was able to reproduce it within a few minutes by flooding the node with bad UDP packets.
This was with a Mellanox NIC, but I don't think it's related to the NIC model.
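Spirit's question (does ping still work while another protocol fails?) can be answered by running a few checks side by side and logging one timestamped line per round. A minimal Python sketch, assuming the Linux iputils ping is installed (-c for the count, -W for the timeout in seconds); the helper names are hypothetical. UDP is left out because it has no handshake: a generic UDP reachability check would need a service on the other side that replies.

```python
from __future__ import annotations

import socket
import subprocess
from datetime import datetime
from typing import Callable, Dict


def icmp_ok(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request with the system ping (iputils: -c count, -W timeout in s)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tcp_ok(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """True if a TCP handshake with host:port succeeds within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def run_checks(checks: Dict[str, Callable[[], bool]]) -> str:
    """Run each named check once and return a single timestamped log line."""
    parts = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:  # e.g. ping binary missing: count it as a failure and keep going
            ok = False
        parts.append(f"{name}={'ok' if ok else 'FAIL'}")
    return datetime.now().isoformat(timespec="seconds") + " " + " ".join(parts)
```

For example, calling run_checks with the entries "icmp" (icmp_ok on the NAS address) and "tcp/2049" (tcp_ok on the NAS address and port 2049) every 30 seconds and appending the lines to a file on local disk keeps a record through the outage. A round with icmp=ok but tcp/2049=FAIL would point to a protocol-specific problem rather than a dead link.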
 
Does the issue happen around the time you take backups?
It seems so.
You could try setting a bandwidth limit there and reducing the number of workers: https://forum.proxmox.com/threads/t...ad-behavior-during-backup.118430/#post-513106

I have no idea how much to limit the bandwidth.
I didn't find where to configure the number of workers, because I don't have the cluster directory mentioned in the thread on my PVE.

Yesterday, however, I created a new NFS share folder for the backups, and there has been no network loss... for the moment ;-)
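On the open question of how much to limit the bandwidth: vzdump's bwlimit option is expressed in KiB/s, so a share of the link speed has to be converted. A minimal Python sketch of that arithmetic; which link to base it on (assumed here to be the NAS's link, as it is often the slower one) and the share to allow are judgment calls, not recommendations, and the function name is hypothetical.

```python
def bwlimit_kib_per_s(link_mbit: float, fraction: float) -> int:
    """Convert a share of a link's nominal speed into a bwlimit value in KiB/s.

    link_mbit: nominal link speed in Mbit/s (1000 for 1 GbE, 10000 for 10 GbE).
    fraction:  share of that speed the backup may use, 0 < fraction <= 1.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    bytes_per_s = link_mbit * 1_000_000 / 8 * fraction  # Mbit/s -> bytes/s
    return int(bytes_per_s // 1024)                      # bytes/s -> KiB/s, rounded down
```

For example, bwlimit_kib_per_s(1000, 0.5) gives 61035, roughly half of a 1 GbE link. The result could then go into a line such as "bwlimit: 61035" in /etc/vzdump.conf, which is a node-local file rather than part of /etc/pve; recent PVE versions also document a "performance: max-workers=N" setting in the same file for the number of workers.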
 
