Strange issue with network interface

albgen

Member
Jan 9, 2021
Hi,
I'm running Proxmox VE 7.1-12 and today I faced a strange issue. I checked the journal (attached) but found no anomaly.
At 06:00 AM :), a customer called and told me that nothing was working. I started troubleshooting and found that the problem was the virtual router running on the VE. The VM was down and could not even be restarted. The task log was:
Code:
bridge 'vmbr3' does not exist
kvm: -netdev type=tap,id=net3,ifname=tap100i3,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on: network script /var/lib/qemu-server/pve-bridge failed with status 512
TASK ERROR: start failed: QEMU exited with code 1
So I tried to cat the network file:
Code:
cat /etc/network/interfaces
The result was empty. Nothing. Like an empty file.
I restored a backup of the network interfaces file, rebooted, and it started working.
What the heck happened? It's a mystery!
I should also mention that nobody has access to this Proxmox host except me, and I had not logged in via SSH or the GUI for a couple of days.

Any clue?
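
For anyone who wants to catch this earlier, here is a minimal sketch of a check that could run from cron. It only tests whether the file is non-empty and, if so, keeps a dated copy. The function name `check_interfaces` and the backup directory `/root/net-backups` are my own assumptions, not Proxmox tooling.

```shell
#!/usr/bin/env bash
# Sketch: warn when the network config is empty, otherwise keep a dated copy.
# check_interfaces and the default backup directory are hypothetical names.

check_interfaces() {
    local file="${1:-/etc/network/interfaces}"
    local backup_dir="${2:-/root/net-backups}"

    # [ -s ] is false for a missing or zero-length file
    if [ ! -s "$file" ]; then
        echo "WARNING: $file is missing or empty" >&2
        return 1
    fi

    # Keep one copy per day, preserving mode and timestamps
    mkdir -p "$backup_dir" || return 2
    cp -p "$file" "$backup_dir/interfaces.$(date +%F)"
}
```

Calling `check_interfaces` with no arguments checks the live file; a non-zero return code means the file was empty or the backup could not be written, so no good copy gets overwritten by an empty one.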
 

Attachments

Hi albgen, in your log file, ata1 to ata6 all reported "SATA link down (SStatus 0 SControl 330)", and md3 also reported "Mar 24 06:30:12 AX1011728574 kernel: EXT4-fs (md3): recovery complete". The empty /etc/network/interfaces may have been caused by file system corruption. So you may first want to check all the SATA links and the disk health; that may help avoid the same issue happening again in the future.
 
Hi albgen, OK, I checked the syslog again. You only have two NVMe disks, right? Did the system reboot unexpectedly at 06:30:12? Could you post mdstat? Thanks.
 
Yes, 2 NVMe SSDs in RAID 1. mdstat is not installed.
Code:
-bash: mdstat: command not found
 
The reboot at 06:30:12 was made by me.
OK, maybe you can paste the output of cat /proc/mdstat. But I don't understand: if the reboot at 06:30 was issued by command, why did the system run fsck at the same time? That is why I think it was an unexpected reboot.
Code:
Mar 24 06:30:12 AX1011728574 systemd-fsck[806]: fsck.fat 4.2 (2021-01-31)
Mar 24 06:30:12 AX1011728574 systemd-fsck[806]: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
Mar 24 06:30:12 AX1011728574 systemd-fsck[806]:  Automatically removing dirty bit.
Mar 24 06:30:12 AX1011728574 systemd-fsck[806]: *** Filesystem was changed ***
Mar 24 06:30:12 AX1011728574 systemd-fsck[806]: Writing changes.
Mar 24 06:30:12 AX1011728574 systemd-fsck[806]: /dev/md0: 3 files, 39/65451 clusters
You may want to consider running memtest86 for a few days to test the system's stability, and see whether any memory errors or unexpected reboots happen again.
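
To look for the same kind of evidence in a saved log, a rough sketch like the one below may help. It only greps for the two messages quoted in this thread (the fsck.fat dirty-bit line and the EXT4 recovery line); the function name `unclean_hints` is hypothetical, and other file systems log different messages.

```shell
# Sketch: print log lines hinting that a filesystem was not cleanly unmounted.
# Input is a saved log file, e.g. one written with: journalctl -b > boot.log
# unclean_hints is a hypothetical helper name.
unclean_hints() {
    grep -E 'Dirty bit is set|EXT4-fs \([^)]*\): recovery complete' "$1"
}
```

If the journal is persistent, `journalctl -b -1 -n 50` shows the last lines of the previous boot; a log that stops abruptly without the usual shutdown messages would point to an unclean stop rather than a commanded reboot.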
 
Code:
root@AX1011728574 ~ # cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      262080 blocks super 1.0 [2/2] [UU]
      
md2 : active raid1 nvme1n1p3[1] nvme0n1p3[0]
      1046528 blocks super 1.2 [2/2] [UU]
      
md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      4189184 blocks super 1.2 [2/2] [UU]
      
md3 : active raid1 nvme1n1p4[1] nvme0n1p4[0]
      2111699968 blocks super 1.2 [2/2] [UU]
      bitmap: 16/16 pages [64KB], 65536KB chunk

md4 : active raid1 nvme1n1p5[1] nvme0n1p5[0]
      1633267008 blocks super 1.2 [2/2] [UU]
      bitmap: 0/13 pages [0KB], 65536KB chunk

unused devices: <none>
root@AX1011728574 ~ #
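
In the output above every array shows [2/2] [UU], meaning both mirror members are present. As a minimal sketch, the same check can be scripted: the function below prints the name of any array whose status field contains "_" (a missing member). The name `degraded_arrays` is hypothetical, and the parsing assumes the usual /proc/mdstat layout shown above.

```shell
# Sketch: list md arrays that have at least one missing member.
# Reads /proc/mdstat by default; degraded_arrays is a hypothetical name.
degraded_arrays() {
    local mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { name = $1 }       # remember the current array name
        /\[[U_]+\]/ && name != "" {       # status line, e.g. "[2/2] [UU]"
            if ($NF ~ /_/) print name     # "_" marks a missing member
            name = ""
        }
    ' "$mdstat"
}
```

On a healthy system `degraded_arrays` prints nothing, so it can be used as `[ -z "$(degraded_arrays)" ] || echo "RAID degraded"`.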

Could it be that, since the server has been up and running for years, there is a bug that shows up randomly (hopefully only once)? Could it be a bug exploited as a DoS that nobody knows about? It's a mystery...

Also, I cannot run a memtest. This is a production system, so I would have to move to the backup first. I'm thinking of ordering a new server if the problem appears again.

PS: The provider does not even want to look at it. They just told me that if I think there is a hardware problem, I should ask for a replacement.
 
I'm not sure either; maybe file system corruption caused /etc/network/interfaces to be empty.

Oh, I think the provider is great. Maybe you can request a replacement for the whole system... ;)