Proxmox shutdown VM and swap ethernet port

Djiraf

New Member
Mar 21, 2022
Hi, a few days ago I ran into a new problem. Sometimes my Proxmox server-1 simply disappears from the cluster. Restarting it from the console does not help. To get server-1 working again, I just need to swap the cable from the 2nd ethernet port to the 1st. After that I can see that all VMs were shut down and then start working again, as if the server had been powered off. The syslog says nothing special. This problem happened on the night of the 7th to the 9th.
 

Attachments

  • Syslog.txt
    243.7 KB
Hi,
please post the output of pveversion -v as well as cat /etc/network/interfaces and cat /etc/pve/corosync.conf.
 
Output of pveversion -v:
root@Server-1:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Interfaces:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface eno0 inet manual

iface eth1 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.1.6/24
gateway 192.168.1.1
bridge-ports eno0
bridge-stp off
bridge-fd 0


corosync:
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: Server-1
nodeid: 1
quorum_votes: 1
ring0_addr: 192.168.1.6
}
node {
name: Server-3
nodeid: 4
quorum_votes: 1
ring0_addr: 192.168.1.10
}
node {
name: Server-4
nodeid: 3
quorum_votes: 1
ring0_addr: 192.168.1.15
}
node {
name: Server-6
nodeid: 2
quorum_votes: 1
ring0_addr: 192.168.1.19
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: Expert-Servers
config_version: 8
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
 
It might be that the kernel renamed your network interfaces in an unexpected way. From your syslog we see:
Code:
Jun  7 23:21:36 Server-1 systemd-udevd[657]: eth0: Failed to rename network interface 2 from 'eth0' to 'eno0': File exists
Jun  7 23:21:36 Server-1 systemd-udevd[657]: eth0: Failed to process device, ignoring: File exists
...
Jun  7 23:21:36 Server-1 kernel: [    7.332326] e1000e 0000:06:00.0 eno0: renamed from eth1
...

Did you maybe set a .link file with possibly clashing nomenclature? If so, note that this is not supported, see https://github.com/systemd/systemd/issues/16665#issuecomment-669160229
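As a side note: if stable names are wanted, a commonly supported approach is to match the NIC by its MAC address in a .link file and assign a custom name *outside* the kernel-reserved eno*/eth* namespaces, which avoids exactly this kind of rename clash. A minimal sketch (the MAC address, file name, and the name lan0 are placeholders, not from this thread):

```
# /etc/systemd/network/10-lan0.link  (hypothetical example)
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=lan0
```

If you go this route, the bridge-ports entry in /etc/network/interfaces must be changed to the new name, and on Debian-based systems you may need to run update-initramfs -u so the rule is also applied early in boot.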
 
It happened again: https://youtu.be/NgO6bqdlGQQ
 
Could you attach the updated syslog? Also verify that the correct interface is configured as the bridge port; you can do this e.g. by blinking the port LED using ethtool -p <DEVNAME>
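For example, using the interface name from the posted config (requires root; the 10 is an optional duration in seconds):

```shell
# Blink the LED of eno0 for 10 seconds to locate the physical port
ethtool -p eno0 10
```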
 
I am using the correct ethernet port.
 
It seems like your host gets fenced and reboots. Do you have HA enabled? Please also provide the syslog from the other nodes from around that time.
 
Sorry, I don't know what HA is.
 

Attachments

  • Syslog 3 node.txt
    254.7 KB
What server are you using, and what chipset does the server use for its onboard NICs?

What's the output of lshw -c network -businfo?
What is the output of ls /etc/network?

HA stands for high availability and is a feature in Proxmox VE which can be used to automatically start VMs (that failed because of a host failure) on the remaining hosts, as long as you have shared or replicated storage. You can check via the UI whether you have High Availability enabled for resources by going to Datacenter -> HA and checking if there are VMs or CTs that have it enabled.
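You can also check this from the CLI with the standard Proxmox VE HA tool:

```shell
# List configured HA resources; empty output means no VM/CT is HA-managed
ha-manager config

# Show the current HA manager and node status
ha-manager status
```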
 
Something might be unstable in your network, please check cables, etc.

So the following seems to happen here:
  • Node 1 reboots, as seen from the syslog (reason still unknown, it might have been fenced); this happened in both cases.
  • The other two nodes which are online lose quorum, because the 4th node (Server-6) is not online, as can be seen from the syslog and your video.
  • During boot, the network interfaces get renamed incorrectly, as we again see from the logs: eth0: Failed to rename network interface 2 from 'eth0' to 'eno0': File exists in the first syslog you sent, eth1: Failed to rename network interface 3 from 'eth1' to 'eno0': File exists in the second one.
  • You switch cables, and therefore the system brings up the correct interface again, which is attached to the bridge as configured and therefore allows the node to rejoin the cluster.
So the questions remain:
  1. Why did node 1 reboot?
  2. Why does udev have this renaming conflict?
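To narrow down the second question, it can help to check for custom naming rules and to ask udev's built-in naming policy what names it would derive for the NIC (the paths are the standard locations; on a default install no custom rename files should be present):

```shell
# Look for custom .link files or udev rules that rename interfaces
ls -l /etc/systemd/network/ /etc/udev/rules.d/ 2>/dev/null

# Show the names udev's net_id builtin derives for the interface
udevadm test-builtin net_id /sys/class/net/eth1
```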
 

root@Server-1:~# lshw -c network -businfo
Bus info          Device    Class    Description
=======================================================
pci@0000:05:00.0  eno0      network  82574L Gigabit Network Connection
pci@0000:06:00.0  eth1      network  82574L Gigabit Network Connection
                  vmbr0     network  Ethernet interface
                  tap810i0  network  Ethernet interface
                  tap810i1  network  Ethernet interface
                  tap107i0  network  Ethernet interface
                  tap108i0  network  Ethernet interface
                  tap201i0  network  Ethernet interface
                  tap100i0  network  Ethernet interface
root@Server-1:~#

root@Server-1:~# ls /etc/network
if-down.d if-post-down.d if-pre-up.d if-up.d ifupdown2 interfaces interfaces.d run
 
I can't answer these questions. Do I need to replace the ethernet cable?
 
I solved this problem. The problem in my cluster was on Server-6. When I found that, I removed Server-6 from the cluster.
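For reference, removing a dead node from a Proxmox VE cluster is done with the standard pvecm tool, run on one of the remaining quorate nodes while the removed node is powered off and will not rejoin (the node name below is the one from this thread):

```shell
# Run on a remaining node while the cluster is quorate
pvecm delnode Server-6

# Verify the remaining membership and quorum
pvecm status
```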
 
